In the recent literature, drug design relying on molecular docking (MD) techniques has become a very promising field. Most of these techniques model the way ligands interact with a protein target using only one binding site; moreover, they ignore the fact that different ligands can interact with unconnected parts of the target. However, by taking the latter fact into consideration, the computational...
String pattern matching with finite automata (FAs) is a well-established method across many areas of computer science. Until now, data dependencies inherent in the pattern matching algorithm have hampered effective parallelization. To overcome the dependency constraint between subsequent matching steps, simultaneous deterministic finite automata (SFAs) have recently been introduced. Although an SFA...
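The "simultaneous" idea can be illustrated with a minimal Python sketch (not the paper's implementation — all names here are illustrative): each text chunk is matched from *every* possible automaton state, which removes the dependency between chunks; the resulting per-chunk state tables are then composed in one cheap sequential pass.

```python
def build_dfa(pattern, alphabet):
    # KMP-style DFA: state s = length of the longest pattern prefix that is a
    # suffix of the text read so far; reaching state len(pattern) is a match
    m = len(pattern)
    dfa = []
    for s in range(m + 1):
        row = {}
        for ch in alphabet:
            cur = pattern[:s] + ch
            k = min(len(cur), m)
            while k > 0 and cur[-k:] != pattern[:k]:
                k -= 1
            row[ch] = k
        dfa.append(row)
    return dfa

def match_count(dfa, text, m, start_state=0):
    # sequential matching: inherently dependent from step to step
    s, cnt = start_state, 0
    for ch in text:
        s = dfa[s][ch]
        if s == m:
            cnt += 1
    return s, cnt

def parallel_match(dfa, text, m, chunks=4):
    n = len(text)
    bounds = [(i * n // chunks, (i + 1) * n // chunks) for i in range(chunks)]
    # per-chunk tables are mutually independent -> could run on separate cores
    tables = [[match_count(dfa, text[lo:hi], m, s) for s in range(m + 1)]
              for lo, hi in bounds]
    # cheap sequential composition of the chunk results
    s, total = 0, 0
    for tab in tables:
        s, cnt = tab[s]
        total += cnt
    return total
```

The parallel phase does (m + 1) times the work of the sequential scan, which is the usual price an SFA-style scheme pays for breaking the dependency chain.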
Sparse general matrix-matrix multiplication (SpGEMM) is one of the key kernels of preconditioners such as the algebraic multigrid method, and of graph algorithms. However, the performance of SpGEMM is quite low on modern processors due to random memory accesses to both the input and output matrices. In addition to the number and pattern of non-zero elements in the output matrix, which are important for achieving locality,...
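For reference, a minimal row-wise SpGEMM in the style of Gustavson's algorithm (a common baseline formulation, not necessarily the one used in this work) makes the random-access problem visible: the accumulator for each output row is scattered into by column index.

```python
def spgemm(A, B):
    """Row-wise SpGEMM: A and B are sparse matrices stored as
    dict-of-dicts, row -> {col: value}. Returns C = A @ B in the same format."""
    C = {}
    for i, a_row in A.items():
        acc = {}  # accumulator for row i of C; scattered (random) writes
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():  # random reads into B's rows
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C
```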
Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting specific GPU models or performing code transformations. Although empirical...
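The empirical approach hinted at here can be sketched generically: time each candidate launch configuration and keep the fastest. The Python below is an illustrative stand-in (the function and parameter names are assumptions, not this paper's API); in a real CUDA setting `run_kernel` would launch the kernel with a given block size.

```python
import time

def _timed(fn, cfg):
    t0 = time.perf_counter()
    fn(cfg)
    return time.perf_counter() - t0

def autotune(run_kernel, candidates, repeats=3):
    """Empirically pick the best-performing configuration.
    run_kernel(cfg) executes the kernel once with configuration cfg
    (e.g. a block size); the fastest of `repeats` runs is kept per
    candidate to reduce timing noise."""
    best_cfg, best_t = None, float("inf")
    for cfg in candidates:
        t = min(_timed(run_kernel, cfg) for _ in range(repeats))
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg, best_t
```

Taking the minimum over repeats, rather than the mean, is the usual choice for wall-clock tuning because external noise only ever adds time.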
The conventional OpenCL 1.x style CPU-GPU heterogeneous computing paradigm treats the CPU and GPU as loosely connected, separate entities. At best, each executes independent tasks; more commonly, the CPU idles while waiting for results from the GPU. No data sharing or communication is allowed during kernel execution. This model limits the number of applications that can harness the...
Modern GPUs embrace on-chip cache memory to exploit the locality present in applications. However, the behavior and effect of the cache on GPUs are different from those on conventional processors due to the Single Instruction Multiple Thread (SIMT) thread execution model and resulting memory access patterns. Previous studies report that caching data can hurt the performance due to increased memory...
We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel® architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include...
The coevolutionary particle swarm optimization (CPSO) algorithm has been widely investigated and applied in the real world. When tackling large-scale and complex real-time optimization problems, the running time of the CPSO algorithm is a barrier. In this paper, the Graphics Processing Unit (GPU) is introduced to provide speedup in order to meet real-time requirements. The CPSO algorithm has been implemented...
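To see why PSO maps well onto a GPU, consider the core update loop of a plain (non-coevolutionary) PSO — a simplified sketch, not this paper's implementation: every particle's velocity/position update is independent of the others within an iteration, so each can be assigned to its own GPU thread.

```python
import random

def pso(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over R^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        # this loop over particles is the embarrassingly parallel part:
        # on a GPU, one thread per particle
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            v = f(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val
```

The coevolutionary variant adds interacting sub-swarms on top of this loop; the per-particle parallelism it exposes is the same.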
Future high-performance computing systems will need to include multiple specialized accelerators in a single heterogeneous system to overcome power-density limitations of CPU performance.
This paper presents the design and implementation of a hardwired OS kernel circuitry inside a Java application processor to provide the system services that are traditionally implemented in software. The hardwired system functions in the proposed SoC include the thread manager, the memory manager, and the I/O subsystem interface. There are many advantages to making the OS kernel a hardware component,...
Data analytics is undergoing a revolution in many scientific domains, and demands cost-effective parallel data analysis techniques. Traditional Java-based Big Data processing tools like Hadoop MapReduce are designed for commodity CPUs. In contrast, emerging manycore processors like the Xeon Phi have an order of magnitude greater computation power and memory bandwidth. To harness their computing capabilities,...
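The MapReduce model mentioned here reduces to three phases — map, shuffle, reduce — of which map and reduce are data-parallel and so are the phases a manycore processor like the Xeon Phi can spread across its hardware threads. A minimal word-count sketch (illustrative only, not Hadoop's API):

```python
from collections import defaultdict

def map_phase(chunks):
    # each mapper emits (word, 1) pairs; mappers are independent and
    # could run one per core / hardware thread
    return [[(w, 1) for w in chunk.split()] for chunk in chunks]

def shuffle(mapped):
    # group all values by key across mappers
    groups = defaultdict(list)
    for pairs in mapped:
        for k, v in pairs:
            groups[k].append(v)
    return groups

def reduce_phase(groups):
    # reducers for different keys are likewise independent
    return {k: sum(vs) for k, vs in groups.items()}
```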
With the continuously increasing power of computation, especially in the area of parallel computing, computer-based texture analysis, computer-assisted classification methods, automated pathology detection, etc. are more and more commonly performed on medical images, such as X-ray and Magnetic Resonance (MR) images, for clinical or scientific purposes. These procedures almost always include a stage of...
A recent research trend aims to increase the monitoring and control capabilities of low voltage (LV) networks. This paper describes a probabilistic forecasting methodology based on kernel density estimation that makes use of distributed computing techniques to create a highly scalable forecasting system for LV networks. The results show that the proposed algorithm outperforms three benchmark models...
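Kernel density estimation, the core of the methodology, turns a set of observed values into a full predictive density rather than a point forecast. A minimal Gaussian-kernel version (a textbook sketch with an assumed fixed bandwidth, not the paper's tuned estimator):

```python
import math

def kde_pdf(samples, x, bandwidth):
    """Gaussian kernel density estimate of the pdf at point x:
    the average of Gaussian bumps of width `bandwidth` centred on the samples."""
    n = len(samples)
    norm = n * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
               for s in samples) / norm
```

Evaluating `kde_pdf` over a grid of load values yields the probabilistic forecast; since each evaluation point (and each network node) is independent, the workload distributes naturally, which is what makes the distributed-computing angle attractive.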
Brain storm optimization (BSO) is a newly emerging family of swarm intelligence techniques inspired by the human creative problem-solving process, and it has achieved success in many applications. BSO is characterized by its unique process of grouping a population of ideas and carrying out brainstorming based on the grouped ideas to search for optima generation by generation. Although the original...
The Dirty Copy-On-Write (COW) vulnerability, discovered by Phil Oester in October 2016, is a serious vulnerability that allows an unprivileged user to escalate privileges and gain full control of devices (computers, mobile smartphones, and gaming devices that run Linux-based operating systems). This means that any user who exploits this bug can escalate his/her privileges and do anything, either locally or remotely...
Signal processing is of central importance in biomedical systems, in which pre-processing steps are unavoidable in order to reduce noise, remove unwanted artefacts, segment time series into smaller epochs, or extract statistical and other descriptive features that can be used in consecutive classification stages. The high sampling rates and electrode counts used e.g. in advanced EEG or body-surface...
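The pre-processing steps named in the abstract — segmenting a time series into epochs and extracting descriptive features per epoch — can be sketched compactly (illustrative helper names, not a specific toolkit's API); note that each epoch is processed independently, which is exactly the parallelism high-channel-count EEG pipelines exploit.

```python
def segment(signal, epoch_len, step):
    """Split a 1-D signal into (possibly overlapping) epochs of epoch_len
    samples, advancing by `step` samples between epochs."""
    return [signal[i:i + epoch_len]
            for i in range(0, len(signal) - epoch_len + 1, step)]

def features(epoch):
    """A few simple descriptive features of one epoch."""
    n = len(epoch)
    mean = sum(epoch) / n
    var = sum((x - mean) ** 2 for x in epoch) / n
    return {"mean": mean, "var": var, "ptp": max(epoch) - min(epoch)}
```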
The power consumed by the memory system in GPUs is a significant fraction of the total chip power. As thread level parallelism increases, GPUs are likely to stress cache and memory bandwidth even more, thereby exacerbating power consumption. We observe that neighboring concurrent thread arrays (CTAs) within GPU applications share a considerable amount of data. However, the default GPU scheduling policy...
Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts...
CNNs (Convolutional Neural Networks) have demonstrated superior results in a wide range of applications. However, the time-consuming convolution operations required by CNNs pose great challenges to designers. GPGPUs (General Purpose Graphics Processing Units) have been widely used to exploit the massive parallelism of convolution operations. This paper proposes a software-based loop-unrolling technique...
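To make the loop structure concrete: a direct 2-D convolution has four nested loops, and a common way to expose parallelism is to unroll the two inner (kernel) loops into a flat dot product per output element — the idea behind im2col-style GPU formulations. The sketch below contrasts the two forms (a generic illustration, not this paper's specific technique):

```python
def conv2d_naive(img, ker):
    """Direct 2-D valid convolution (cross-correlation) with 4 nested loops."""
    H, W, kh, kw = len(img), len(img[0]), len(ker), len(ker[0])
    out = [[0.0] * (W - kw + 1) for _ in range(H - kh + 1)]
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            s = 0.0
            for u in range(kh):          # inner kernel loops:
                for v in range(kw):      # candidates for unrolling
                    s += img[i + u][j + v] * ker[u][v]
            out[i][j] = s
    return out

def conv2d_unrolled(img, ker):
    """Same result; the kernel loops are flattened into one dot product,
    so each output element is independent parallel work."""
    H, W, kh, kw = len(img), len(img[0]), len(ker), len(ker[0])
    kvec = [ker[u][v] for u in range(kh) for v in range(kw)]  # unrolled kernel
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            patch = [img[i + u][j + v] for u in range(kh) for v in range(kw)]
            row.append(sum(p * k for p, k in zip(patch, kvec)))
        out.append(row)
    return out
```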
Data movement is increasingly becoming the bottleneck for both performance and energy efficiency in modern computation. Until recently, there was limited freedom for communication optimization on GPUs, as conventional GPUs only provide two methods for inter-thread communication: shared memory or global memory. However, a new warp shuffle instruction has been introduced...
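Warp shuffle lets threads in a warp exchange register values directly, without a round trip through shared or global memory. The canonical use is a warp-wide reduction; below is a Python simulation of that pattern (modelled on CUDA's `__shfl_down_sync` semantics for a 32-lane warp — an illustration, not CUDA code):

```python
WARP = 32  # lanes per warp

def shfl_down(vals, delta):
    # models __shfl_down_sync: lane i reads the value held by lane i + delta;
    # lanes that would read past the end of the warp keep their own value
    return [vals[i + delta] if i + delta < WARP else vals[i]
            for i in range(WARP)]

def warp_reduce_sum(vals):
    """Butterfly reduction: halve the active width each step;
    after log2(32) = 5 steps, lane 0 holds the warp-wide sum."""
    assert len(vals) == WARP
    offset = WARP // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]
```

Each step is one register-to-register exchange, which is why shuffle-based reductions move less data than shared-memory ones.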