While GPUs are becoming common in HPC systems, the CPU is still responsible for managing both GPU-side and CPU-side compute, communication, and synchronization operations. For instance, if a result from a GPU-side computation is to be transferred to a remote destination, the CPU must synchronize on completion of the GPU computation before issuing a communication operation. Both CPU cycles and energy are consumed...
Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with...
With the increasing prominence of many-core architectures and decreasing per-core resources on large supercomputers, a number of application developers are investigating the use of hybrid MPI+threads programming to utilize computational units while sharing memory. An MPI-only model that uses one MPI process per system core is capable of effectively utilizing the processing units, but it fails to...
Programming of high performance computing systems has become more complex over time. Several layers of parallelism need to be exploited to efficiently utilize the available resources. To support application developers and performance analysts, we propose a technique for identifying the most performance-critical optimization targets in distributed heterogeneous applications. We have developed CASITA,...
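Tools in this space typically rank optimization targets by locating activities on the critical path of the program's execution graph. The sketch below illustrates that idea in plain Python; it is a schematic longest-path computation over a toy activity DAG, not CASITA's actual algorithm, and all activity names and durations are invented.

```python
# Schematic critical-path analysis over a program activity graph.
# Activity names and durations are made up for illustration only.

def critical_path(durations, edges):
    """Longest path through a DAG of activities.

    durations: {activity: time}
    edges: list of (predecessor, successor) dependencies
    Returns (total_time, activities_on_the_critical_path).
    """
    succs = {a: [] for a in durations}
    indeg = {a: 0 for a in durations}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1

    # Topological order (Kahn's algorithm).
    order = []
    ready = [a for a, d in indeg.items() if d == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # Latest finish time per activity, with back-pointers to rebuild the path.
    finish = {a: durations[a] for a in durations}
    pred = {a: None for a in durations}
    for u in order:
        for v in succs[u]:
            if finish[u] + durations[v] > finish[v]:
                finish[v] = finish[u] + durations[v]
                pred[v] = u

    end = max(finish, key=finish.get)
    path = []
    while end is not None:
        path.append(end)
        end = pred[end]
    return finish[path[0]], path[::-1]

# Toy trace: two branches after init; the long kernel k1 dominates.
durations = {"init": 1, "k0": 3, "k1": 7, "send": 2, "recv": 2, "fini": 1}
edges = [("init", "k0"), ("init", "k1"), ("k0", "send"),
         ("k1", "recv"), ("send", "fini"), ("recv", "fini")]
length, path = critical_path(durations, edges)
print(length, path)  # the k1 branch is the performance-critical target
```

Activities off the critical path (here, `k0` and `send`) can be slowed down without affecting total runtime, which is why such analyses focus optimization effort on the path itself.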
With the number of cores growing faster than memory per node, hybrid programming models (mixing message passing with shared memory paradigms) become a requirement for efficient use of HPC systems. For this scenario, achieving efficient communication is challenging. This is true even when using asynchronous communication, as most MPI implementations can only advance communication inside library calls...
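One common remedy for the lack of progress outside library calls is a dedicated progress thread that keeps polling the pending request while the application computes. The following is a plain-Python stand-in (no real MPI; the `SimulatedIsend` class is invented) that mimics a transfer which only advances when it is polled:

```python
# Schematic illustration, not real MPI: many MPI libraries advance a
# nonblocking transfer only while the application is inside an MPI call,
# so a dedicated thread that keeps polling is one way to make
# communication overlap with computation.
import threading
import time

class SimulatedIsend:
    """A transfer that moves forward only when test() is called."""
    def __init__(self, chunks=5):
        self.remaining = chunks

    def test(self):
        if self.remaining > 0:
            self.remaining -= 1       # one chunk progresses per poll
        return self.remaining == 0    # True once the transfer is done

def progress_loop(req):
    # Keep polling until the transfer completes (akin to repeated MPI_Test).
    while not req.test():
        time.sleep(0.001)

req = SimulatedIsend()
t = threading.Thread(target=progress_loop, args=(req,))
t.start()
busy = sum(i * i for i in range(100_000))  # "computation" on the main thread
t.join()                                   # communication finished meanwhile
print("transfer complete:", req.remaining == 0)
```

Without the progress thread, the simulated transfer would sit idle until the main thread next polled it, which is exactly the behavior the abstract describes for MPI implementations that only progress inside library calls.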
Effective combination of inter-node and intra-node parallelism is recognized to be a major challenge for future extreme-scale systems. Many researchers have demonstrated the potential benefits of combining both levels of parallelism, including increased communication-computation overlap, improved memory utilization, and effective use of accelerators. However, current "hybrid programming" approaches...
I/O performance is vital for most HPC applications, especially those that generate vast amounts of data as they scale. Many studies have shown that scientific applications tend to issue small, noncontiguous accesses in an interleaved fashion, causing different processes to access overlapping regions. In such a scenario, collective I/O is a widely used optimization technique. However,...
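The benefit of collective I/O comes from two-phase optimization: processes first exchange their small interleaved pieces so that an aggregator holds a contiguous region, then a single large write is issued. The sketch below simulates that pattern in plain Python (no MPI-IO; the layout, sizes, and function names are invented for illustration):

```python
# Schematic two-phase collective write: each "process" owns small
# interleaved pieces of a shared file region; an aggregator gathers the
# pieces and issues one large contiguous write instead of many tiny,
# overlapping independent ones. All sizes and names are illustrative.

def independent_writes(ranks):
    """Naive pattern: one tiny write per (offset, data) piece."""
    return sum(len(pieces) for pieces in ranks.values())

def collective_write(ranks, region_size):
    """Two-phase pattern: shuffle pieces to an aggregator, write once."""
    buf = bytearray(region_size)
    for pieces in ranks.values():          # phase 1: communication
        for offset, data in pieces:
            buf[offset:offset + len(data)] = data
    return buf, 1                          # phase 2: a single I/O call

# Rank r writes byte r of every 4-byte record: a fully interleaved pattern.
ranks = {r: [(rec * 4 + r, bytes([r])) for rec in range(4)] for r in range(4)}
buf, n_io = collective_write(ranks, 16)
print(independent_writes(ranks), "small writes vs", n_io, "collective write")
print(bytes(buf).hex())
```

Trading sixteen 1-byte accesses for one 16-byte access is a toy version of the win; at HPC scale the same reshuffling turns millions of small overlapping requests into a handful of large, well-aligned ones.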
The hybrid CPU/GPU computing architecture has recently become an alternative platform for high performance computing. This architecture provides massive computational power with lower energy consumption and lower economic cost than the traditional CPU-only approach. However, the complexity of GPU programming is too high for users to move their applications toward this hybrid computing architecture...
This paper proposes a parallelization of the AdaBoost algorithm through hybrid usage of MPI, OpenMP, and transactional memory. After a detailed analysis of the AdaBoost algorithm, we show that multiple levels of parallelism exist in the algorithm. We develop the lower level of parallelism through OpenMP and the higher level through MPI. Software transactional memory is used to facilitate the...
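To make the two levels concrete, here is a minimal AdaBoost with decision stumps in Python: the inner search over candidate stumps is an independent loop (the OpenMP-style level, shown here with a thread pool), while the data could additionally be partitioned across ranks (the MPI level, not shown). This is an illustrative sketch, not the paper's implementation; all names and the toy dataset are invented.

```python
# Minimal AdaBoost with 1-D decision stumps, sketching the two levels of
# parallelism: an inner loop over candidate stumps (thread pool, standing
# in for OpenMP) and an outer data-parallel level (MPI ranks, not shown).
import math
from concurrent.futures import ThreadPoolExecutor

def stump_error(args):
    """Weighted error of thresholding feature f at t with sign s."""
    X, y, w, f, t, s = args
    err = sum(wi for xi, yi, wi in zip(X, y, w)
              if (s if xi[f] > t else -s) != yi)
    return err, (f, t, s)

def adaboost(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    candidates = [(f, xi[f], s) for f in range(len(X[0]))
                  for xi in X for s in (1, -1)]
    for _ in range(rounds):
        # Inner-level parallelism: each candidate stump scored independently.
        with ThreadPoolExecutor() as pool:
            err, (f, t, s) = min(pool.map(
                stump_error, [(X, y, w, f, t, s) for f, t, s in candidates]))
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        # Reweighting touches shared state: this is the update the paper
        # protects with software transactional memory.
        for i, (xi, yi) in enumerate(zip(X, y)):
            pred = s if xi[f] > t else -s
            w[i] *= math.exp(-alpha * yi * pred)
        z = sum(w)
        w = [wi / z for wi in w]
        model.append((alpha, f, t, s))
    return model

def predict(model, xi):
    score = sum(a * (s if xi[f] > t else -s) for a, f, t, s in model)
    return 1 if score > 0 else -1

X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
model = adaboost(X, y)
print([predict(model, xi) for xi in X])  # expected: [-1, -1, 1, 1]
```

The stump evaluations are read-only over `X`, `y`, and `w`, so they parallelize freely; only the weight update writes shared state, which is why that step is the natural target for transactional memory.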
With the ever-increasing demand for high-quality 3D image processing in markets such as cinema and gaming, the capabilities of graphics processing units (GPUs) have advanced tremendously. Although GPU-based cluster computing, which uses GPUs as the processing units, is one of the most promising high performance parallel computing platforms, currently there is no programming environment, interface or...
The ever-increasing number of cores per chip will be accompanied by a pervasive data deluge whose size will probably increase even faster than CPU core count over the next few years. This suggests the importance of parallel data analysis and data mining applications with good multicore, cluster and grid performance. This paper considers data clustering, mixture models and dimensional reduction, presenting...
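The structure that makes such data analysis scale is visible even in plain k-means, shown below as a stand-in for the clustering kernels discussed (this is not the paper's algorithm; the 1-D data and initial centers are invented). The assignment step is an independent map over points, and the centroid update is an associative sum-and-count reduction, so both distribute naturally over cores and nodes.

```python
# Plain 1-D k-means as a stand-in for parallel clustering kernels:
# assignment is a per-point map (multicore friendly) and the centroid
# update is an associative reduction (cluster friendly). Toy data only.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # Map: assign each point to its nearest center (independent per point).
        labels = [min(range(len(centers)),
                      key=lambda c: (points[i] - centers[c]) ** 2)
                  for i in range(len(points))]
        # Reduce: per-cluster sums and counts combine associatively, so
        # partial results from separate workers can simply be added.
        sums = [0.0] * len(centers)
        counts = [0] * len(centers)
        for p, l in zip(points, labels):
            sums[l] += p
            counts[l] += 1
        centers = [sums[c] / counts[c] if counts[c] else centers[c]
                   for c in range(len(centers))]
    return centers, labels

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers, labels = kmeans(points, [0.0, 10.0])
print(centers, labels)  # two tight clusters, near 1.0 and 8.0
```

Mixture models fit the same map-reduce shape: the E-step is a per-point map and the M-step is a weighted reduction, which is why these algorithms port well from multicore to cluster and grid settings.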
High performance computing with low-cost machines is becoming a reality. As an example, the Sony PlayStation 3 gaming console offers performance of up to 150 GFLOPS for a machine's retail price of $400. Unfortunately, this performance is only achieved when the programmer exploits the architectural specifics of its Cell processor: they have to focus on inter-processor communications, task allocation...