While GPUs are becoming common in HPC systems, the CPU is still responsible for managing both GPU-side and CPU-side compute, communication, and synchronization operations. For instance, if a result from a GPU-side computation is to be transferred to a remote destination, the CPU must synchronize on completion of the GPU computation before issuing a communication operation. Both CPU cycles and energy are consumed...
Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with...
With the increasing prominence of many-core architectures and decreasing per-core resources on large supercomputers, a number of application developers are investigating the use of hybrid MPI+threads programming to utilize computational units while sharing memory. An MPI-only model that uses one MPI process per system core is capable of effectively utilizing the processing units, but it fails to...
Programming of high performance computing systems has become more complex over time. Several layers of parallelism need to be exploited to efficiently utilize the available resources. To support application developers and performance analysts, we propose a technique for identifying the most performance-critical optimization targets in distributed heterogeneous applications. We have developed CASITA,...
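Tools in this space typically rank optimization targets by locating activities on the critical path of the program's execution graph. The sketch below illustrates that idea in plain Python; it is a schematic longest-path computation over a toy activity DAG, not CASITA's actual algorithm, and all activity names and durations are invented.

```python
# Schematic critical-path analysis over a program activity graph.
# Activity names and durations are made up for illustration only.

def critical_path(durations, edges):
    """Longest path through a DAG of activities.

    durations: {activity: time}
    edges: list of (predecessor, successor) dependencies
    Returns (total_time, activities_on_the_critical_path).
    """
    succs = {a: [] for a in durations}
    indeg = {a: 0 for a in durations}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1

    # Topological order (Kahn's algorithm).
    order = []
    ready = [a for a, d in indeg.items() if d == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # Latest finish time per activity, with back-pointers to rebuild the path.
    finish = {a: durations[a] for a in durations}
    pred = {a: None for a in durations}
    for u in order:
        for v in succs[u]:
            if finish[u] + durations[v] > finish[v]:
                finish[v] = finish[u] + durations[v]
                pred[v] = u

    end = max(finish, key=finish.get)
    path = []
    while end is not None:
        path.append(end)
        end = pred[end]
    return finish[path[0]], path[::-1]

# Toy trace: two branches after init; the long kernel k1 dominates.
durations = {"init": 1, "k0": 3, "k1": 7, "send": 2, "recv": 2, "fini": 1}
edges = [("init", "k0"), ("init", "k1"), ("k0", "send"),
         ("k1", "recv"), ("send", "fini"), ("recv", "fini")]
length, path = critical_path(durations, edges)
print(length, path)  # the k1 branch is the performance-critical target
```

Activities off the critical path (here, `k0` and `send`) can be slowed down without affecting total runtime, which is why such analyses focus optimization effort on the path itself.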
With the number of cores growing faster than memory per node, hybrid programming models (mixing message passing with shared memory paradigms) become a requirement for efficient use of HPC systems. For this scenario, achieving efficient communication is challenging. This is true even when using asynchronous communication, as most MPI implementations can only advance communication inside library calls...
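One common remedy for the lack of progress outside library calls is a dedicated progress thread that keeps polling the pending request while the application computes. The following is a plain-Python stand-in (no real MPI; the `SimulatedIsend` class is invented) that mimics a transfer which only advances when it is polled:

```python
# Schematic illustration, not real MPI: many MPI libraries advance a
# nonblocking transfer only while the application is inside an MPI call,
# so a dedicated thread that keeps polling is one way to make
# communication overlap with computation.
import threading
import time

class SimulatedIsend:
    """A transfer that moves forward only when test() is called."""
    def __init__(self, chunks=5):
        self.remaining = chunks

    def test(self):
        if self.remaining > 0:
            self.remaining -= 1       # one chunk progresses per poll
        return self.remaining == 0    # True once the transfer is done

def progress_loop(req):
    # Keep polling until the transfer completes (akin to repeated MPI_Test).
    while not req.test():
        time.sleep(0.001)

req = SimulatedIsend()
t = threading.Thread(target=progress_loop, args=(req,))
t.start()
busy = sum(i * i for i in range(100_000))  # "computation" on the main thread
t.join()                                   # communication finished meanwhile
print("transfer complete:", req.remaining == 0)
```

Without the progress thread, the simulated transfer would sit idle until the main thread next polled it, which is exactly the behavior the abstract describes for MPI implementations that only progress inside library calls.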
Effective combination of inter-node and intra-node parallelism is recognized to be a major challenge for future extreme-scale systems. Many researchers have demonstrated the potential benefits of combining both levels of parallelism, including increased communication-computation overlap, improved memory utilization, and effective use of accelerators. However, current "hybrid programming" approaches...
I/O performance is vital for most HPC applications, especially those that generate vast amounts of data as they scale. Many studies have shown that scientific applications tend to issue small, noncontiguous accesses in an interleaved fashion, causing different processes to access overlapping regions. In such a scenario, collective I/O is a widely used optimization technique. However,...
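The benefit of collective I/O comes from two-phase optimization: processes first exchange their small interleaved pieces so that an aggregator holds a contiguous region, then a single large write is issued. The sketch below simulates that pattern in plain Python (no MPI-IO; the layout, sizes, and function names are invented for illustration):

```python
# Schematic two-phase collective write: each "process" owns small
# interleaved pieces of a shared file region; an aggregator gathers the
# pieces and issues one large contiguous write instead of many tiny,
# overlapping independent ones. All sizes and names are illustrative.

def independent_writes(ranks):
    """Naive pattern: one tiny write per (offset, data) piece."""
    return sum(len(pieces) for pieces in ranks.values())

def collective_write(ranks, region_size):
    """Two-phase pattern: shuffle pieces to an aggregator, write once."""
    buf = bytearray(region_size)
    for pieces in ranks.values():          # phase 1: communication
        for offset, data in pieces:
            buf[offset:offset + len(data)] = data
    return buf, 1                          # phase 2: a single I/O call

# Rank r writes byte r of every 4-byte record: a fully interleaved pattern.
ranks = {r: [(rec * 4 + r, bytes([r])) for rec in range(4)] for r in range(4)}
buf, n_io = collective_write(ranks, 16)
print(independent_writes(ranks), "small writes vs", n_io, "collective write")
print(bytes(buf).hex())
```

Trading sixteen 1-byte accesses for one 16-byte access is a toy version of the win; at HPC scale the same reshuffling turns millions of small overlapping requests into a handful of large, well-aligned ones.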
The hybrid CPU/GPU computing architecture has recently become an alternative platform for high performance computing. This architecture provides massive computational power with lower energy consumption and lower economic cost than the traditional CPU-only approach. However, the complexity of GPU programming is too high for users to move their applications toward this hybrid computing architecture...
This paper proposes a parallelization of the AdaBoost algorithm through hybrid usage of MPI, OpenMP, and transactional memory. After a detailed analysis of the AdaBoost algorithm, we show that multiple levels of parallelism exist in the algorithm. We develop the lower level of parallelism through OpenMP and the higher level through MPI. Software transactional memory is used to facilitate the...
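To make the two levels concrete, here is a minimal AdaBoost with decision stumps in Python: the inner search over candidate stumps is an independent loop (the OpenMP-style level, shown here with a thread pool), while the data could additionally be partitioned across ranks (the MPI level, not shown). This is an illustrative sketch, not the paper's implementation; all names and the toy dataset are invented.

```python
# Minimal AdaBoost with 1-D decision stumps, sketching the two levels of
# parallelism: an inner loop over candidate stumps (thread pool, standing
# in for OpenMP) and an outer data-parallel level (MPI ranks, not shown).
import math
from concurrent.futures import ThreadPoolExecutor

def stump_error(args):
    """Weighted error of thresholding feature f at t with sign s."""
    X, y, w, f, t, s = args
    err = sum(wi for xi, yi, wi in zip(X, y, w)
              if (s if xi[f] > t else -s) != yi)
    return err, (f, t, s)

def adaboost(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    candidates = [(f, xi[f], s) for f in range(len(X[0]))
                  for xi in X for s in (1, -1)]
    for _ in range(rounds):
        # Inner-level parallelism: each candidate stump scored independently.
        with ThreadPoolExecutor() as pool:
            err, (f, t, s) = min(pool.map(
                stump_error, [(X, y, w, f, t, s) for f, t, s in candidates]))
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        # Reweighting touches shared state: this is the update the paper
        # protects with software transactional memory.
        for i, (xi, yi) in enumerate(zip(X, y)):
            pred = s if xi[f] > t else -s
            w[i] *= math.exp(-alpha * yi * pred)
        z = sum(w)
        w = [wi / z for wi in w]
        model.append((alpha, f, t, s))
    return model

def predict(model, xi):
    score = sum(a * (s if xi[f] > t else -s) for a, f, t, s in model)
    return 1 if score > 0 else -1

X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
model = adaboost(X, y)
print([predict(model, xi) for xi in X])  # expected: [-1, -1, 1, 1]
```

The stump evaluations are read-only over `X`, `y`, and `w`, so they parallelize freely; only the weight update writes shared state, which is why that step is the natural target for transactional memory.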
With the ever-increasing demand for high-quality 3D image processing in markets such as cinema and gaming, the capabilities of graphics processing units (GPUs) have advanced tremendously. Although GPU-based cluster computing, which uses GPUs as the processing units, is one of the most promising high performance parallel computing platforms, currently there is no programming environment, interface or...
The ever-increasing number of cores per chip will be accompanied by a pervasive data deluge whose size will probably increase even faster than CPU core count over the next few years. This suggests the importance of parallel data analysis and data mining applications with good multicore, cluster and grid performance. This paper considers data clustering, mixture models and dimensional reduction, presenting...
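The structure that makes such data analysis scale is visible even in plain k-means, shown below as a stand-in for the clustering kernels discussed (this is not the paper's algorithm; the 1-D data and initial centers are invented). The assignment step is an independent map over points, and the centroid update is an associative sum-and-count reduction, so both distribute naturally over cores and nodes.

```python
# Plain 1-D k-means as a stand-in for parallel clustering kernels:
# assignment is a per-point map (multicore friendly) and the centroid
# update is an associative reduction (cluster friendly). Toy data only.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # Map: assign each point to its nearest center (independent per point).
        labels = [min(range(len(centers)),
                      key=lambda c: (points[i] - centers[c]) ** 2)
                  for i in range(len(points))]
        # Reduce: per-cluster sums and counts combine associatively, so
        # partial results from separate workers can simply be added.
        sums = [0.0] * len(centers)
        counts = [0] * len(centers)
        for p, l in zip(points, labels):
            sums[l] += p
            counts[l] += 1
        centers = [sums[c] / counts[c] if counts[c] else centers[c]
                   for c in range(len(centers))]
    return centers, labels

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers, labels = kmeans(points, [0.0, 10.0])
print(centers, labels)  # two tight clusters, near 1.0 and 8.0
```

Mixture models fit the same map-reduce shape: the E-step is a per-point map and the M-step is a weighted reduction, which is why these algorithms port well from multicore to cluster and grid settings.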
High performance computing with low-cost machines is becoming a reality. As an example, the Sony PlayStation 3 gaming console offers performance of up to 150 GFLOPS for a machine's retail price of $400. Unfortunately, this performance is only achieved when the programmer exploits the architectural specifics of its Cell processor: they have to focus on inter-processor communications, task allocation...