This paper introduces a new scalable integer sort application inspired by the NAS Parallel Benchmark (NPB) integer sort. We provide a detailed analysis of the NPB integer sort to motivate the development of ISx, a new integer sort for co-design. ISx is a highly modular application implemented in the OpenSHMEM parallel programming model, and it supports both strong and weak scaling studies.
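As a rough illustration of how a distributed integer sort in the spirit of NPB IS can be organized, the sketch below simulates three phases in a single Python process. First, each processing element (PE) bins its keys by destination PE. Next, the bins are exchanged all-to-all. Finally, each PE counting-sorts the keys in its own range. This is an assumed bucketed design shown for intuition only, not ISx's OpenSHMEM code; `distributed_integer_sort` and its parameters are hypothetical.

```python
import random


def distributed_integer_sort(keys_per_pe, max_key, num_pes):
    """Simulate a bucketed integer sort across num_pes PEs (hypothetical helper).

    keys_per_pe: one list of keys per PE, every key in [0, max_key).
    Returns one sorted list per PE; concatenated, they are globally sorted.
    """
    # Each PE owns a contiguous key range of width ceil(max_key / num_pes).
    bucket_width = (max_key + num_pes - 1) // num_pes

    # Phase 1: every PE bins its local keys by destination PE.
    send = [[[] for _ in range(num_pes)] for _ in range(num_pes)]
    for src, keys in enumerate(keys_per_pe):
        for k in keys:
            send[src][k // bucket_width].append(k)

    # Phase 2: all-to-all exchange, simulated by gathering column dst of send.
    received = [[k for src in range(num_pes) for k in send[src][dst]]
                for dst in range(num_pes)]

    # Phase 3: each PE counting-sorts the keys that fall in its own range.
    results = []
    for dst, keys in enumerate(received):
        lo = dst * bucket_width
        counts = [0] * bucket_width
        for k in keys:
            counts[k - lo] += 1
        results.append([lo + i for i, c in enumerate(counts) for _ in range(c)])
    return results


random.seed(0)
pes = 4
data = [[random.randrange(64) for _ in range(16)] for _ in range(pes)]
out = distributed_integer_sort(data, 64, pes)
assert [k for part in out for k in part] == sorted(k for part in data for k in part)
```

Because each PE owns a contiguous key range, the per-PE outputs are globally sorted without a final merge. With uniformly distributed keys, the number of keys each PE receives is also roughly balanced.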
Partitioned Global Address Space (PGAS) and one-sided communication models allow shared data to be transparently and asynchronously accessed by any process within a parallel computation. In order to ensure that updates are performed in the intended order, the programmer must either use potentially slower ordered communication, or perform operations that order unordered communication, such as a fence...
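The ordering problem can be made concrete with a toy model: puts are buffered and may land in any order, and a fence splits them into epochs that are delivered in sequence. In OpenSHMEM, `shmem_fence` plays a similar role by ordering puts to each target PE, and `shmem_quiet` waits for outstanding puts to complete. The Python class below only mimics that behavior in a single process; `UnorderedChannel` and its methods are hypothetical, not a real PGAS API.

```python
import random


class UnorderedChannel:
    """Toy model of one-sided puts that the network may deliver in any order.

    put() only enqueues. Within an epoch, delivery order is randomized.
    fence() closes the current epoch, so every put issued before the fence
    is delivered before any put issued after it.
    """

    def __init__(self, remote, rng):
        self.remote = remote   # dict standing in for remote memory
        self.rng = rng
        self.epochs = [[]]     # puts grouped by fence-delimited epoch

    def put(self, addr, value):
        self.epochs[-1].append((addr, value))

    def fence(self):
        self.epochs.append([])

    def quiet(self):
        # Drain everything: epochs in order, puts within an epoch shuffled.
        for epoch in self.epochs:
            self.rng.shuffle(epoch)
            for addr, value in epoch:
                self.remote[addr] = value
        self.epochs = [[]]


rng = random.Random(42)
for _ in range(100):
    mem = {}
    ch = UnorderedChannel(mem, rng)
    ch.put("x", 1)
    ch.fence()             # without this, mem["x"] could end up as 1 or 2
    ch.put("x", 2)
    ch.quiet()
    assert mem["x"] == 2   # the fence guarantees the later put wins
```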
A set of parallel features, broadly referred to as Fortran coarrays, was added to the Fortran 2008 standard. It is expected that several new parallel features, designed to complement or augment this feature set, will be added to the next revision of the standard. This includes statements for forming and changing between image teams, as well as statements for performing communication and synchronization...
We investigated a software cache for PGAS PUT and GET operations. The cache is implemented as a software write-back cache with dirty bits, local memory consistency operations, and programmer-guided prefetch. This cache supports programmer productivity while enabling communication aggregation and overlap. We evaluated an implementation of this cache for remote data within the Chapel programming language...
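The following is a minimal sketch of the caching idea. It assumes that a dict stands in for remote memory and that lines are fetched on first access (write-allocate). PUTs only update the cached copy and set per-word dirty bits. `flush()` writes back the dirty words with one transfer per line, and `prefetch()` models programmer-guided prefetch. The class, its methods, and the transfer counter are hypothetical and are not Chapel's implementation.

```python
class WriteBackCache:
    """Software write-back cache with per-word dirty bits (illustrative only)."""

    def __init__(self, remote, line_size=4):
        self.remote = remote        # address -> value, stands in for remote memory
        self.line_size = line_size
        self.lines = {}             # line index -> list of cached values
        self.dirty = {}             # line index -> set of dirty word offsets
        self.remote_ops = 0         # simulated network transfers

    def _load(self, line):
        if line not in self.lines:
            base = line * self.line_size
            self.lines[line] = [self.remote.get(base + i, 0)
                                for i in range(self.line_size)]
            self.dirty[line] = set()
            self.remote_ops += 1    # one GET fetches the whole line

    def get(self, addr):
        line, off = divmod(addr, self.line_size)
        self._load(line)
        return self.lines[line][off]

    def put(self, addr, value):
        line, off = divmod(addr, self.line_size)
        self._load(line)
        self.lines[line][off] = value
        self.dirty[line].add(off)   # defer the remote write

    def prefetch(self, addr):
        # Programmer-guided prefetch: bring the line in before it is needed.
        self._load(addr // self.line_size)

    def flush(self):
        # Consistency operation: one aggregated write-back per dirty line.
        for line, offs in self.dirty.items():
            if offs:
                base = line * self.line_size
                for off in offs:
                    self.remote[base + off] = self.lines[line][off]
                self.remote_ops += 1
                offs.clear()


remote = {}
cache = WriteBackCache(remote, line_size=4)
for a in range(8):
    cache.put(a, a * 10)            # eight PUTs touching two lines
assert remote == {}                 # nothing has been sent yet
cache.flush()
print(cache.remote_ops)             # 4: two line fetches plus two write-backs
```

In this model, eight PUTs cost four simulated transfers instead of eight. The trade-off is that remote memory stays stale until `flush()` runs, which is why such a cache needs explicit consistency operations.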
CPU frequency scaling is a common approach to achieving energy savings in parallel applications. A typical way to save power is to reduce the frequency of a processor whenever the invested CPU cycles do not contribute to the progress of an application (e.g., polling for events). Many recent research efforts have been directed towards employing this approach within HPC applications...
A subset of the Parallel Research Kernels (PRK), simplified parallel application patterns, is used to study the behavior of different runtimes implementing the PGAS programming model. The goal of this paper is to show that such an approach is practical and effective as we approach the exascale era. Our experimental results indicate that for the kernels we selected, MPI with two-sided communications outperforms...
XcalableMP (XMP) is a PGAS language for distributed memory environments. It employs Coarray Fortran (CAF) features as the local-view programming model. We implemented the main part of CAF in the form of a translator, i.e., a source-to-source compiler, as part of the Omni XMP compiler. The compiler uses GASNet and the Fujitsu RDMA interface to allocate static and allocatable coarrays and to get and put...
In this paper, we present an implementation of the OpenFabrics Interfaces (OFI) libfabric API in support of multithreaded PGAS programming models. Specifically, we describe a libfabric provider implementation for the Cray XC™ system using the Generic Network Interface (GNI) library. OFI libfabric is a new portable network API designed to address the needs of high performance networking software...
Structured grid linear solvers often require manual packing and unpacking of communication data to achieve high performance. Orchestrating this process efficiently is challenging, labor-intensive, and potentially error-prone. In this paper, we explore an alternative approach that communicates the data with naturally grained message sizes without manual packing and unpacking. This approach is the distributed...
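To make the manual step concrete: on a row-major 2D grid, the column a neighbor needs is strided in memory. It is therefore usually gathered into a contiguous buffer before sending and scattered into a ghost column after receipt. The sketch below assumes a flat row-major layout with one ghost column on each side. `pack_strided` and `unpack_strided` are hypothetical helpers that illustrate the packing this paper's approach aims to avoid; they do not implement that approach itself.

```python
def pack_strided(flat, start, count, stride):
    """Gather `count` elements spaced `stride` apart into a contiguous buffer."""
    return [flat[start + i * stride] for i in range(count)]


def unpack_strided(flat, start, stride, buf):
    """Scatter a received contiguous buffer back into strided positions."""
    for i, value in enumerate(buf):
        flat[start + i * stride] = value


ny, nx = 3, 4
width = nx + 2                     # interior columns plus two ghost columns
# Left subdomain filled with its own flat indices, so values are easy to trace.
left = [r * width + c for r in range(ny) for c in range(width)]
right = [0] * (ny * width)

# The left subdomain's last interior column (index nx) is strided in memory.
buf = pack_strided(left, start=nx, count=ny, stride=width)
# ... buf would travel as a single message ...
unpack_strided(right, start=0, stride=width, buf=buf)   # right's ghost column 0
print(buf)                         # [4, 10, 16]
```

Every exchanged face needs a matching pack and unpack call with the correct start and stride. Getting one of these wrong silently corrupts the halo, which illustrates the labor and error-proneness the abstract points to.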
Accelerators/co-processors have made their way into supercomputing systems. These modern heterogeneous systems feature multiple layers of memory hierarchies, and produce a high degree of thread-level parallelism. To ensure that current and future applications perform well on these systems, it is important that users be able to cleanly express the various types of parallelism found in their applications...
This paper describes a data-centric profiling tool that provides a way to map performance problems back to data structures in Chapel programs. We describe the tool's implementation and illustrate its use with two simple test programs.
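One common way to attribute costs to data structures is to record each allocation's address range and to look up sampled addresses (for example, from hardware cache-miss samples) against that table. This technique is assumed here for illustration and is not taken from the tool described above. `AllocationMap` and `attribute_samples` are hypothetical names, and the addresses in the usage example are made up.

```python
import bisect
from collections import Counter


class AllocationMap:
    """Maps an address to the named allocation whose range contains it."""

    def __init__(self):
        self.starts = []    # sorted allocation start addresses
        self.entries = []   # (start, end, name), kept parallel to self.starts

    def record(self, start, size, name):
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.entries.insert(i, (start, start + size, name))

    def lookup(self, addr):
        # Find the last allocation starting at or before addr, then range-check.
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0:
            start, end, name = self.entries[i]
            if start <= addr < end:
                return name
        return "<unknown>"


def attribute_samples(alloc_map, sampled_addrs):
    """Count sampled memory accesses per data structure."""
    return Counter(alloc_map.lookup(a) for a in sampled_addrs)


amap = AllocationMap()
amap.record(0x1000, 0x400, "A")     # hypothetical array A
amap.record(0x2000, 0x100, "B")     # hypothetical array B
counts = attribute_samples(amap, [0x1004, 0x13FF, 0x2010, 0x3000])
assert counts == {"A": 2, "B": 1, "<unknown>": 1}
```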
Parallel computers are becoming deeply hierarchical. Locality aware programming models allow programmers to control locality at one level through establishing affinity between data and executing activities. This, however, does not enable locality exploitation at other levels. Therefore, we must conceive an efficient abstraction of hierarchical locality and develop techniques to exploit it. Techniques...
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
Provides an abstract for each of the invited presentations and a brief professional biography of each presenter. The complete presentations were not made available for publication as part of the conference proceedings.
Provides an abstract for each of the keynote presentations and a brief professional biography of each presenter. The complete presentations were not made available for publication as part of the conference proceedings.