The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Using multiple accelerators, such as GPUs or Xeon Phis, is attractive to improve the performance of large data parallel applications and to increase the size of their workloads. However, writing an application for multiple accelerators remains today challenging because going from a single accelerator to multiple ones indeed requires to deal with potentially non-uniform domain decomposition, inter-accelerator...
The present paper introduces the XcalableACC (XACC) programming model, which is a hybrid model of the XcalableMP (XMP) Partitioned Global Address Space (PGAS) language and OpenACC. XACC defines directives that enable programmers to mix XMP and OpenACC directives in order to develop applications that can use accelerator clusters with ease. Moreover, in order to improve the performance of stencil applications,...
The first exascale supercomputers are expected by the end of this decade and will presumably feature an increase in core count, but a decrease in the amount of memory available per core. As of now, it is still unclear if the current programming models will provide high performance on exascale systems. One programming model considered to be an alternative to MPI is the so-called partitioned global...
Many of today's embedded devices are based on MultiProcessor System-on-Chips(MPSoCs) Such devices are usually heterogeneous, containing DSPs and specialized accelerators as well as one or more CPUs. This heterogeneity allows efficient implementations in specialized domains but is a barrier to their wider use. They are difficult to program as only the CPU is directly exposed to the programmer with...
I/O performance is vital for most HPC applications especially those that generate a vast amount of data with the growth of scale. Many studies have shown that scientific applications tend to issue small and noncontiguous accesses in an interleaving fashion, causing different processes to access overlapping regions. In such scenario, collective I/O is a widely used optimization technique. However,...
Deterministic parallelism promises many benifits for parallel programming. Exist deterministic runtimes, however, either require modifying programs to use a restricted set of synchronization primitives, or limit calability by relying on a centralized, deterministic thread scheduler. We addressed these challenges with single producer multi-consumer (SPMC) virtual memory as a possible foundation for...
Xcalable MP (XMP) is a partitioned global address space language, which is directive based. In XMP, programmers can include explicit synchronizations by adding directives to their source code. In this sense, XMP provides programmers with performance awareness. As such, part of the performance of programs can be attributed to the programmers, i.e., XMP requires interactive programming by the programmers...
We are facing a hardware revolution given by the increasing availability of multicore computers, clusters, Grids, and combinations of these. Consequently, there is plenty of computational power, but today's programmers are not fully prepared to exploit parallelism and distribution. Particularly, Java has helped in handling the heterogeneity of such environments, but there is a lack of facilities to...
In this paper we evaluate the performance of the Chapel programming language from the perspective of its language primitives and features, where the micro benchmarks are synthesized from our lessons learned in developing molecular dynamics simulation programs in Chapel. Experimental results show that most language building blocks have comparable performance to corresponding hand-written C code, while...
Due to their outstanding computational performance, many acceleration devices, such as GPUs, the Cell Broadband Engine (Cell/B.E.), and multi-core computing are attracting a lot of attention in the field of high-performance computing. Although there are many programming models and languages de-signed for programming accelerators, such as CUDA, AMD Accelerated Parallel Processing (AMD APP), and OpenCL,...
Several recent many-core accelerators have been architected as fabrics of tightly-coupled shared memory clusters. A hierarchical interconnection system is used - with a crossbar-like medium inside each cluster and a network-on-chip (NoC) at the global level - which make memory operations non-uniform (NUMA). Nested parallelism represents a powerful programming abstraction for these architectures, where...
We have developed a high-level language, called Kanor, for declaratively specifying communication in parallel programs. Designed as an extension of C++, it serves to coordinate partitioned address space programs written in the bulk synchronous parallel (BSP) style. Kanor's declarative semantics enable the programmers to write correct and maintainable parallel applications. The communication abstraction...
The paper presents a method that allows formal refinement of BSP programs. We may consider a BSP program as a set of parameterized processes that communicate via message-passing. A parameterized process is refined into a sequence of BSP super steps each containing an ordinary sequential process and a communication process. The method uses parameterized pre- and post-conditions, and takes into account...
The key to performance improvements in the multi-core era is for software to utilize the available concurrency. This paper presents a lightweight programming framework called Gossamer that is easy to use, enables the solution of a broad range of parallel programming problems, and produces efficient code. Gossamer contains (1) a set of high-level annotations that one adds to a sequential program to...
Software reuse technology can improve the efficiency of program development greatly. A reusable Apla-Java component has been developed in the research of PAR (Partition and Recur) method and their tools. We have made the most of reuse-driven software theory and the partial implementation theory for reference which ensure the accuracy of the components effectively. Apla-Java component is an important...
Open MPD is a language extension for programming on distributed memory systems that helps users by having minimal and simple notations. Although MPI is the de facto standard for parallel programming on distributed memory systems, writing MPI programs is often a time-consuming and complicated process. Open MPD supports typical parallelization-based on the data parallel paradigm and work sharing, and...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.