In this paper, we provide a comparison of language features and runtime systems of commonly used threading parallel programming models for high performance computing, including OpenMP, Intel Cilk Plus, Intel TBB, OpenACC, Nvidia CUDA, OpenCL, C++11 and PThreads. We then report our performance comparison of OpenMP, Cilk Plus and C++11 for data and task parallelism on CPU using benchmarks. The results show...
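As a point of reference for the kind of data-parallel kernel such comparisons typically benchmark, here is a minimal reduction written with C++11 threads; the array size and chunking strategy are illustrative assumptions, not the paper's benchmark, and the same loop maps to "#pragma omp parallel for reduction(+:sum)" in OpenMP or a cilk_for in Cilk Plus.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1 << 20;                 // illustrative problem size
    std::vector<double> data(n, 1.0);
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> workers;

    // Each thread reduces one contiguous chunk into its own slot (no sharing, no locks).
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = t * n / nthreads;
            const std::size_t end   = (t + 1) * n / nthreads;
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
        });
    }
    for (auto& w : workers) w.join();

    // Final serial combine of the per-thread partial sums.
    const double sum = std::accumulate(partial.begin(), partial.end(), 0.0);
    std::cout << sum << "\n";
}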
MPI (Message Passing Interface) and OpenMP are two tools broadly used to develop parallel programs. On the one hand, MPI has the advantage of high performance while being difficult to use. On the other hand, OpenMP is very easy to use but is restricted to shared-memory architectures. CAPE is an approach based on checkpoints to allow the execution of OpenMP programs on distributed-memory architectures...
We present a programming methodology and runtime performance case study comparing the declarative data flow coordination language S-Net with Intel's Concurrent Collections (CnC). As a coordination language S-Net achieves a near-complete separation of concerns between sequential software components implemented in a separate algorithmic language and their parallel orchestration in an asynchronous data...
Parallel programming and data-parallel algorithms have been the main techniques supporting high-performance computing for many decades. A major conceptual step was taken by L. Valiant who introduced the Bulk-Synchronous Parallel (BSP) model. Parallel algorithms on BSP can be designed and measured by taking into account not only the classical balance between time and parallel space but also communication...
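For orientation, the BSP cost of one superstep with local work w, at most h words sent or received per processor, per-word communication gap g and barrier latency \ell is (standard textbook notation, not necessarily that of the cited work):

\[ T_{\text{superstep}} = w + h\,g + \ell, \qquad T_{\text{total}} = \sum_{s=1}^{S} \bigl( w_s + h_s\,g + \ell \bigr) \]

so an algorithm is balanced by trading local computation w_s against the communication volume h_s and the number of supersteps S.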
Application portability between different multicore architecture-parallel programming paradigm/tool pairs is a big problem nowadays, often leading to a complete rewrite of an application when switching from one architecture-paradigm pair to another. This is caused by a wide variety of architectural properties requiring different optimization techniques for different architectures, typically hiding the...
Predicting how well applications may run on modern systems is becoming increasingly challenging. It is no longer sufficient to look at the number of floating point operations and communication costs; one also needs to model the underlying systems and how their topology, heterogeneity, system loads, etc., may impact performance. This work focuses on developing a practical model for heterogeneous computing...
The Coarse-Grained Monte Carlo (CGMC) method is a multi-scale stochastic mathematical and simulation framework for spatially distributed systems. CGMC simulations are important tools for studying phenomena such as catalysis, crystal growth, surface diffusion, phase transitions on single crystals, and cell membrane receptor dynamics. In parallel CGMC, the tau-leap method is used for parallel simulations...
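For orientation, the textbook tau-leap update (not necessarily the exact variant used in parallel CGMC) advances the state over a fixed leap \tau by sampling how often each event channel fires during the leap:

\[ k_j \sim \mathrm{Poisson}\bigl(a_j(\mathbf{x})\,\tau\bigr), \qquad \mathbf{x}(t+\tau) = \mathbf{x}(t) + \sum_j k_j\,\boldsymbol{\nu}_j \]

where a_j is the propensity of channel j and \boldsymbol{\nu}_j its state-change vector; leaping over many events per step is what makes the method amenable to parallel execution.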
Sparse and unstructured computations are widely involved in scientific and engineering applications. In such computations, data arrays may be indexed indirectly through the values of other arrays or through non-affine subscripts, so the data access pattern is not known until runtime. So far, all parallel computing strategies for this kind of irregular problem are oriented toward a single network topology, which cannot...
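A minimal illustration of the indirect, non-affine access pattern meant here (the array names are hypothetical): which elements of y are written depends on the runtime contents of the index arrays, so the pattern cannot be determined statically by a compiler.

#include <cstddef>
#include <vector>

// y is updated through index arrays: the targets y[idx[i]] and sources x[col[i]]
// are only known once idx and col have been filled at runtime.
void irregular_update(std::vector<double>& y,
                      const std::vector<double>& a,
                      const std::vector<double>& x,
                      const std::vector<int>& idx,
                      const std::vector<int>& col) {
    for (std::size_t i = 0; i < a.size(); ++i)
        y[idx[i]] += a[i] * x[col[i]];
}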
Owing to the fast development of network technologies, executing parallel programs on distributed systems that connect heterogeneous machines has become feasible, but we still face some challenges: workload imbalance in such an environment may arise not only from uneven load distribution among machines, as in parallel systems, but also from a distribution that does not match the characteristics of each...
Correctness and performance are the principal requirements of a parallel system. Due to its complexity and uncertainty, it is necessary to model it. The hierarchical TCPN model proposed in this paper supports investigation at various levels of abstraction and analysis of performance, functional validity and correctness. It describes the parallel program and the resources respectively to bring less effect...
We present a scalable spatial decomposition coloring approach to implement molecular dynamics simulations with the embedded atom method (EAM) on multi-core architectures. It effectively parallelizes reduction operations on irregular arrays in molecular dynamics simulations. In the OpenMP programming model, our methodology prevents the same memory location from being simultaneously modified by more than...
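A minimal sketch, not the paper's EAM kernel, of how coloring removes the write conflict in an irregular OpenMP reduction: interactions are pre-grouped so that within one color no two pairs share an atom, letting each color's loop run in parallel without atomics or locks.

#include <vector>

struct Pair { int i, j; double w; };   // an interacting atom pair and its contribution

// colors[c] holds pairs such that no atom index appears twice within one color,
// so the updates to 'force' inside a single color cannot collide.
void colored_reduction(std::vector<double>& force,
                       const std::vector<std::vector<Pair>>& colors) {
    for (const auto& group : colors) {              // colors are processed one after another
        #pragma omp parallel for
        for (long k = 0; k < (long)group.size(); ++k) {
            const Pair& p = group[k];
            force[p.i] += p.w;                      // safe: i and j are unique within the color
            force[p.j] -= p.w;
        }
    }
}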
This paper introduces a new parallel programming model motivated by: 1) the concept that computation should move to, and execute near, the global data which it accesses, 2) a set of extended memory semantics to provide fine-grained global synchronization, 3) architectural support for fast lightweight thread creation/destruction/migration, and 4) the need for a high performance language to provide...
As high-end computing systems continue to grow in scale, the performance that applications can achieve on such large scale systems depends heavily on their ability to avoid explicitly synchronized communication with other processes in the system. Accordingly, several modern and legacy parallel programming models (such as MPI, UPC, and Global Arrays) have provided many programming constructs that enable...
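A minimal sketch of the style of construct alluded to here, using MPI one-sided communication (the window layout and target rank are illustrative): rank 0 deposits data directly into rank 1's memory, and the only synchronization is the collective fence closing the access epoch, with no matching receive posted by the target.

#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each process exposes one integer through an RMA window.
    int local = rank;
    MPI_Win win;
    MPI_Win_create(&local, sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0 && size > 1) {
        int value = 42;
        // Write into rank 1's window; rank 1 does not post a matching receive.
        MPI_Put(&value, 1, MPI_INT, /*target rank*/ 1, /*displacement*/ 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);          // fence completes the epoch; no point-to-point handshake

    if (rank == 1) std::printf("rank 1 now holds %d\n", local);
    MPI_Win_free(&win);
    MPI_Finalize();
}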
The behavioral correctness of parallel programs has a pivotal role in computational sciences and engineering applications as researchers draw scientific conclusions from the results generated by parallel applications. Moreover, with the advent of multicore processors, the development of parallel programs should be facilitated for the mainstream developers. While numerous programming models and APIs...
In this work we present optimizations of a grid-based projector-augmented wave method software, GPAW, for the Blue Gene/P architecture. The improvements are achieved by exploiting the advantages of combined shared- and distributed-memory programming, also known as hybrid programming. The work focuses on optimizing a very time-consuming operation in GPAW, the finite-difference stencil operation, and different hybrid...
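A minimal sketch of the shared-memory half of such a hybrid scheme, an OpenMP-threaded 7-point finite-difference stencil; grid size, indexing and coefficients are illustrative, and in the actual hybrid setup an MPI domain decomposition would wrap around this loop.

#include <vector>

// 7-point Laplacian stencil on an n*n*n grid, interior points only.
// collapse(2) spreads the two outer grid loops over the OpenMP threads.
void stencil(const std::vector<double>& in, std::vector<double>& out, int n, double h2inv) {
    auto at = [n](int i, int j, int k) { return (i * n + j) * n + k; };
    #pragma omp parallel for collapse(2)
    for (int i = 1; i < n - 1; ++i)
        for (int j = 1; j < n - 1; ++j)
            for (int k = 1; k < n - 1; ++k)
                out[at(i, j, k)] = h2inv *
                    (in[at(i - 1, j, k)] + in[at(i + 1, j, k)] +
                     in[at(i, j - 1, k)] + in[at(i, j + 1, k)] +
                     in[at(i, j, k - 1)] + in[at(i, j, k + 1)] -
                     6.0 * in[at(i, j, k)]);
}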
Computer processors have stepped into the multi-core era, providing a good opportunity to spread parallel discrete event simulation. The parallel programming model and the synchronization problems arising during the parallelization of discrete event simulation on multi-core platforms are discussed. A parallel discrete event simulator based on a multi-core platform was designed and implemented...
Location consistency (LC) is a weak memory consistency model which is defined entirely on partial order execution semantics of parallel programs. Compared with sequential consistency (SC), LC is scalable and provides ample theoretical parallelism. This makes LC an interesting memory model in the upcoming many-core parallel processing era. Previous work has pointed out that LC does not guarantee SC...
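Not LC itself, but a standard store-buffering litmus test in C++11 relaxed atomics gives a concrete feel for what "weaker than SC" means: the outcome r1 == 0 and r2 == 0 is forbidden under sequential consistency yet permitted by the relaxed model (and observable on some hardware over repeated runs).

#include <atomic>
#include <iostream>
#include <thread>

int main() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;

    // Store-buffering litmus test: each thread writes one flag, then reads the other.
    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();

    // Under SC at least one of r1, r2 must be 1; the relaxed model also allows 0/0.
    std::cout << "r1=" << r1 << " r2=" << r2 << "\n";
}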
The emergence of multi-core and many-core processors has introduced new opportunities and challenges to EDA research and development. While the availability of increasing parallel computing power holds new promise to address many computing challenges in CAD, the leverage of hardware parallelism can only be possible with a new generation of parallel CAD applications. In this paper, we propose a novel...
Computer clusters are a very cost-effective approach for high performance computing, but simulating a complete cluster is still an open research problem. The obvious approach - to parallelize individual node simulators - is complex and slow. Combining individual parallel simulators implies synchronizing their progress of time. This can be accomplished with a variety of parallel discrete event simulation...
Traditional implementations of conditional critical regions and monitors can lead to unproductive "busy waiting" if processes are allowed to wait on arbitrary boolean expressions. Techniques from global flow analysis may be employed at compile time to obtain information about which critical regions (monitor calls) are enabled by the execution of a given critical region (monitor call). We...
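A minimal C++ sketch of the situation described (not the paper's flow-analysis technique): a monitor-style consumer waits on a boolean condition, and the condition-variable wait re-evaluates that condition only when another thread signals it, rather than busy-waiting on the expression.

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

std::mutex m;
std::condition_variable cv;
std::queue<int> buffer;

void producer() {
    for (int i = 0; i < 5; ++i) {
        { std::lock_guard<std::mutex> lk(m); buffer.push(i); }
        cv.notify_one();                                // signal only when the condition may now hold
    }
}

void consumer() {
    for (int taken = 0; taken < 5; ++taken) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !buffer.empty(); });    // blocks instead of spinning on the predicate
        std::cout << buffer.front() << "\n";
        buffer.pop();
    }
}

int main() {
    std::thread c(consumer), p(producer);
    p.join();
    c.join();
}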