Rapidly growing design complexity has become a major obstacle, dramatically increasing the time required for SystemC simulation. In this case study, we exploit different levels of parallelism, including thread- and data-level parallelism, to accelerate the simulation of a Bitcoin miner model in SystemC. Our experiments are performed on two multi-core processors and one many-core Intel® Xeon...
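Data-level parallelism in a miner typically means splitting the nonce search space across workers. The sketch below illustrates that partitioning in Python; it is not the authors' SystemC model. The header layout is simplified (not the real 80-byte Bitcoin header), and the names `block_hash`, `search_range`, and `parallel_search` are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def block_hash(header: bytes, nonce: int) -> int:
    """Double SHA-256 of header || nonce as a big integer.
    Simplified: not the real Bitcoin header format or byte order."""
    payload = header + nonce.to_bytes(4, "little")
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return int.from_bytes(digest, "big")

def search_range(header, target, start, stop):
    """Return the smallest nonce in [start, stop) whose hash is below target."""
    for nonce in range(start, stop):
        if block_hash(header, nonce) < target:
            return nonce
    return None

def parallel_search(header, target, max_nonce, workers=4):
    """Data-level parallelism: each worker scans a disjoint nonce range."""
    if max_nonce <= 0:
        return None
    step = -(-max_nonce // workers)  # ceiling division
    bounds = [(s, min(s + step, max_nonce)) for s in range(0, max_nonce, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(lambda b: search_range(header, target, *b), bounds)
        found = [n for n in hits if n is not None]
    # Each range reports its own smallest hit; the global answer is their minimum.
    return min(found) if found else None
```

Because each range returns its smallest hit, the parallel result matches a sequential scan of `[0, max_nonce)`. Python threads mostly show the decomposition here. Real speedups would need processes or native threads.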
In this paper, we provide a comparison of the language features and runtime systems of commonly used threading parallel programming models for high performance computing, including OpenMP, Intel Cilk Plus, Intel TBB, OpenACC, Nvidia CUDA, OpenCL, C++11 and PThreads. We then report our performance comparison of OpenMP, Cilk Plus and C++11 for data and task parallelism on CPU using benchmarks. The results show...
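The distinction between data and task parallelism that the benchmarks compare can be shown in a few lines. This Python sketch is an illustration only, not one of the benchmarks, and it uses `concurrent.futures` rather than OpenMP, Cilk Plus, or C++11. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(data, n_chunks):
    """Split data into n_chunks contiguous slices of near-equal size."""
    k, r = divmod(len(data), n_chunks)
    out, start = [], 0
    for i in range(n_chunks):
        end = start + k + (1 if i < r else 0)
        out.append(data[start:end])
        start = end
    return out

def data_parallel_sum(data, workers=4):
    """Data parallelism: the same operation (sum) runs on different chunks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunked(data, workers)))
    return sum(partials)  # reduction of the partial results

def task_parallel_stats(data, workers=2):
    """Task parallelism: different, independent operations run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        f_min = pool.submit(min, data)
        f_max = pool.submit(max, data)
        return f_min.result(), f_max.result()
```

In OpenMP the first pattern roughly corresponds to a `parallel for` with a reduction, and the second to independent `task` constructs.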
Many applications have irregular behavior — e.g., input-dependent solvers, irregular memory accesses, or unbiased branches — that cannot be captured using today's automated performance modeling techniques. We describe new hierarchical critical path analyses for the Palm model generation tool. To obtain a good tradeoff between model accuracy, generality, and generation cost, we combine static and dynamic...
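Critical path analysis, in its generic form, finds the longest cost-weighted path through a dependency DAG, which bounds parallel execution time. The Python sketch below shows only that textbook computation. It is not Palm's hierarchical analysis, and the function name and input format are assumptions.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def critical_path(costs, deps):
    """Longest (critical) path through a task DAG.
    costs: task -> execution cost; deps: task -> set of predecessor tasks.
    Returns (length, path)."""
    finish, best_pred = {}, {}
    for task in TopologicalSorter(deps).static_order():
        start = 0
        best_pred[task] = None
        for p in deps.get(task, ()):
            # A task can start only after its latest-finishing predecessor.
            if finish[p] > start:
                start, best_pred[task] = finish[p], p
        finish[task] = start + costs[task]
    end = max(finish, key=finish.get)
    path = []
    while end is not None:  # walk back along the critical predecessors
        path.append(end)
        end = best_pred[end]
    return finish[path[0]], path[::-1]
```

For example, with `a` (cost 2) feeding `b` (3) and `c` (1), both of which feed `d` (4), the critical path is `a → b → d` with length 9.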
In machine learning (ML), the model is increasingly important, and its parameters, the key element of ML, are adjusted by iteratively processing a training dataset until convergence. Although data-parallel ML systems often rely on the error tolerance of ML when synchronizing model parameters to maximize parallelism, the synchronization of model parameters may still be delayed in completion,...
Parallel computing has been gaining interest because physical constraints prevent further frequency scaling. Therefore, to achieve high performance on multicore systems, programmers need to focus on parallelizing their programs. Although many parallel APIs written by experts are available and should make coding easier, they do not automatically guarantee good performance. This paper...
The recently invented thick control flow (TCF) model packs together an unbounded number of fibers, thread-like computational entities, flowing through the same control path. This promises to simplify parallel programming by partially eliminating looping and artificial thread arithmetic. In this paper we outline an architecture for efficiently executing programs written for the TCF model. It features...
OpenMP is increasingly being adopted by current many-core embedded processors to exploit their parallel computation capabilities. Unfortunately, current run-time implementations of the latest specification (v4.0) are not suitable for processors relying on small and fast on-chip memories, because of their memory consumption. This paper proposes an OpenMP4 run-time that reduces the memory consumption while...
Parallel machine learning workloads have become prevalent in numerous application domains. Many of these workloads are iterative convergent, allowing different threads to compute in an asynchronous manner, relaxing certain read-after-write data dependencies to use stale values. While considerable effort has been devoted to reducing the communication latency between nodes by utilizing asynchronous...
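Iterative-convergent algorithms can often tolerate stale reads. The single-process Python simulation below illustrates this: each gradient step reads a parameter value up to `staleness` iterations old, as an asynchronous worker reading a stale shared copy might. It is an assumed toy model (plain gradient descent on a quadratic), not the system described above, and the function name is hypothetical.

```python
from collections import deque

def stale_gradient_descent(grad, x0, lr=0.1, staleness=2, steps=200):
    """Gradient descent where each update uses a parameter value
    `staleness` iterations old (once enough history exists), mimicking
    asynchronous workers that relax read-after-write dependencies."""
    history = deque([x0], maxlen=staleness + 1)  # recent parameter versions
    x = x0
    for _ in range(steps):
        stale_x = history[0]          # oldest retained version: the stale read
        x = x - lr * grad(stale_x)    # update is applied to the current value
        history.append(x)
    return x
```

With `grad = lambda x: 2 * (x - 3)`, both `staleness=0` (synchronous) and `staleness=2` still converge to 3. Larger staleness or learning rates can break convergence, which is why bounded-staleness schemes cap the delay.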
Distributed vertex-centric graph processing systems have been recently proposed to perform different types of analytics on large graphs. These systems utilize the parallelism of shared nothing clusters. In this work we propose a novel model for the performance cost of such clusters. We also define novel metrics related to the workload balance and network communication cost of clusters processing massive...
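As a point of reference for such metrics, two common proxies for a vertex partition are load imbalance (maximum worker load divided by mean load) and edge cut (edges whose endpoints sit on different workers). The Python sketch below computes these two standard quantities. They are not the novel metrics proposed in the paper, and the function name is hypothetical.

```python
from collections import Counter

def partition_metrics(edges, assignment, num_workers):
    """Workload-balance and communication proxies for a vertex partition.
    edges: iterable of (u, v); assignment: vertex -> worker id in [0, num_workers)."""
    load = Counter(assignment.values())          # vertices per worker
    mean_load = len(assignment) / num_workers
    imbalance = max(load[w] for w in range(num_workers)) / mean_load  # 1.0 = perfect
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])  # cross-worker edges
    return imbalance, cut
```

A 4-cycle split into two halves of two adjacent vertices gives `(1.0, 2)`. Moving one vertex across gives an imbalance of 1.5 with the same cut.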
In the ongoing quest for greater computational power, single-core processors have exposed many limitations. Multi-core processors have become the inevitable product of technological development and application requirements. However, they also bring many fundamental technical problems to be resolved, and inter-core communication is one of them. The major manufacturers have proposed different inter-core communication...
Synchronization and time management are important mechanisms in parallel discrete event simulation. Time management ensures that events are executed in the correct order without any repeated execution. Synchronization management is important for speeding up the synchronization procedure while reducing the wait time for synchronization. In this paper, we study predicting...
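The ordering guarantee that time management must preserve can be seen in a sequential reference event loop: events are popped from a priority queue in timestamp order, each exactly once. The Python sketch below assumes that setup. It is not a parallel synchronization protocol, and the function name and handler format are hypothetical.

```python
import heapq
import itertools

def run_events(initial_events, handlers, end_time):
    """Execute events strictly in timestamp order, each exactly once.
    initial_events: list of (timestamp, name); handlers: name -> function
    returning a list of (delay, name) follow-up events (delay >= 0)."""
    counter = itertools.count()  # tie-breaker: equal timestamps keep FIFO order
    queue = [(t, next(counter), name) for t, name in initial_events]
    heapq.heapify(queue)
    now, log = 0.0, []
    while queue:
        t, _, name = heapq.heappop(queue)
        if t > end_time:
            break
        assert t >= now, "causality violation: event in the simulated past"
        now = t
        log.append((t, name))
        for delay, nxt in handlers.get(name, lambda: [])():
            heapq.heappush(queue, (now + delay, next(counter), nxt))
    return log
```

A parallel simulator splits this queue across logical processes. Conservative or optimistic synchronization then has to give the same order that this single queue enforces trivially.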
Exascale applications for civil engineering, simulations and other fields related to current research make intensive use of large sparse matrices. A characteristic of these matrices is the difficulty of balancing communication and computation, so that even when these two phases are overlapped the application does not achieve a good overall scalability, but instead suffers from a loss of performance...
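Overlapping the two phases usually means computing the rows that need only local data while remote vector entries are still in transit. The Python sketch below shows this for a CSR sparse matrix-vector product. It assumes a simulated halo exchange (the hypothetical `fetch_remote` callable) and a caller-supplied split into interior and boundary rows, and it does not reproduce the method studied above.

```python
from concurrent.futures import ThreadPoolExecutor

def spmv_rows(indptr, indices, values, x, rows):
    """y[i] = sum of A[i, j] * x[j] over stored entries, for CSR rows in `rows`."""
    return {
        i: sum(values[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in rows
    }

def overlapped_spmv(indptr, indices, values, x_local, fetch_remote, interior, boundary):
    """Overlap a simulated halo exchange with computation on interior rows.
    x_local and fetch_remote() map column index -> value; interior rows touch
    only local columns, while boundary rows also need remote ones."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_remote)                        # start "communication"
        y = spmv_rows(indptr, indices, values, x_local, interior)  # overlapped computation
        x_full = {**x_local, **pending.result()}                   # wait for remote entries
    y.update(spmv_rows(indptr, indices, values, x_full, boundary))
    return [y[i] for i in sorted(y)]
```

The overlap pays off only if the interior work is large enough to hide the exchange. When boundary rows dominate, as with poorly partitioned sparse matrices, most of the communication remains exposed.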
The prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders the traditional automatic parallelization techniques unsuitable. OpenMP Decoupled Software Pipelining (DSWP) is proposed to exploit pipeline parallelism lurking in ordinary programs, which cannot be dealt with by traditional techniques. While a cost model is important in helping evaluate...
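The classic DSWP case is a linked-list loop. The pointer-chasing traversal forms one pipeline stage, the per-node work forms another, and the stages communicate through a queue. The Python sketch below illustrates that decoupling with two threads. It is not the OpenMP-based implementation or its cost model, and `Node` and `dswp_sum_of_squares` are hypothetical names.

```python
import queue
import threading

class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def dswp_sum_of_squares(head):
    """Two-stage pipeline in the style of DSWP: stage 1 walks the linked list
    (the loop-carried pointer chase), stage 2 does the per-node work."""
    q = queue.Queue(maxsize=64)  # bounded queue decouples the two stages
    done = object()              # sentinel marking the end of the stream
    result = []

    def traverse():              # stage 1: pointer chasing only
        node = head
        while node is not None:
            q.put(node.value)
            node = node.next
        q.put(done)

    def compute():               # stage 2: independent per-element work
        total = 0
        while (item := q.get()) is not done:
            total += item * item
        result.append(total)

    threads = [threading.Thread(target=traverse), threading.Thread(target=compute)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return result[0]
```

Only the traversal carries a dependence from one iteration to the next. That is why it can run ahead of the computation instead of serializing with it.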
Effective combination of inter-node and intra-node parallelism is recognized to be a major challenge for future extreme-scale systems. Many researchers have demonstrated the potential benefits of combining both levels of parallelism, including increased communication-computation overlap, improved memory utilization, and effective use of accelerators. However, current "hybrid programming" approaches...
Parallel computing models play a fundamental role in advanced computing. Based on a study of existing parallel computing models, this paper proposes a parallel computing model, Layer Forward Net, for the Cubic-R architecture, and abstractly describes the model's structure, parameters, and logic. Finally, for the typical N-Body problem, this paper designs a parallel algorithm and analyses its complexity...
A new optimistic synchronization algorithm for Parallel Discrete Event Simulation (PDES) called Safe BTW is proposed in this paper. This new algorithm eliminates risky event processing in the Time Warp processing stage of the original BTW algorithm and is founded on a concept called "safe causal relation". In our new algorithm, the length of any chained rollback operations is limited to...
The parallel nature of the multi-core architectural design can only be fully exploited by concurrent applications. This status quo has pushed productivity to the forefront of language design concerns. The community is demanding new solutions in the design, compilation, and implementation of concurrent languages, making this research area one of great importance and impact. To that extent this...
Parallelization of big-data analytics services over a federation of heterogeneous clouds has been considered to improve performance. However, contrary to common intuition, there is an inherent tradeoff between the level of parallelism and the performance for big-data analytics, principally because of the significant delay in transferring big data over the network. The data transfer delay can be...
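A toy model shows why more parallelism is not always faster. Assume computation time falls as `work / p` across `p` sites while transfer overhead grows linearly as `delay * p`. The total is then minimized near `p = sqrt(work / delay)` rather than at the largest `p`. This cost model and the function names below are illustrative assumptions, not the paper's model.

```python
def completion_time(p, work, per_site_delay):
    """Toy model: compute time shrinks as work / p, while data-transfer
    overhead grows linearly with the number p of participating sites."""
    return work / p + per_site_delay * p

def best_parallelism(work, per_site_delay, max_sites):
    """Brute-force the integer degree of parallelism minimizing completion time."""
    return min(range(1, max_sites + 1),
               key=lambda p: completion_time(p, work, per_site_delay))
```

With `work = 100` and `per_site_delay = 1`, the optimum is 10 sites even if 50 are available. When only 5 sites exist, the best choice is to use all of them.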
In this paper we evaluate the performance of the Chapel programming language from the perspective of its language primitives and features, where the micro benchmarks are synthesized from our lessons learned in developing molecular dynamics simulation programs in Chapel. Experimental results show that most language building blocks have comparable performance to corresponding hand-written C code, while...
How to use multicore computing resources to accelerate block cryptography applications has become a common concern. Moreover, procedure-level speculation has not yet been thoroughly explored for block cryptography applications. This paper proposes a procedure-level speculation mechanism for accelerating block cryptography applications, including an execution model, a synchronization strategy...