Search results

chapter

Thread- and data-level parallel simulation in SystemC, a Bitcoin miner case study

Zhongqi Cheng, Tim Schmidt, Guantao Liu, Rainer Dömer

2017 IEEE International High Level Design Validation and Test Workshop (HLDVT) > 74 - 81

2017 IEEE International High Level Design Validation and Test Workshop (HLDVT)

The rapidly growing design complexity has become a big obstacle and dramatically increased the time required for SystemC simulation. In this case study, we exploit different levels of parallelism, including thread- and data-level parallelism, to accelerate the simulation of a Bitcoin miner model in SystemC. Our experiments are performed on two multi-core processors and one many-core Intel® Xeon...

chapter

Comparison of Threading Programming Models

Solmaz Salehian, Jiawen Liu, Yonghong Yan

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 766 - 774

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In this paper, we provide comparison of language features and runtime systems of commonly used threading parallel programming models for high performance computing, including OpenMP, Intel Cilk Plus, Intel TBB, OpenACC, Nvidia CUDA, OpenCL, C++11 and PThreads. We then report our performance comparison of OpenMP, Cilk Plus and C++11 for data and task parallelism on CPU using benchmarks. The results show...

chapter

Generating Performance Models for Irregular Applications

Ryan D. Friese, Nathan R. Tallent, Abhinav Vishnu, Darren J. Kerbyson, more

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 317 - 326

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Many applications have irregular behavior — e.g., input-dependent solvers, irregular memory accesses, or unbiased branches — that cannot be captured using today's automated performance modeling techniques. We describe new hierarchical critical path analyses for the Palm model generation tool. To obtain a good tradeoff between model accuracy, generality, and generation cost, we combine static and dynamic...

chapter

More Effective Synchronization Scheme in ML Using Stale Parameters

Yabin Li, Han Wan, Bo Jiang, Xiang Long

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 757 - 764

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS)

In Machine learning (ML) the model we use is increasingly important, and the model's parameters, the key point of the ML, are adjusted through iteratively processing a training dataset until convergence. Although data-parallel ML systems often engage a perfect error tolerance when synchronizing the model parameters for maximizing parallelism, the synchronization of model parameters may delay in completion,...

chapter

Image Feature Matching and Its Parallelization Using OpenMP

Nupur Kohli, Teng-Sheng Moh

2016 International Conference on Collaboration Technologies and Systems (CTS) > 249 - 256

2016 International Conference on Collaboration Technologies and Systems (CTS)

Parallel Computing has been gaining interest nowadays due to physical constraints preventing frequency scaling. Therefore, in order to achieve high performance on multicore systems, programmers need to focus on parallelizing their programs. Although there are many available parallelized APIs written by experts that should improve coding, they do not automatically guarantee good performance. This paper...

chapter

Outline of a Thick Control Flow Architecture

Martti Forsell, Jussi Roivainen, Ville Leppanen

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 1 - 6

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)

The recently invented thick control flow (TCF) model packs together an unbounded number of fibers, thread-like computational entities, flowing through the same control path. This promises to simplify parallel programming by partially eliminating looping and artificial thread arithmetics. In this paper we outline an architecture for efficiently executing programs written for the TCF model. It features...

chapter

A lightweight OpenMP4 run-time for embedded systems

Roberto E. Vargas, Sara Royuela, Maria A. Serrano, Xavi Martorell, more

2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC) > 43 - 49

2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC)

OpenMP is increasingly being adopted by current many-core embedded processors to exploit their parallel computation capabilities. Unfortunately, current run-time implementations of the latest specification (v4.0) are not suitable for processors relying on small and fast on-chip memories, due to its memory consumption. This paper proposes an OpenMP4 run-time that reduces the memory consumption while...

chapter

BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models

Joo Hwan Lee, Jaewoong Sim, Hyesoon Kim

2015 International Conference on Parallel Architecture and Compilation (PACT) > 241 - 252

2015 International Conference on Parallel Architecture and Compilation (PACT)

Parallel machine learning workloads have become prevalent in numerous application domains. Many of these workloads are iterative convergent, allowing different threads to compute in an asynchronous manner, relaxing certain read-after-write data dependencies to use stale values. While considerable effort has been devoted to reducing the communication latency between nodes by utilizing asynchronous...

chapter

Performance modeling of computation and communication tradeoffs in vertex-centric graph processing clusters

Amirreza Abdolrashidi, Lakshmish Ramaswamy, David Seamus Narron

10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing > 55 - 63

2014 International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom)

Distributed vertex-centric graph processing systems have been recently proposed to perform different types of analytics on large graphs. These systems utilize the parallelism of shared nothing clusters. In this work we propose a novel model for the performance cost of such clusters. We also define novel metrics related to the workload balance and network communication cost of clusters processing massive...

chapter

Task assignments based on shared memory multi-core communication

Xiaojie Xu, Network Center, Lisheng Wang

The 2014 2nd International Conference on Systems and Informatics (ICSAI 2014) > 324 - 328

2014 2nd International Conference on Systems and Informatics (ICSAI)

In the ongoing quest for greater computational power, single-core processors exposed many limitations. Multi-core processors become the inevitable product of technological development and application requirements. Then, it also brings many problems of fundamental technologies to be resolved. Inter-core communication is a part of them. The major manufacturers have proposed different inter-core communication...

chapter

Improving performance of optimistic simulation for distributed simulation system using speculative computation

Murugadoss Venu, Inwhee Joe

2014 International Conference on Information and Communication Technology Convergence (ICTC) > 428 - 432

2014 International Conference on Information and Communication Technology Convergence (ICTC)

Synchronization and Time management are the important mechanisms for the parallel discrete event simulation. Time management ensures events are executed in the correct order without any repeated execution. Synchronization management is important in ensuring faster execution of synchronization procedure and while reducing the wait time for synchronization. In this paper, we studied about predicting...

chapter

Analyzing the impact of programming models for efficient communication overlap in high-speed networks

Gladys Utrera, Marisa Gil, Xavier Martorell

2014 International Conference on High Performance Computing & Simulation (HPCS) > 218 - 225

2014 International Conference on High Performance Computing & Simulation (HPCS)

Exascale applications for civil engineering, simulations and other fields related with current research make intensive use of large sparse matrices. A characteristic of these matrices is the difficulty of balancing communication and computation, so that even when these two phases are overlapped the application does not achieve a good overall scalability, but instead suffers from a loss of performance...

chapter

A Compile-Time Cost Model for Automatic OpenMP Decoupled Software Pipelining Parallelization

Xiaoxian Liu, Rongcai Zhao, Lin Han

2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing > 253 - 260

2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)

The prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders the traditional automatic parallelization techniques unsuitable. OpenMP Decoupled Software Pipelining (DSWP) is proposed to exploit pipeline parallelism lurking in ordinary programs, which cannot be dealt with by traditional techniques. While cost model is important in helping evaluate...

chapter

Integrating Asynchronous Task Parallelism with MPI

Sanjay Chatterjee, Sagnak Tasirlar, Zoran Budimlic, Vincent Cave, more

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 712 - 725

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Effective combination of inter-node and intra-node parallelism is recognized to be a major challenge for future extreme-scale systems. Many researchers have demonstrated the potential benefits of combining both levels of parallelism, including increased communication-computation overlap, improved memory utilization, and effective use of accelerators. However, current "hybrid programming" approaches...

chapter

Research on Parallel Computing Model for Cubic-R Architecture

Nan Yu, Shan Zheng

2012 Fourth International Conference on Multimedia Information Networking and Security > 870 - 873

2012 4th International Conference on Multimedia Information Networking and Security (MINES)

Parallel computing model plays a great basic role in advanced computing; Based on researching existing parallel computing models, this paper brings forward a parallel computing model-Layer Forward Net toward Cubic-R architecture, and describes the model's structure, parameter, logic abstractly. Lastly towards typical N-Body problem, this paper designs a parallel algorithm, and analyses its complexity...

chapter

SafeBTW: A Scalable Optimistic Yet Non-risky Synchronization Algorithm

Yaocheng Zhang, Ge Li

2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation > 75 - 77

2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation (PADS)

A new optimistic synchronization algorithm for Parallel Discrete Event Simulation (PDES) called Safe BTW is proposed in this paper. This new algorithm eliminates risky event processing in the Time Warp processing stage of the original BTW algorithm and is founded on a concept called "safe causal relation". In our new algorithm, the length of any chained rollback operations is limited to...

chapter

Single Operation Multiple Data - Data Parallelism at Subroutine Level

Eduardo Marques, Herve Paulino

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 254 - 261

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

The parallel nature of the multi-core architectural design can only be fully exploited by concurrent applications. This status quo pushed productivity to the forefront of the language design concerns. The community is demanding for new solutions in the design, compilation, and implementation of concurrent languages, making this research area one of great importance and impact. To that extent this...

chapter

Synchronous Parallel Processing of Big-Data Analytics Services to Optimize Performance in Federated Clouds

Gueyoung Jung, Nathan Gnanasambandam, Tridib Mukherjee

2012 IEEE Fifth International Conference on Cloud Computing > 811 - 818

2012 IEEE 5th International Conference on Cloud Computing (CLOUD)

Parallelization of big-data analytics services over a federation of heterogeneous clouds has been considered to improve performance. However, contrary to common intuition, there is an inherent tradeoff between the level of parallelism and the performance for big-data analytics principally because of a significant delay for big-data to get transferred over the network. The data transfer delay can be...

chapter

An Empirical Performance Study of Chapel Programming Language

Nan Dun, Kenjiro Taura

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 497 - 506

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In this paper we evaluate the performance of the Chapel programming language from the perspective of its language primitives and features, where the micro benchmarks are synthesized from our lessons learned in developing molecular dynamics simulation programs in Chapel. Experimental results show that most language building blocks have comparable performance to corresponding hand-written C code, while...

chapter

Accelerating Block Cryptography Algorithms in Procedure Level Speculation

Yaobin Wang, Hong An, Zhiqin Liu, Kang Xu, more

2011 Seventh International Conference on Computational Intelligence and Security > 874 - 877

2011 Seventh International Conference on Computational Intelligence and Security (CIS)

How to make use of multicore computing resources to accelerate the block cryptography applications has become a common concern problem. And the block cryptography applications have not yet been explored in procedure level speculation thoroughly. This paper proposes a procedure level speculation mechanism for accelerating block cryptography applications, including execution model, synchronization strategy...

INFONA - science communication portal
