Search results

chapter

DeTrans: Deterministic and Parallel execution of Transactions

Vesna Smiljkovic, Srdan Stipic, Christof Fetzer, Osman Unsal, et al.

2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing > 152 - 159

2014 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

Deterministic execution of a multithreaded application guarantees the same output as long as the application runs with the same input parameters. Determinism helps a programmer to test and debug an application and to provide fault-tolerance in the systems based on replicas. Additionally, Transactional Memory (TM) greatly simplifies development of multithreaded applications where applications use transactions...

chapter

Hybrid dynamic data race detection in SystemC

Alper Sen, Onder Kalaci

Proceedings of the 2014 Forum on Specification and Design Languages (FDL) > 978-2-9530504-9-3 > 1 - 6

2014 Forum on Specification and Design Languages (FDL)

Data races are one of the most common problems in concurrent programs. Because the SystemC standard allows nondeterministic scheduling of processes, data races can occur. Hence, different executions of the same concurrent program may lead to unexpected results due to race conditions. We develop a hybrid dynamic data race detection algorithm for SystemC/TLM designs that adopts the well-studied dynamic race...

chapter

A comparison of parallel SystemC simulation approaches at RTL

Bastian Haetzer, Martin Radetzki

Proceedings of the 2014 Forum on Specification and Design Languages (FDL) > 978-2-9530504-9-3 > 1 - 8

2014 Forum on Specification and Design Languages (FDL)

This paper presents a holistic comparison of different parallel SystemC simulation approaches at the register transfer level (RTL). The effect of RTL modeling styles and simulation strategies on performance will be evaluated to show potentials and limitations of state of the art parallel simulation techniques on shared memory machines. Experiments show that the simulation performance strongly depends...

chapter

Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations

Akihiko Kasagi, Koji Nakano, Yasuaki Ito

2014 43rd International Conference on Parallel Processing > 251 - 260

2014 43rd International Conference on Parallel Processing (ICPP)

The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computing on CUDA-enabled GPUs. The summed area table (SAT) of a matrix is a data structure frequently used in the area of computer vision which can be obtained by computing the column-wise prefix-sums and then the row-wise prefix-sums. The main contribution of this paper is to introduce the...
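The two-pass construction mentioned in this abstract (column-wise prefix sums followed by row-wise prefix sums) can be illustrated with a minimal sequential sketch. This is not the paper's HMM or GPU algorithm; the function names `summed_area_table` and `region_sum` are hypothetical and chosen only for illustration.

```python
def summed_area_table(matrix):
    """Build a summed area table with two prefix-sum passes.

    Sequential illustration only; not the parallel algorithm from the paper.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    sat = [list(row) for row in matrix]  # copy so the input is not modified

    # Pass 1: column-wise prefix sums (accumulate down each column).
    for i in range(1, rows):
        for j in range(cols):
            sat[i][j] += sat[i - 1][j]

    # Pass 2: row-wise prefix sums (accumulate along each row).
    for i in range(rows):
        for j in range(1, cols):
            sat[i][j] += sat[i][j - 1]

    return sat


def region_sum(sat, top, left, bottom, right):
    """Sum of the inclusive rectangle [top..bottom] x [left..right]."""
    total = sat[bottom][right]
    if top > 0:
        total -= sat[top - 1][right]
    if left > 0:
        total -= sat[bottom][left - 1]
    if top > 0 and left > 0:
        total += sat[top - 1][left - 1]
    return total


if __name__ == "__main__":
    table = summed_area_table([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(table)
    print(region_sum(table, 1, 1, 2, 2))
```

Once the table is built, the sum over any axis-aligned rectangle needs at most four lookups, which is why the structure is common in computer vision.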

chapter

Hydra: Efficient Detection of Multiple Concurrency Bugs on Fused CPU-GPU Architecture

Zhuofang Dai, Haojun Wang, Weihua Zhang, Haibo Chen, et al.

2014 43rd International Conference on Parallel Processing > 331 - 340

2014 43rd International Conference on Parallel Processing (ICPP)

Detecting concurrency bugs, such as data races, atomicity violations and order violations, is a cumbersome task for programmers. This situation is further exacerbated by the increasing number of cores in a single machine and the prevalence of threaded programming models. Unfortunately, many existing software-based approaches usually incur high runtime overhead or accuracy loss, while most hardware-based...

chapter

Pinso: Precise Isolation of Concurrency Bugs via Delta Triaging

Bo Liu, Zhengwei Qi, Bin Wang, Ruhui Ma

2014 IEEE International Conference on Software Maintenance and Evolution > 201 - 210

2014 IEEE International Conference on Software Maintenance and Evolution (ICSME)

Concurrent programs are known to be difficult to test and maintain. These programs often fail because of concurrency bugs caused by non-deterministic interleavings among shared memory accesses. Even though a concurrency bug can be detected, it is still hard to isolate the root cause of the bug, due to the challenge in understanding the complex thread interleavings or schedules. In this paper, we propose...

chapter

Disruption-free software updates in automation systems

Michael Wahler, Manuel Oriol

Proceedings of the 2014 IEEE Emerging Technology and Factory Automation (ETFA) > 1 - 8

2014 IEEE Emerging Technology and Factory Automation (ETFA)

Automation systems must primarily be deterministic and reliable, especially in safety-critical environments. With recent trends such as mass customization or Industry 4.0, there is an increasing need for automation systems to be dynamic. Changing parts of the software of today's automation systems, however, typically requires rebooting the controller, which makes software updates a complex and costly...

chapter

How Processor Speedups Can Slow Down I/O Performance

Hung-Ching Chang, Bo Li, Matthew Grove, Kirk W. Cameron

2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems > 395 - 404

2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS)

Power states in power-scalable systems are managed to maximize performance and reduce energy waste. Power-scalable processor capabilities (e.g., Intel Turbo Boost) embrace a "faster is better" approach to power management. While these technologies can vastly improve performance and energy efficiency, there is a growing body of evidence that "faster is not always better". For example,...

chapter

A Comparison of Graphics Processor Architectures for RFID Simulation

Renato Ferrero, Bartolomeo Montrucchio, Lorenzo David, Kargar Ebrahim, et al.

2014 17th International Conference on Network-Based Information Systems > 8 - 14

2014 17th International Conference on Network-Based Information Systems (NBiS)

Graphics Processing Units (GPUs) have a huge number of cores to speed up graphical computations, and they are being used in a wide range of general-purpose applications that require high performance. In this paper, GPU computing is exploited to model the signal propagation and the interference in large RFID systems, which are a promising solution for achieving pervasive computing since they offer the...

chapter

Generating On-Chip Heterogeneous Systems from High-Level Parallel Code

Alessandro Cilardo, Luca Gallo

2014 17th Euromicro Conference on Digital System Design > 161 - 168

2014 17th Euromicro Conference on Digital System Design (DSD)

This work addresses the generation of parallel on-chip heterogeneous systems starting from high-level code with explicit parallelism, based on a custom compiler and a high-level synthesis flow. Blending parallel software programming paradigms with high-level synthesis introduces a range of challenges at both the architectural level and the programming paradigm level, particularly involving the mismatches...

chapter

A Speculative Mechanism for Barrier Synchronization

Meng Jinglei, Chen Tianzhou, Pan Ping, Yao Jun, et al.

2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS) > 858 - 865

2014 IEEE International Conference on High Performance Computing and Communications (HPCC), 2014 IEEE 6th International Symposium on Cyberspace Safety and Security (CSS) and 2014 IEEE 11th International Conference on Embedded Software and Systems (ICESS)

Barriers are synchronization operations widely used by compilers and programmers; they are flexible and convenient, but they have some drawbacks. Threads that arrive at a barrier ahead of other threads have to wait for the later threads, which wastes time. Our experiments show that up to 35% of the total execution time is wasted on synchronization. Inspired by this, we propose barrier speculation which...

chapter

Flexible Parallelized Empirical Mode Decomposition in CUDA for Hilbert Huang Transform

Kevin P.Y. Huang, Charles H.P. Wen, Herming Chiueh

2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS) > 1125 - 1133

2014 IEEE International Conference on High Performance Computing and Communications (HPCC), 2014 IEEE 6th International Symposium on Cyberspace Safety and Security (CSS) and 2014 IEEE 11th International Conference on Embedded Software and Systems (ICESS)

The Hilbert-Huang Transform (HHT) is a process of adaptive analysis applicable to non-linear and non-stationary data such as voice and biomedical signals. Empirical Mode Decomposition (EMD) is a key step in HHT and decomposes data into multiple Intrinsic Mode Functions (IMFs). Traditionally, EMD is computed on all data points in a serial manner, so its execution time grows at least linearly with the...

chapter

Efficient Work-Stealing with Blocking Deques

Liu Chi, Song Ping, Liu Yi, Hao Qinfen

2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS) > 149 - 152

2014 IEEE International Conference on High Performance Computing and Communications (HPCC), 2014 IEEE 6th International Symposium on Cyberspace Safety and Security (CSS) and 2014 IEEE 11th International Conference on Embedded Software and Systems (ICESS)

Work stealing is a popular and effective approach to implement load balancing in modern multi-/many-core systems, where each parallel thread has its local deque to maintain its own work-set of tasks and performs load balancing by stealing tasks from other deques. Unfortunately, the existing concurrent deques have two limitations. Firstly, these algorithms require memory fences in the owner's critical...

chapter

Core Affinity Code Block Schedule to Reduce Inter-core Data Synchronization of SpMT

John Ye, Songyuan Li, Tianzhou Chen, Minghui Wu, et al.

2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS) > 1002 - 1007

2014 IEEE International Conference on High Performance Computing and Communications (HPCC), 2014 IEEE 6th International Symposium on Cyberspace Safety and Security (CSS) and 2014 IEEE 11th International Conference on Embedded Software and Systems (ICESS)

Extracting parallelism from programs is growing in importance as the number of processor cores increases. Parallelization usually involves splitting a sequential thread and scheduling the split code to run on multiple cores. For example, some previous Speculative Multi-Threading research used code block reordering to automatically parallelize a sequential thread on multi-core processors. Although the parallelized...

chapter

Research on Mahalanobis Distance Algorithm Optimization Based on OpenCL

Qingchun Xie, Yunquan Zhang, Haipeng Jia, Yongquan Lu

2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS) > 84 - 91

2014 IEEE International Conference on High Performance Computing and Communications (HPCC), 2014 IEEE 6th International Symposium on Cyberspace Safety and Security (CSS) and 2014 IEEE 11th International Conference on Embedded Software and Systems (ICESS)

Mahalanobis distance algorithms have been widely used in machine learning and classification algorithms, and accelerating them on GPUs has important practical significance for improving the performance of some applications, especially those with high real-time demand. However, due to the complexity of the GPU hardware architectures, how to complete the algorithm optimization and achieve high...

chapter

An efficient dynamic scheduling scheme for H.264/AVC encoding on multi-core architecture

Dung Vu, Jeremy Castillo, Laxmi Bhuyan

2014 IEEE International Conference on Multimedia and Expo (ICME) > 1 - 6

2014 IEEE International Conference on Multimedia and Expo (ICME)

The popular wavefront parallelization has been proposed to encode H.264/AVC video employing macro-block level parallelism. This approach, however, fails to achieve optimum performance due to the significant overhead of barrier-based synchronization. All threads must wait for the slowest ones to complete encoding before starting the next processing wave. In this paper, we propose a dynamic scheduling...

chapter

Predicting performance in the presence of software and hardware resource bottlenecks

Subhasri Duttagupta, Rupinder Virk, Manoj Nambiar

International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS 2014) > 542 - 549

2014 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS)

Scalability of a multi-tier enterprise system is limited by the presence of software and hardware resource bottlenecks. These bottlenecks typically occur at larger numbers of users. It would help enterprise applications significantly if these bottlenecks were known a priori during performance testing itself. This paper deals with predicting the performance of such systems and models an application...

chapter

PCJ - Java library for high performance computing in PGAS model

Marek Nowicki, Lukasz Gorski, Patryk Grabarczyk, Piotr Bala

2014 International Conference on High Performance Computing & Simulation (HPCS) > 202 - 209

2014 International Conference on High Performance Computing & Simulation (HPCS)

This paper presents the application of the PCJ library to the parallelization of selected HPC applications implemented in the Java language. The library is motivated by the partitioned global address space (PGAS) model represented by Co-Array Fortran, Unified Parallel C, X10 or Titanium.

chapter

Instruction-based high-efficient synchronization in a many-core Network-on-Chip processor

Zhenqi Wei, Peilin Liu, Zhencheng Zeng, Jiangwei Xu, et al.

2014 IEEE International Symposium on Circuits and Systems (ISCAS) > 2193 - 2196

2014 IEEE International Symposium on Circuits and Systems (ISCAS)

Parallelized applications running on many-core Network-on-Chip (NoC) processors may consume a great part of execution time to synchronize threads mapped on multiple NoC nodes, if synchronization for NoC processors is not carefully designed. In this paper, we propose an instruction-based synchronization solution applied in a packet-switched many-core NoC processor with 2D mesh grid topology. Return...

chapter

Temporal multithreading architecture design for a Java processor

Hung-Cheng Su, Tsung-Han Wu, Chun-Jen Tsai

2014 IEEE International Symposium on Circuits and Systems (ISCAS) > 2201 - 2204

2014 IEEE International Symposium on Circuits and Systems (ISCAS)

In this paper, we present the design of a hardware temporal multi-threading architecture for a Java processor. The Java virtual machine (JVM) model is a stack machine where the process state is the snapshot of the Java stack. If the runtime stack is stored (or cached) in on-chip memory for performance reasons, the backup and restoration of the Java runtime stacks for context switching would be expensive...

INFONA - science communication portal
