Search results

chapter

SPeCK: a kernel for scalable predictability

Qi Wang, Yuxin Ren, Matt Scaperoth, Gabriel Parmer

21st IEEE Real-Time and Embedded Technology and Applications Symposium > 121 - 132

2015 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)

Multi- and many-core systems are increasingly prevalent in embedded systems. Additionally, isolation requirements between different partitions and criticalities are gaining in importance. This difficult combination is not well addressed by current software systems. Parallel systems require consistency guarantees on shared data-structures often provided by locks that use predictable resource sharing...

chapter

Yes! You Can Use Your Model Checker to Verify OSEK/VDX Applications

Haitao Zhang, Toshiaki Aoki, Yuki Chiba

2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST) > 1 - 10

2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST)

OSEK/VDX, a standard of automobile OS, has been widely adopted by many manufacturers to design and develop a vehicle-mounted OS. With the increasing functionalities in vehicles, more and more complex applications are developed based on the OSEK/VDX OS. However, how to ensure the reliability of developed applications is becoming a challenge for developers. As to ensure the reliability of developed...

chapter

Towards Providing Low-Overhead Data Race Detection for Large OpenMP Applications

Joachim Protze, Simone Atzeni, Dong H. Ahn, Martin Schulz, et al.

2014 LLVM Compiler Infrastructure in HPC > 40 - 47

2014 LLVM Compiler Infrastructure in HPC (LLVM-HPC)

Neither static nor dynamic data race detection methods, by themselves, have proven to be sufficient for large HPC applications, as they often result in high runtime overheads and/or low race-checking accuracy. While combined static and dynamic approaches can fare better, creating such combinations, in practice, requires attention to many details. Specifically, existing state-of-the-art dynamic race...

chapter

Synchrotrace: synchronization-aware architecture-agnostic traces for light-weight multicore simulation

Siddharth Nilakantan, Karthik Sangaiah, Ankit More, Giordano Salvadory, et al.

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 278 - 287

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Trace-driven simulation of chip multiprocessor (CMP) systems offers many advantages over execution-driven simulation, such as reduced simulation time and complexity, portability, and scalability. However, trace-based simulation approaches have encountered difficulty capturing and accurately replaying multi-threaded traces due to the inherent non-determinism in the execution of multi-threaded...

chapter

A parallel abstract interpreter for JavaScript

Kyle Dewey, Vineeth Kashyap, Ben Hardekopf

2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 34 - 45

2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

We investigate parallelizing flow- and context-sensitive static analysis for JavaScript. Previous attempts to parallelize such analyses for other languages typically start with the traditional framework of sequential dataflow analysis, and then propose methods to parallelize the existing sequential algorithms within this framework. However, we show that this approach is non-optimal and propose a new...

chapter

On performance debugging of unnecessary lock contentions on multicore processors: A replay-based approach

Long Zheng, Xiaofei Liao, Bingsheng He, Song Wu, et al.

2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 56 - 67

2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Locks have been widely used as an effective synchronization mechanism among processes and threads. However, we observe that a large number of false inter-thread dependencies (i.e., unnecessary lock contentions) exist during the program execution on multicore processors, thereby incurring significant performance overhead. This paper presents a performance debugging framework, PerfPlay, to facilitate...

chapter

Clean: A race detector with cleaner semantics

Cedomir Segulja, Tarek S. Abdelrahman

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA) > 401 - 413

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)

Data races make parallel programs hard to understand. Precise race detection that stops an execution on first occurrence of a race addresses this problem, but it comes with significant overhead. In this work, we exploit the insight that precisely detecting only write-after-write (WAW) and read-after-write (RAW) races suffices to provide cleaner semantics for racy programs. We demonstrate that stopping...

chapter

SAWS: Synchronization aware GPGPU warp scheduling for multiple independent warp schedulers

Jiwei Liu, Jun Yang, Rami Melhem

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 383 - 394

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

General-purpose computing on Graphics Processing Units (GPGPUs) has become increasingly popular for a wide range of applications beyond traditional graphics rendering workloads. GPGPU exploits parallelism in applications via multithreading to hide memory latencies, and handles control complexity by barrier synchronizations. Warp scheduling algorithms have been optimized to increase memory latency hiding...

chapter

Efficiently enforcing strong memory ordering in GPUs

Abhayendra Singh, Shaizeen Aga, Satish Narayanasamy

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 699 - 712

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

GPU programming models such as CUDA and OpenCL are starting to adopt a weaker data-race-free (DRF-0) memory model, which does not guarantee any semantics for programs with data-races. Before standardizing the memory model interface for GPUs, it is imperative that we understand the tradeoffs of different memory models for these devices. While there is a rich memory model literature for CPUs, studies...

chapter

Execution Drafting: Energy Efficiency through Computation Deduplication

Michael Mckeown, Jonathan Balkind, David Wentzlaff

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 432 - 444

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Computation is increasingly moving to the data center. Thus, the energy used by CPUs in the data center is gaining importance. The centralization of computation in the data center has also led to much commonality between the applications running there. For example, there are many instances of similar or identical versions of the Apache web server running in a large data center. Many of these applications,...

chapter

Real-time embedded software design for onboard computer of Anti-Ballistic missile system

Hemant Kumar Rathore, Mahender Katukuri, DL Seshagiri Rao, BHVS Narayana Murthy

International Conference on Computing and Communication Technologies > 1 - 5

2014 International Conference on Computer and Communications Technologies (ICCCT)

The Missile Onboard Computer (OBC) is an embedded computer that performs control, guidance, target data estimation, mission sequencing and various other critical operations during flight. The real-time embedded software for the OBC must be designed to achieve deterministic response times and minimal jitter (on the order of 50–100 µs) for accurate control and guidance operations and precise...

chapter

Migration of CUDA Program Based on a Divide-and-Conquer Method

Nan Li, Jianmin Pang, Zheng Shan

2014 IEEE 17th International Conference on Computational Science and Engineering > 1685 - 1691

2014 IEEE 17th International Conference on Computational Science and Engineering (CSE)

Porting CUDA programs to other heterogeneous and many-core platforms, especially native processors, is valuable for extending the range of CUDA applications, taking advantage of the many cores on the target platform, and supporting national industries. Traditional binary translation techniques are not suited to this task. From the standpoint of software reverse engineering, it is feasible to design a new migration...

chapter

A novel method to estimate performance for a high performance computation workload

Joseph Issa

2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC) > 1 - 2

2014 IEEE International Performance Computing and Communications Conference (IPCCC)

Given the rapid change in processor architecture in the past years, there is a driving necessity to assess processor performance for a high performance computation workload. Assessing performance for a given workload is important to understand which architecture parameters the workload performance is sensitive to. A given workload can be categorized as memory bounded, compute bounded, or in between...

chapter

GPU accelerated NEH algorithm

Magdalena Metlicka, Donald Davendra, Frank Hermann, Markus Meier, et al.

2014 IEEE Symposium on Computational Intelligence in Production and Logistics Systems (CIPLS) > 114 - 119

2014 IEEE Symposium on Computational Intelligence in Production and Logistics Systems (CIPLS)

This research aims to develop a CUDA accelerated NEH algorithm for the permutative flowshop scheduling problem with makespan criterion. NEH has been shown in the literature as the best constructive heuristic for this particular problem. The CUDA based NEH aims to speed up the processing time by utilising the GPU cores for parallel evaluation. In order to show the versatility and scalability of the...

chapter

Fine-grained GPU parallelization of pairwise local sequence alignment

Chirag Jain, Subodh Kumar

2014 21st International Conference on High Performance Computing (HiPC) > 1 - 10

2014 21st International Conference on High Performance Computing (HiPC)

The Smith-Waterman algorithm is used in bioinformatics to perform pairwise local alignment between a query sequence and a subject sequence. We present a GPU-based parallel version of this algorithm that is able to perform pairwise alignment faster than previous algorithms. In particular, it parallelizes each alignment, rather than relying on parallelism across multiple pair alignments, which many...

chapter

MP3 audio parallel decoding based on Libmad library

Xiaxin Li, Shutao Sun

The 2014 2nd International Conference on Systems and Informatics (ICSAI 2014) > 319 - 323

2014 2nd International Conference on Systems and Informatics (ICSAI)

With the growing popularity of parallel computing, serial programs are unable to take advantage of multi-core processors. In this paper, an MP3 audio parallel decoding algorithm based on the libmad library for multicore platforms is proposed to improve the decoding speed. In this method, more than one decoder is provided to achieve parallelism. Experimental results indicate that the algorithm improves the efficiency...

chapter

Energy-Efficient Stencil Computations on Distributed GPUs Using Dynamic Parallelism and GPU-Controlled Communication

Lena Oden, Benjamin Klenk, Holger Froning

2014 Energy Efficient Supercomputing Workshop > 31 - 40

2014 Energy Efficient Supercomputing Workshop (E2SC)

GPUs are widely used in high performance computing, due to their high computational power and high performance per Watt. Still, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU is used to accelerate...

chapter

A GPU task generator for rendering

Alexandru-Lucian Petrescu, Florica Moldoveanu, Victor Asavei, Alin Moldoveanu, et al.

2014 18th International Conference on System Theory, Control and Computing (ICSTCC) > 556 - 561

2014 18th International Conference on System Theory, Control and Computing (ICSTCC)

We present an original GPU task generator that can be attached to the rasterization-based rendering process in order to provide dynamic parallelism. Compared to other existing GPU task generators or schedulers, our method creates new tasks using the GPU graphics pipeline task scheduler. These are generated in the geometry shader, by means of rasterizing new geometry that produces additional fragments...

chapter

Graph processing on GPUs: Where are the bottlenecks?

Qiumin Xu, Hyeran Jeon, Murali Annavaram

2014 IEEE International Symposium on Workload Characterization (IISWC) > 140 - 149

2014 IEEE International Symposium on Workload Characterization (IISWC)

Large graph processing is now a critical component of many data analytics workloads. Graph processing is used in applications ranging from social networking web sites that provide context-aware services from user connectivity data to medical informatics that diagnose a disease from a given set of symptoms. Graph processing has several inherently parallel computation steps interspersed with synchronization needs. Graphics processing...

chapter

Thread-level speculation on off-the-shelf hardware transactional memory

Rei Odaira, Takuya Nakaike

2014 IEEE International Symposium on Workload Characterization (IISWC) > 212 - 221

2014 IEEE International Symposium on Workload Characterization (IISWC)

Thread-level speculation can speed up a single-thread application by splitting its execution into multiple tasks and speculatively executing those tasks in multiple threads. Efficient thread-level speculation requires hardware support for memory conflict detection, store buffering, and execution rollback, and in addition, previous research has also proposed advanced optimization facilities, such as...

INFONA - science communication portal
