Search results

chapter

cudaCR: An In-Kernel Application-Level Checkpoint/Restart Scheme for CUDA-Enabled GPUs

Behnam Pourghassemi, Aparna Chandramowlishwaran

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 725 - 732

2017 IEEE International Conference on Cluster Computing (CLUSTER)

Fault-tolerance is becoming increasingly important as we enter the era of exascale computing. Increasing the number of cores results in a smaller mean time between failures, and consequently, higher probability of errors. Among the different software fault tolerance techniques, checkpoint/restart is the most commonly used method in supercomputers, the de-facto standard for large-scale systems. Although...

chapter

Redesigning Go’s Built-In Map to Support Concurrent Operations

Louis Jenkins, Tingzhe Zhou, Michael Spear

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 14 - 26

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)

The Go language lacks built-in data structures that allow fine-grained concurrent access. In particular, its map data type, one of only two generic collections in Go, limits concurrency to the case where all operations are read-only; any mutation (insert, update, or remove) requires exclusive access to the entire map. The tight integration of this map into the Go language and runtime precludes its...

chapter

The Performance and Scalability of the SHMEM and Corresponding MPI-3 Routines on a Cray XC30

Gianina Alina Negoita, Glenn R. Luecke, Marina Kraeva, Gurpur Prabhu, et al.

2017 16th International Symposium on Parallel and Distributed Computing (ISPDC) > 62 - 69

2017 16th International Symposium on Parallel and Distributed Computing (ISPDC)

In this paper the authors compare the performance and scalability of the SHMEM and corresponding MPI-3 routines for five different benchmark tests using a Cray XC30. The performance of the MPI-3 get and put operations was evaluated using fence synchronization and also using lock-unlock synchronization. The five tests used communication patterns ranging from light to heavy data traffic: accessing distant...

chapter

Copyright page

2017 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SINKHROINFO) > 1

2017 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SINKHROINFO)

chapter

Retrospective Lightweight Distributed Snapshots Using Loosely Synchronized Clocks

Aleksey Charapko, Ailidani Ailijiang, Murat Demirbas, Sandeep Kulkarni

2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS) > 2061 - 2066

2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)

In order to take a consistent snapshot of a distributed system, it is necessary to collate and align local logs from each node to construct a pairwise concurrent cut. By leveraging NTP synchronized clocks, and augmenting them with logical clock causality information, Retroscope provides a lightweight solution for taking unplanned retrospective snapshots of past distributed system states. Instead of...

chapter

Analysis on interactive data race checker: IDRC

Md A. Obaida, Israt Jahan, Sayeed Z. Sajal

2017 IEEE International Conference on Electro Information Technology (EIT) > 265 - 269

2017 IEEE International Conference on Electro Information Technology (EIT)

Parallel programming is becoming more and more prevalent in this era of concurrent programming. Because of the nondeterministic nature of parallel programming, it is notoriously difficult to debug concurrency bugs; moreover, an attempt to fix one bug may result in deadlock or other concurrency bugs. Though many static and dynamic data race detection tools have been proposed in recent years, none of them is interactive...

chapter

DPA-resistant QDI dual-rail AES S-Box based on power-balanced weak-conditioned half-buffer

James Lim, Weng-Geng Ho, Kwen-Siong Chong, Bah-Hwee Gwee

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

2017 IEEE International Symposium on Circuits and Systems (ISCAS)

We propose an asynchronous-logic (async) Quasi-Delay-Insensitive (QDI) dual-rail 32-bit Advanced Encryption Standard (AES) Substitution-Box (S-Box) for Differential Power Analysis (DPA) attack countermeasure. There are three novel features in the proposed S-Box. First, the proposed S-Box operates in async QDI protocol with dual-rail data encoding, hence there is only a marginal difference in power...

chapter

Redundancy elimination might be overrated: A quantitative study on wireless traffic

Xueheng Hu, Aaron Striegel

2017 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) > 754 - 759

IEEE INFOCOM 2017 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)

With significant increases in mobile device traffic slated for the foreseeable future, numerous technologies must be embraced to satisfy such demand. Notably, one of the more intriguing approaches has been blending on-device caching and device-to-device (D2D) communications. While various past research has pointed to potentially significant gains (30%+) via redundancy elimination (RE), some skepticism...

chapter

Multi-GPU Graph Analytics

Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, et al.

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 479 - 490

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

We present a single-node, multi-GPU programmable graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graphs with billions of edges. Directly using the single-GPU implementations, our design only requires programmers to specify a few algorithm-dependent concerns, hiding most multi-GPU related implementation details....

chapter

Offloading Communication Control Logic in GPU Accelerated Applications

Elena Agostini, Davide Rossetti, Sreeram Potluri

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) > 248 - 257

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

NVIDIA GPUDirect is a family of technologies aimed at optimizing data movement among GPUs (P2P) or between GPUs and third-party devices (RDMA). GPUDirect Async, introduced in CUDA 8.0, is a new addition which allows direct synchronization between GPU and third-party devices. For example, Async allows an NVIDIA GPU to directly trigger and poll for completion of communication operations queued to an InfiniBand...

chapter

Transparent Caching for RMA Systems

Salvatore Di Girolamo, Flavio Vella, Torsten Hoefler

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 1018 - 1027

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

The constantly increasing gap between communication and computation performance emphasizes the importance of communication-avoidance techniques. Caching is a well-known concept used to reduce accesses to slow local memories. In this work, we extend the caching idea to MPI-3 Remote Memory Access (RMA) operations. Here, caching can avoid inter-node communications and achieve similar benefits for irregular...

chapter

A Pluggable Framework for Composable HPC Scheduling Libraries

Max Grossman, Vivek Kumar, Nick Vrvilo, Zoran Budimlic, et al.

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 723 - 732

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Driven by the increasing diversity of current and future HPC hardware and software platforms, the HPC community has seen a dramatic increase in research and development efforts into the composability of discrete software systems. While modularity is often desirable from a software engineering, quality assurance, and maintainability perspective, the barriers between software components often hide optimization...

chapter

Fault Tolerance for Cooperative Lifeline-Based Global Load Balancing in Java with APGAS and Hazelcast

Jonas Posner, Claudia Fohry

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 854 - 863

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Fault tolerance is a major issue for parallel applications. Application-level approaches are gaining increasing attention because they may be more efficient than system-level ones. In this paper, we present a generic reusable framework for fault-tolerant parallelization with the task pool pattern. Users of this framework can focus on coding sequential tasks for their problem, while respecting some...

chapter

RDIT - Race Detection from Incomplete Traces

Arun Krishnakumar Rajagopalan

2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C) > 677 - 679

2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C)

In a precise race-detector, a race is detected only if the trace exhibits a real race. In such tools, every memory access from each thread is typically checked for conflicting accesses. We show that there are many redundant memory access checks present in real world program execution traces. Removing these redundant checks during the online run-time instrumentation stage can significantly speed up...

article

An HLA-Based Distributed Cosimulation Framework in Mixed-Signal System-on-Chip Design

Moon Gi Seok, Tag Gon Kim, Chang Beom Choi, Daejin Park

IEEE Transactions on Very Large Scale Integration (VLSI) Systems > 2017 > 25 > 2 > 760 - 764

In mixed-signal system-on-chip (SoC) design, distributed cosimulation is one of the practical approaches for unifying various abstracted hardware models using different description languages. Conventional ad hoc distributed cosimulation solutions do not have formal theoretical backgrounds of simulator integration into their solutions. In this brief, we propose a general cosimulation framework based...

chapter

Combat vehicle simulator based on HLA prototype concept

Josef Brozek, Martin Jakes, Simeon Karamazov, Dan Hamernik

2016 17th International Conference on Mechatronics - Mechatronika (ME) > 1 - 6

2016 17th International Conference on Mechatronics - Mechatronika (ME)

A combat vehicle simulator is the first complex drive-simulator developed at the University of Pardubice. It consists of dozens of complementary simulators and simulation calculations. The whole simulator is created on a modified implementation of a game engine. That has resulted in a high-quality graphic processing, based on a hybrid kernel of the simulator (discrete-continuous simulation), which...

chapter

An Efficient Scenario Based Testing Methodology Using UVM

Khaled Fathy, Khaled Salah

2016 17th International Workshop on Microprocessor and SOC Test and Verification (MTV) > 57 - 60

2016 17th International Workshop on Microprocessor and SOC Test and Verification (MTV)

Most modern verification architects use the randomness supported by SystemVerilog (SV) to define a generic path for a test to follow. This generic path stresses a subset of features, and allows randomization to explore corners in depth. Setting up such a test case requires a well-defined stimulus generation methodology that consumes less time to cover all the corner cases. Moreover, off-the-shelf...

chapter

The “Behavior, interaction and priority” framework applied to SystemC-based embedded systems

Ismail Assayad, Lamia Bljadiri, Abdelouahed Zakari, Tarik Nahhal

2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA) > 1 - 6

2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA)

We present an approach for formal modelling of SystemC programs built upon the “Behavior, Interactions and Priority” framework. Interactions between the produced automata are restricted only by using priorities rather than by cutting their interactions. The produced models thus have the advantage of being composable with other future programs without any change, except for the priorities. Furthermore, automata are...

chapter

Towards Using Concurrent Java API Correctly

Shuang Liu, Guangdong Bai, Jun Sun, Jin Song Dong

2016 21st International Conference on Engineering of Complex Computer Systems (ICECCS) > 219 - 222

2016 21st International Conference on Engineering of Complex Computer Systems (ICECCS)

Concurrent programs are hard to analyze or debug due to complex program logic and unpredictable execution environments. In practice, ordinary programmers often adopt existing well-designed concurrency-related APIs (e.g., those in java.util.concurrent) so as to avoid dealing with these issues. These APIs can, however, often be used incorrectly, which results in hard-to-debug concurrency bugs. In this...

chapter

Experiences of Applying One-Sided Communication to Nearest-Neighbor Communication

Hongzhang Shan, Samuel Williams, Yili Zheng, Weiqun Zhang, et al.

2016 PGAS Applications Workshop (PAW) > 17 - 24

2016 PGAS Applications Workshop (PAW)

Nearest-neighbor communication is one of the most important communication patterns appearing in many scientific applications. In this paper, we discuss the results of applying UPC++, a library-based partitioned global address space (PGAS) programming extension to C++, to an adaptive mesh framework (BoxLib), and a full scientific application GTC-P, whose communications are dominated by the nearest-neighbor...

INFONA - science communication portal
