Fault-tolerance is becoming increasingly important as we enter the era of exascale computing. Increasing the number of cores results in a smaller mean time between failures, and consequently, higher probability of errors. Among the different software fault tolerance techniques, checkpoint/restart is the most commonly used method in supercomputers, the de-facto standard for large-scale systems. Although...
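As a rough illustration of the checkpoint/restart idea mentioned above (not the method of the paper, whose abstract is truncated here), the following minimal single-process sketch periodically saves its state to disk and resumes from the last saved state after a failure. The function names, the `fail_at` parameter, and the pickle-based file format are all assumptions made for this example.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then atomically rename it,
    so a crash during the write never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, interval, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing every `interval` steps.

    `fail_at` is a hypothetical hook that simulates a crash at a given step.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

After a failure, calling `run` again with the same path repeats only the work done since the last checkpoint. Real supercomputer checkpointing coordinates many processes and much larger state, which this sketch does not attempt.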
The Go language lacks built-in data structures that allow fine-grained concurrent access. In particular, its map data type, one of only two generic collections in Go, limits concurrency to the case where all operations are read-only; any mutation (insert, update, or remove) requires exclusive access to the entire map. The tight integration of this map into the Go language and runtime precludes its...
In this paper the authors compare the performance and scalability of the SHMEM and corresponding MPI-3 routines for five different benchmark tests using a Cray XC30. The performance of the MPI-3 get and put operations was evaluated using fence synchronization and also using lock-unlock synchronization. The five tests used communication patterns ranging from light to heavy data traffic: accessing distant...
In order to take a consistent snapshot of a distributed system, it is necessary to collate and align local logs from each node to construct a pairwise concurrent cut. By leveraging NTP synchronized clocks, and augmenting them with logical clock causality information, Retroscope provides a lightweight solution for taking unplanned retrospective snapshots of past distributed system states. Instead of...
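One known way to augment loosely synchronized physical clocks with logical causality information is a hybrid logical clock (HLC). The sketch below follows the standard HLC update rules; whether it matches Retroscope's actual implementation is an assumption, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Timestamp:
    l: int  # largest physical time observed so far
    c: int  # logical counter that breaks ties within the same l


class HybridLogicalClock:
    def __init__(self, physical_clock):
        # physical_clock: a zero-argument callable returning an integer time
        # (for example, NTP-synchronized milliseconds); assumed for this sketch.
        self._now = physical_clock
        self.l = 0
        self.c = 0

    def tick(self):
        """Timestamp a local or send event."""
        pt = self._now()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return Timestamp(self.l, self.c)

    def receive(self, remote):
        """Merge the timestamp carried by an incoming message."""
        pt = self._now()
        new_l = max(self.l, remote.l, pt)
        if new_l == self.l and new_l == remote.l:
            new_c = max(self.c, remote.c) + 1
        elif new_l == self.l:
            new_c = self.c + 1
        elif new_l == remote.l:
            new_c = remote.c + 1
        else:
            new_c = 0
        self.l, self.c = new_l, new_c
        return Timestamp(self.l, self.c)
```

Because timestamps compare as `(l, c)` pairs, a receive event is always ordered after the send that caused it, even when the receiver's physical clock lags behind the sender's. That ordering is the property needed to align per-node logs into a consistent cut.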
Parallel programming is becoming more and more prevalent in this era of concurrent programming. Because of the nondeterministic nature of parallel programs, concurrency bugs are notoriously difficult to debug; moreover, an attempt to fix one bug may result in deadlock or other concurrency bugs. Though many static and dynamic data race detection tools have been proposed in recent years, none of them is interactive...
We propose an asynchronous-logic (async) Quasi-Delay-Insensitive (QDI) dual-rail 32-bit Advanced Encryption Standard (AES) Substitution-Box (S-Box) for Differential Power Analysis (DPA) attack countermeasure. There are three novel features in the proposed S-Box. First, the proposed S-Box operates in async QDI protocol with dual-rail data encoding, hence there is only a marginal difference in power...
With significant increases in mobile device traffic slated for the foreseeable future, numerous technologies must be embraced to satisfy such demand. Notably, one of the more intriguing approaches has been blending on-device caching and device-to-device (D2D) communications. While various past research has pointed to potentially significant gains (30%+) via redundancy elimination (RE), some skepticism...
We present a single-node, multi-GPU programmable graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graphs with billions of edges. Directly using the single-GPU implementations, our design only requires programmers to specify a few algorithm-dependent concerns, hiding most multi-GPU related implementation details....
NVIDIA GPUDirect is a family of technologies aimed at optimizing data movement among GPUs (P2P) or between GPUs and third-party devices (RDMA). GPUDirect Async, introduced in CUDA 8.0, is a new addition which allows direct synchronization between GPU and third-party devices. For example, Async allows an NVIDIA GPU to directly trigger and poll for completion of communication operations queued to an InfiniBand...
The constantly increasing gap between communication and computation performance emphasizes the importance of communication-avoidance techniques. Caching is a well-known concept used to reduce accesses to slow local memories. In this work, we extend the caching idea to MPI-3 Remote Memory Access (RMA) operations. Here, caching can avoid inter-node communications and achieve similar benefits for irregular...
Driven by the increasing diversity of current and future HPC hardware and software platforms, the HPC community has seen a dramatic increase in research and development efforts into the composability of discrete software systems. While modularity is often desirable from a software engineering, quality assurance, and maintainability perspective, the barriers between software components often hide optimization...
Fault tolerance is a major issue for parallel applications. Application-level approaches are gaining increasing attention because they may be more efficient than system-level ones. In this paper, we present a generic reusable framework for fault-tolerant parallelization with the task pool pattern. Users of this framework can focus on coding sequential tasks for their problem, while respecting some...
In a precise race-detector, a race is detected only if the trace exhibits a real race. In such tools, every memory access from each thread is typically checked for conflicting accesses. We show that there are many redundant memory access checks present in real world program execution traces. Removing these redundant checks during the online run-time instrumentation stage can significantly speed up...
In mixed-signal system-on-chip (SoC) design, distributed cosimulation is one of the practical approaches for unifying various abstracted hardware models written in different description languages. Conventional ad hoc distributed cosimulation solutions lack a formal theoretical foundation for simulator integration. In this brief, we propose a general cosimulation framework based...
A combat vehicle simulator is the first complex driving simulator developed at the University of Pardubice. It consists of dozens of complementary simulators and simulation calculations. The whole simulator is built on a modified implementation of a game engine. This has resulted in high-quality graphics processing, based on a hybrid kernel of the simulator (discrete-continuous simulation), which...
Most modern verification architects use the randomness supported by SystemVerilog (SV) to define a generic path for a test to follow. This generic path stresses a subset of features and allows randomization to explore corner cases in depth. Setting up such a test case requires a well-defined stimulus generation methodology that consumes less time to cover all the corner cases. Moreover, off-the-shelf...
We present an approach for formal modelling of SystemC programs built upon the “Behavior, Interactions and Priority” framework. Interactions between the produced automata are restricted only by using priorities rather than by cutting their interactions. The produced models thus have the advantage of being composable with other future programs without any change, except for the priorities. Furthermore, automata are...
Concurrent programs are hard to analyze or debug due to complex program logic and unpredictable execution environments. In practice, ordinary programmers often adopt existing well-designed concurrency-related APIs (e.g., those in java.util.concurrent) to avoid dealing with these issues. These APIs can, however, often be used incorrectly, which results in hard-to-debug concurrency bugs. In this...
Nearest-neighbor communication is one of the most important communication patterns appearing in many scientific applications. In this paper, we discuss the results of applying UPC++, a library-based partitioned global address space (PGAS) programming extension to C++, to an adaptive mesh framework (BoxLib), and a full scientific application GTC-P, whose communications are dominated by the nearest-neighbor...