The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Fault-tolerance is becoming increasingly important as we enter the era of exascale computing. Increasing the number of cores results in a smaller mean time between failures, and consequently, higher probability of errors. Among the different software fault tolerance techniques, checkpoint/restart is the most commonly used method in supercomputers, the de-facto standard for large-scale systems. Although...
NVIDIA GPUDirect is a family of technologiesaimed at optimizing data movement among GPUs (P2P) orbetween GPUs and third-party devices (RDMA). GPUDirectAsync, introduced in CUDA 8.0, is a new addition whichallows direct synchronization between GPU and third partydevices. For example, Async allows an NVIDIA GPU to directlytrigger and poll for completion of communication operationsqueued to an InfiniBand...
Linear Algebra Kernels have an important role in many petroleum reservoir simulators, extensively used by the industry. The growth in problem size, specially in pre-salt exploration, has caused an increase in execution time of those kernels, thus requiring parallel programming to improve performance and make the simulation viable. On the other hand, exploiting parallelism in systems with an ever increasing...
The mainstream of embedded software development as of today is dominated by C programming. To aid the development, hardware abstractions, libraries, kernels and lightweight operating systems are commonplace. Such kernels and operating systems typically impose a thread based abstraction to concurrency. However, in general thread based programming is hard, plagued by race conditions and dead-locks....
Hybrid nodes containing GPUs are rapidly becoming the norm in parallel machines. We have conducted some experiments regarding how to plug GPU-enabled computational kernels into PSBLAS, a MPI-based library specifically geared towards sparse matrix computations. In this paper, we present our findings on which strategies are more promising in the quest for the optimal compromise among raw performance,...
We present an auto-parallelization technique for generating GPU implementation of data-structure operations from a sequential spec-ification. The technique partitions the data-structure operations into barrier-separated phases such that each phase executes only homogeneous operations. Homogeneity is dictated by the method type, which is derived from the specification. Two key aspects of our technique...
The EC project parMERASA investigates techniques for the parallelization of industrial real-time applications from automotive, avionic, and construction machinery domains. The aim is to execute such applications on many-core processors with up to 64 cores. The system software plays a key role in the deployment of applications. However, requirements of application domains differ widely, and thus no...
In this paper, we propose built-in functions on parallel programming model in SMYLE OpenCL to extend the original OpenCL semantics giving our system's original limitation and interpretation for embedded many-core architecture. On a platform using FPGA to evaluate embedded many-core architecture SMYLEref, data parallel and task parallel programming models supported by the OpenCL execution model are...
The simplicity of concurrent programming with Transactional Memory (TM) and its recent implementation in mainstream processors greatly motivates researchers and industry to investigate this field and propose new implementations and optimizations. However, there is still no standard C system library which a wide range of TM developers can adopt. TM application developers have been forced to avoid library...
In recent years, improvements of energy efficiency and computational performance have become a major issue, because smartphones and tablets become popular. To implement high performance, multi-core accelerator consists of general purpose processors and accelerators are often used. But to use these multi-core accelerator efficiently, programmers have to consider synchronization and data transfer between...
This work presents a methodology that parallelizes the simulation of mixed-abstraction level SystemC models across multicore CPUs, and graphics processing units (GPUs) for improved simulation performance. Given a SystemC model, we partition it into processes suitable for GPU execution and CPU execution. We convert the processes identified for GPU execution into GPU kernels with additional SystemC...
Electronic System Level (ESL) design of embedded systems proposes raising the abstraction level of the design entry to cope with the increasing complexity of such systems. To exploit the benefits of ESL, design languages should allow specification of models which are a) heterogeneous, to describe different aspects of systems; b) formally defined, for application of analysis and synthesis methods;...
New high performance computing (HPC) applications recently have to face scalability over an increasing number of nodes and the programming of special accelerator hardware. Hybrid composition of large computing systems leads to a new dimension in complexity of software development. This paper presents a novel approach to gain insight into accelerator interaction and utilization without any changes...
The modern file system is still implemented in the kernel, and is statically linked with other kernel components. This architecture has brought performance and efficient integration with memory management. However kernel development is slow and modern storage systems must support an array of features, including distribution across a network, tagging, searching, deduplication, checksumming, snap-shotting,...
A threads package is a set of primitives available to the user relating to threads. In this paper we will consider some of the issues concerned with the architecture and functionality of threads packages and consider how to implement threads packages.
This paper reports on CuPP, our newly developed C++ framework designed to ease integration of NVIDIAs GPGPU system CUDA into existing C++ applications. CuPP provides interfaces to reoccurring tasks that are easier to use than the standard CUDA interfaces. In this paper we concentrate on memory management and related data structures. CuPP offers both a low level interface - mostly consisting of smartpointers...
In this paper, we present a simple but powerful solution to combine two different simulation environments - SystemC and OMNeT++ - enabling a cosimulation framework for modeling and simulating a distributed system of systems. It is therefore possible to utilize the strengths and preliminary work of OMNeT++ in the field of communication networks and SystemC in modeling hardware entities. We use a socket...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.