Given their complexity, operating systems have been a teaching challenge in terms of both course design and course delivery. Being complex software artifacts, they challenge the student by bringing together a number of concepts and algorithms from different aspects of the body of knowledge in Computer Science. This inherent "nonlinearity" (of the way in which the concepts come together) is in stark...
Current trends in computer architecture show that we are aiming toward more cores and even more heterogeneity. Because extensive knowledge of a processor's internals cannot be a prerequisite for programming it, and for the sake of portability, these systems require the compilation flow to evolve and cope with heterogeneity issues. This is even more true for embedded systems. In this paper, we...
Convolution serves as the basic computational primitive for various associative computing tasks ranging from edge detection to image matching. CMOS implementation of such computations entails significant bottlenecks in area and energy consumption due to the large number of multiplication and addition operations involved. In this paper, we propose an ultra-low power and compact hybrid spintronic-CMOS...
Embedded scalable platforms combine a flexible socketed architecture for heterogeneous system-on-chip (SoC) design and a companion system-level design methodology. The architecture supports the rapid integration of processor cores with many specialized hardware accelerators. The methodology simplifies the design, integration, and programming of the heterogeneous components in the SoC. In particular,...
Exascale performance requires a level of energy efficiency only achievable with specialized hardware. Hence, building a general-purpose HPC system with Exascale performance will require different types of processors, memory technologies, and interconnection networks. Heterogeneous hardware is already present on some top supercomputer systems that are composed of different compute nodes, which...
We present preliminary results with the TyTra design flow. Our aim is to create a parallelising compiler for high-performance scientific code on heterogeneous platforms, with a focus on Field-Programmable Gate Arrays (FPGAs). Using the functional language Idris, we show how this programming paradigm facilitates generation of different correct-by-construction program variants through type transformations...
This paper deals with backstepping design for boundary PDE control/observer as a convex optimization problem. Both Volterra and Fredholm operators are analysed for a class of parabolic and hyperbolic PDEs. The resulting Kernel-PDEs are formulated in terms of polynomial functions, the parameters of which are optimized using Sum-of-Squares (SOS) techniques and solved via semidefinite programming. Uniqueness...
Exchanging data on noncontiguous user buffers has been a dominant communication pattern in many scientific applications. The OpenSHMEM specification introduces a new set of communication routines to support strided data communication. Most high-performance implementations of the OpenSHMEM specification support strided data communication through either a packing/unpacking scheme or a scheme based on multiple reads/writes,...
There is now significant interest in OpenCL for FPGAs because it is the first time the FPGA vendors have provided a programming model and a computing platform with integrated high-level synthesis. OpenCL is intended for heterogeneous platforms, not just FPGAs, and the standard continues to evolve. Recently, OpenCL has introduced Shared Virtual Memory (SVM) with the goal of simplifying the programming...
In this paper, we present a heavily exploration-oriented implementation of genetic algorithms for graphics processing units (GPUs), optimized with our novel mechanism for scheduling GPU-side synchronized jobs, which takes inspiration from the concept of persistent threads. Persistent threads allow an efficient distribution of workloads throughout the GPU so as to fully exploit the CUDA...
Concurrency errors, such as data races, make device drivers notoriously hard to develop and debug without automated tool support. We present Whoop, a new automated approach that statically analyzes drivers for data races. Whoop is empowered by symbolic pairwise lockset analysis, a novel analysis that can soundly detect all potential races in a driver. Our analysis avoids reasoning about thread interleavings...
Nowadays, multi-core architectures have become mainstream in the microprocessor industry. However, as the number of cores integrated in a single chip grows, the need for an adequate programming model becomes more important. In recent years, the OpenCL programming model has attracted the attention of the multi-core design community. This paper presents an OpenCL-compliant architecture and demonstrates...
While high-end heterogeneous systems are increasingly supporting heterogeneous uniform memory access (hUMA) as envisioned by the Heterogeneous System Architecture (HSA) foundation, their low-power counterparts targeting the embedded domain still lack basic features like virtual memory support for accelerators. As opposed to simply passing virtual address pointers, explicit data management involving...
Recently, large-scale graph analytics has become a very popular topic owing to the emergence of gigantic graphs whose numbers of vertices and edges are in the millions, billions, or even trillions. Many graph analytics libraries and frameworks have been proposed with various computational models and programming languages to deal with such graphs. The X10 programming language is a PGAS language that aims at both...
The Single Instruction Multiple Data (SIMD) architecture of Graphic Processing Units (GPUs) makes them perfect for parallel processing of big data. In this paper, we present the design, implementation and evaluation of G-Storm, a GPU-enabled parallel system based on Storm, which harnesses the massively parallel computing power of GPUs for high-throughput online stream data processing. G-Storm has...
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU...
Kernel fusion is an optimization method, in which the code from several kernels is composed to create a new, fused kernel. It can push the performance of kernels beyond limits given for their isolated, unfused form. In this paper, we introduce a classification of different types of kernel fusion for both data dependent and data independent kernels. We study kernel fusion on three types of OpenCL devices:...
The Active Memory Cube (AMC) is a novel near-memory processor that exploits high memory bandwidth and low latency close to DRAM to execute scientific applications in an energy-efficient manner. Its energy efficiency derives from the combination of its novel scalar-vector data-flow path and its simple control-flow path, which required the development of a sophisticated compiler, co-designed...
OpenACC is an application programming interface (API) that aims to unleash the power of heterogeneous systems composed of CPUs and accelerators such as graphic processing units (GPUs) or Intel Xeon Phi coprocessors. This directive-based programming model is intended to enable developers to accelerate their application's execution with much less effort. Coprocessors offer significant computing power...