The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
CPU-FPGA heterogeneous platforms offer a promising solution for high-performance and energy-efficient computing systems by providing specialized accelerators with post-silicon reconfigurability. To unleash the power of FPGA, however, the programmability gap has to be filled so that applications specified in high-level programming languages can be efficiently mapped and scheduled on FPGA. The above...
All semiconductor market domains are converging to concurrent platforms. This trend has certainly led real challenge to develop applications software that effectively uses these concurrent processors to achieve efficiency and performance goals. This paper argues that the Computer System related courses are natural places to introduce the parallelism, and the earlier to parallel computing concepts...
Programming FPGAs has been an arduous task that requires extensive knowledge of hardware design languages (HDLs), such as Verilog or VHDL, and low-level hardware details. With OpenCL support for FPGAs, the design, prototyping and implementation of an FPGA is increasingly moving towards a much higher level of abstraction, when compared to the intrinsically low-level nature of HDLs. On the other hand,...
For GPUs to achieve their peak performance, effective and efficient usage of memory bandwidth is necessary. To this end, programmers invest extensive development effort to optimize a GPU program, specially its memory bandwidth usage. The OpenACC programming model has been introduced to tackle the accelerators programming complexity. However, this model's coarse-grained control on a program can make...
Building massively parallel numerical simulations is not easy due to lasting changes of parallel programming models and various software technologies needed. We develop a component based graphical parallel programming approach to lower the difficulties of coding applications in scientific and engineering computing and support rapid development of large scale simulations basing on a domain specific...
Current trends in computer architecture show that we are aiming toward more cores and even more heterogeneity. As an extensive knowledge of processor's internals cannot be a prerequisite to their programming and for the sake of portability, these systems necessitate the compilation flow to evolve and cope with heterogeneity issues. This is even more so true for embedded systems. In this paper, we...
Current trends in computer architecture show that we are aiming toward more cores and even more heterogeneity. As an extensive knowledge of processor's internals cannot be a prerequisite to their programming and for the sake of portability, these systems necessitate the compilation flow to evolve and cope with heterogeneity issues. This is even more so true for embedded systems. In this paper, we...
We present preliminary results with the TyTra design flow. Our aim is to create a parallelising compiler for high-performance scientific code on heterogeneous platforms, with a focus on Field-Programmable Gate Arrays (FPGAs). Using the functional language Idris, we show how this programming paradigm facilitates generation of different correctby- construction program variants through type transformations...
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU...
Recent work has explored using higher level languages to improve programmer productivity on GPUs. These languages often utilize high level computation patterns (e.g., Map and Reduce) that encode parallel semantics to enable automatic compilation to GPU kernels. However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are nested, which...
Directive-based, accelerator programming models such as OpenACC have arisen as an alternative solution to program emerging Scalable Heterogeneous Computing (SHC) platforms. However, the increased complexity in the SHC systems incurs several challenges in terms of portability and productivity. This paper presents an open-sourced OpenACC compiler, called OpenARC, which serves as an extensible research...
Computational chemistry comprises one of the driving forces of High Performance Computing. In particular, many-body methods, such as Coupled Cluster methods (CC) [1] of the quantum chemistry package NWCHEM [2], are of particular interest for the applied chemistry community.
The biomedical imagery, the numeric communications, the acoustic signal processing and many others gls[dsp] applications are present more and more in the numeric world. They process growing data volume which is represented with more and more accuracy, and use complex algorithms with time constraints to satisfying. Consequently, a high requirement of computing power characterize them. To satisfy this...
In seismic data processing, Prestack Kirchhoff time migration is a forming method, and widely used. In this paper, we introduced how to port the original CUDA program to OpenCL, and how to implement and optimize Prestack Kirchhoff Time Migration algorithm on OpenCL and General Purpose GPU, and how to optimize the OpenCL program to get the competitive performance comparing with the original CUDA version...
GPUs are becoming pervasive in scientific computing. Originally served as peripheral accelerators, now they are gradually turning into central computing nodes. However, most current directive-based approaches for parallelizing sequential legacy code such as OpenACC and HMPP simply off-load "hot" CPU code onto GPUs, entailing a lot of limitations such as unsupported external calls and coarse-grained...
In this paper, we propose an efficient and effective there level Design Space Exploration (DSE) method for mapping a system consisting of a number of DSP functions onto an RTL or lower level model using constraint programming methodology. The design space has three dimensions: a) function execution schedule (when the functions should execute), b) function implementation assignment (how the execution...
The proliferation of heterogeneous computing systems presents the parallel computing community with the challenge of porting legacy and emerging applications to multiple processors with diverse programming abstractions. OpenCL is a vendor-agnostic and industry-supported programming model that offers code portability on heterogeneous platforms, allowing applications to be developed once and deployed...
Recently, we have witnessed that cloud providers start to offer heterogeneous computing environments. There have been wide interests in both clusters and cloud of adopting graphics processors (GPUs) as accelerators for various applications. On the other hand, large-scale graph processing is important for many data-intensive applications in the cloud. In this paper, we propose to leverage GPUs to accelerate...
In this paper, we present a comparative performance study of a 2D Gaussian blur kernel mapped to a heterogeneous multi-core CPU/GPU platform. In this study, the kernel workgroup, the Gaussian kernel and the image sizes are considered variable parameters. We aim to gain insight into how well the execution and data movement times evolve across each computing device in varying the values of these parameters...
Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and accelerators, like GPUs. Programming such nodes is typically based on a combination of OpenMP and CUDA/OpenCL codes, scheduling relies on a static partitioning and cost model. We present the XKaapi runtime system for data-flow task programming on multi-CPU and multi-GPU architectures, which supports a data-flow task...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.