Search results

Items from 61 to 80 out of 372 results

chapter

Experience with an Incremental Approach to Teaching Single Processor Operating Systems

Abhijat Vichare

2016 International Conference on Learning and Teaching in Computing and Engineering (LaTICE) > 162 - 166

2016 International Conference on Learning and Teaching in Computing and Engineering (LaTICE)

Given their complexity operating systems have been a teaching challenge in terms of both course design and course delivery. Being complex software artifacts, they challenge the student by bringing together a number of concepts and algorithms from different aspects of the body of knowledge in Computer Science. This inherent "nonlinearity" (of the way in which the concepts come together) is in stark...

chapter

Using an Intermediate Representation to Map Workloads on Heterogeneous Parallel Systems

Nicolas Benoit, Stephane Louise

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 811 - 819

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)

Current trends in computer architecture show that we are aiming toward more cores and even more heterogeneity. As an extensive knowledge of processors' internals cannot be a prerequisite to their programming and for the sake of portability, these systems necessitate the compilation flow to evolve and cope with heterogeneity issues. This is even more true for embedded systems. In this paper, we...

chapter

Low-power approximate convolution computing unit with domain-wall motion based “Spin-Memristor” for image processing applications

Yong Shim, Abhronil Sengupta, Kaushik Roy

2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC)

Convolution serves as the basic computational primitive for various associative computing tasks ranging from edge detection to image matching. CMOS implementation of such computations entails significant bottlenecks in area and energy consumption due to the large number of multiplication and addition operations involved. In this paper, we propose an ultra-low power and compact hybrid spintronic-CMOS...

chapter

High-level synthesis of accelerators in embedded scalable platforms

Paolo Mantovani, Giuseppe Di Guglielmo, Luca P. Carloni

2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC) > 204 - 211

2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC)

Embedded scalable platforms combine a flexible socketed architecture for heterogeneous system-on-chip (SoC) design and a companion system-level design methodology. The architecture supports the rapid integration of processor cores with many specialized hardware accelerators. The methodology simplifies the design, integration, and programming of the heterogeneous components in the SoC. In particular,...

chapter

Collective Offload for Heterogeneous Clusters

Florentino Sainz, Jorge Bellon, Vicenc Beltran, Jesus Labarta

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) > 376 - 385

2015 IEEE 22nd International Conference on High Performance Computing (HiPC)

Exascale performance requires a level of energy efficiency only achievable with specialized hardware. Hence, for building a general purpose HPC system with Exascale performance, different types of processors, memory technologies and interconnection networks will be necessary. Heterogeneous hardware is already present on some top supercomputer systems that are composed of different compute nodes, which...

chapter

Using type transformations to generate program variants for FPGA design space exploration

Syed Waqar Nabi, Wim Vanderbauwhede

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 6

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

We present preliminary results with the TyTra design flow. Our aim is to create a parallelising compiler for high-performance scientific code on heterogeneous platforms, with a focus on Field-Programmable Gate Arrays (FPGAs). Using the functional language Idris, we show how this programming paradigm facilitates generation of different correct-by-construction program variants through type transformations...

chapter

Backstepping PDE design, Volterra and Fredholm operators: A convex optimization approach

Pedro Ascencio, Alessandro Astolfi, Thomas Parisini

2015 54th IEEE Conference on Decision and Control (CDC) > 7048 - 7053

2015 54th IEEE Conference on Decision and Control (CDC)

This paper deals with backstepping design for boundary PDE control/observer as a convex optimization problem. Both Volterra and Fredholm operators are analysed for a class of parabolic and hyperbolic PDEs. The resulting Kernel-PDEs are formulated in terms of polynomial functions, the parameters of which are optimized using Sum-of-Squares (SOS) techniques and solved via semidefinite programming. Uniqueness...

chapter

High Performance OpenSHMEM Strided Communication Support with InfiniBand UMR

Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Jie Zhang, et al.

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) > 244 - 253

2015 IEEE 22nd International Conference on High Performance Computing (HiPC)

Exchanging data on noncontiguous user buffers has been a dominant communication pattern in many scientific applications. The OpenSHMEM specification introduces a new set of communication routines to support strided data communication. Most high performance implementations of the OpenSHMEM specification support strided data communication by either packing/unpacking or multiple reads/writes based scheme,...

chapter

Evaluating shared virtual memory in an OpenCL framework for embedded systems on FPGAs

Vincent Mirian, Paul Chow

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 8

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

There is now significant interest in OpenCL for FPGAs because it is the first time the FPGA vendors have provided a programming model and a computing platform with integrated high-level synthesis. OpenCL is intended for heterogeneous platforms, not just FPGAs, and the standard continues to evolve. Recently, OpenCL has introduced Shared Virtual Memory (SVM) with the goal of simplifying the programming...

chapter

Efficient Implementation of Genetic Algorithms on GP-GPU with Scheduled Persistent CUDA Threads

Nicola Capodieci, Paolo Burgio

2015 Seventh International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) > 6 - 12

2015 Seventh International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)

In this paper we present a heavily exploration-oriented implementation of genetic algorithms to be executed on graphic processor units (GPUs) that is optimized with our novel mechanism for scheduling GPU-side synchronized jobs, which takes inspiration from the concept of persistent threads. Persistent threads allow an efficient distribution of workloads throughout the GPU so as to fully exploit the CUDA...

chapter

Fast and Precise Symbolic Analysis of Concurrency Bugs in Device Drivers (T)

Pantazis Deligiannis, Alastair F. Donaldson, Zvonimir Rakamaric

2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE) > 166 - 177

2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)

Concurrency errors, such as data races, make device drivers notoriously hard to develop and debug without automated tool support. We present Whoop, a new automated approach that statically analyzes drivers for data races. Whoop is empowered by symbolic pairwise lockset analysis, a novel analysis that can soundly detect all potential races in a driver. Our analysis avoids reasoning about thread interleavings...

chapter

An OpenCL-Compliant Multi-core Platform and Its Companion Compiler

Ramon S. Nepomuceno, Jonatas C. Santos, Laysson O. Luz, Ivan S. Silva

2015 Brazilian Symposium on Computing Systems Engineering (SBESC) > 116 - 121

2015 Brazilian Symposium on Computing Systems Engineering (SBESC)

Nowadays, multi-core architectures have become mainstream in the microprocessor industry. However, as the number of cores integrated in a single chip grows, the need for an adequate programming model becomes more important. In recent years, the OpenCL programming model has attracted the attention of the multi-core designers' community. This paper presents an OpenCL-compliant architecture and demonstrates...

chapter

Lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs

Pirmin Vogel, Andrea Marongiu, Luca Benini

2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) > 45 - 54

2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)

While high-end heterogeneous systems are increasingly supporting heterogeneous uniform memory access (hUMA) as envisioned by the Heterogeneous System Architecture (HSA) foundation, their low-power counterparts targeting the embedded domain still lack basic features like virtual memory support for accelerators. As opposed to simply passing virtual address pointers, explicit data management involving...

chapter

ScaleGraph: A high-performance library for billion-scale graph analytics

Toyotaro Suzumura, Koji Ueno

2015 IEEE International Conference on Big Data (Big Data) > 76 - 84

2015 IEEE International Conference on Big Data (Big Data)

Recently, large-scale graph analytics has become a very popular topic owing to the emergence of gigantic graphs whose numbers of vertices and edges are in the millions, billions or even trillions. Many graph analytics libraries and frameworks have been proposed with various computational models and programming languages to deal with such graphs. The X10 programming language is a PGAS language that aims at both...

chapter

G-Storm: GPU-enabled high-throughput online data processing in Storm

Zhenhua Chen, Jielong Xu, Jian Tang, Kevin Kwiat, et al.

2015 IEEE International Conference on Big Data (Big Data) > 307 - 312

2015 IEEE International Conference on Big Data (Big Data)

The Single Instruction Multiple Data (SIMD) architecture of Graphic Processing Units (GPUs) makes them perfect for parallel processing of big data. In this paper, we present the design, implementation and evaluation of G-Storm, a GPU-enabled parallel system based on Storm, which harnesses the massively parallel computing power of GPUs for high-throughput online stream data processing. G-Storm has...

chapter

CPU+GPU Programming of Stencil Computations for Resource-Efficient Use of GPU Clusters

Mohammed Sourouri, Johannes Langguth, Filippo Spiga, Scott B. Baden, et al.

2015 IEEE 18th International Conference on Computational Science and Engineering > 17 - 26

2015 IEEE 18th International Conference on Computational Science and Engineering (CSE)

On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU...

chapter

OpenCL Kernel Fusion for GPU, Xeon Phi and CPU

Jiri Filipovic, Siegfried Benkner

2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) > 98 - 105

2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

Kernel fusion is an optimization method, in which the code from several kernels is composed to create a new, fused kernel. It can push the performance of kernels beyond limits given for their isolated, unfused form. In this paper, we introduce a classification of different types of kernel fusion for both data dependent and data independent kernels. We study kernel fusion on three types of OpenCL devices:...

chapter

Progressive Codesign of an Architecture and Compiler Using a Proxy Application

Arpith Jacob, Ravi Nair, Tong Chen, Zehra Sura, et al.

2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) > 57 - 64

2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

The Active Memory Cube (AMC) is a novel near-memory processor that exploits high memory bandwidth and low latency close to DRAM to execute scientific applications in an energy-efficient manner. Its energy efficiency is derived from a combination of its novel scalar-vector data-flow path combined with its simple control-flow path that required the development of a sophisticated compiler, co-designed...

chapter

Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA

Adrian Castello, Antonio J. Pena, Rafael Mayo, Pavan Balaji, et al.

2015 IEEE International Conference on Cluster Computing > 92 - 95

2015 IEEE International Conference on Cluster Computing (CLUSTER)

OpenACC is an application programming interface (API) that aims to unleash the power of heterogeneous systems composed of CPUs and accelerators such as graphic processing units (GPUs) or Intel Xeon Phi coprocessors. This directive-based programming model is intended to enable developers to accelerate their application's execution with much less effort. Coprocessors offer significant computing power...

Keywords:
KERNEL
PROGRAMMING

Content availability

Available (369)
None (3)

Keywords

GRAPHICS PROCESSING UNITS (104)
HARDWARE (82)
COMPUTER ARCHITECTURE (81)
COMPUTATIONAL MODELING (73)
INSTRUCTION SETS (62)
OPTIMIZATION (57)
PARALLEL PROCESSING (56)
GRAPHICS PROCESSING UNIT (55)
GPU (51)
CUDA (43)
OPENCL (39)
COMPUTER GRAPHIC EQUIPMENT (36)
COPROCESSORS (36)
PROGRAM PROCESSORS (36)
RUNTIME (35)
FIELD PROGRAMMABLE GATE ARRAYS (34)
ARRAYS (30)
LIBRARIES (30)
PERFORMANCE EVALUATION (30)
BENCHMARK TESTING (29)
REGISTERS (29)
SYNCHRONIZATION (26)
PARALLEL PROGRAMMING (25)
ALGORITHM DESIGN AND ANALYSIS (24)
LINUX (24)
MEMORY MANAGEMENT (23)
GPGPU (22)
DATA MINING (19)
OPENMP (18)
YARN (18)
BANDWIDTH (16)
COMPUTER GRAPHICS (16)
SUPPORT VECTOR MACHINES (16)
APPLICATION PROGRAM INTERFACES (15)
HIGH PERFORMANCE COMPUTING (15)
MULTIPROCESSING SYSTEMS (15)
MICROPROCESSOR CHIPS (14)
ACCELERATION (13)
CONTEXT (13)
FPGA (13)
GRAPHICS (13)
MPI (13)
PARALLEL ARCHITECTURES (13)
RANDOM ACCESS MEMORY (13)
JAVA (12)
STANDARDS (12)
COMPLEXITY THEORY (11)
COMPUTE UNIFIED DEVICE ARCHITECTURE (11)
DATA TRANSFER (11)
INDEXES (11)
MULTI-THREADING (11)
SERVERS (11)
MESSAGE PASSING (10)
OPERATING SYSTEMS (10)
PROGRAMMING MODEL (10)
SOFTWARE (10)
SOFTWARE ARCHITECTURE (10)
VECTORS (10)
DATABASES (9)
MAGNETIC CORES (9)
MESSAGE SYSTEMS (9)
MICROPROCESSORS (9)
MULTICORE PROCESSING (9)
OPERATING SYSTEM KERNELS (9)
CENTRAL PROCESSING UNIT (8)
EMBEDDED SYSTEMS (8)
MACHINE LEARNING (8)
REAL TIME SYSTEMS (8)
STREAMING MEDIA (8)
ACCELERATORS (7)
CRYPTOGRAPHY (7)
DATA MODELS (7)
DATA STRUCTURES (7)
GRAPHIC PROCESSING UNIT (7)
LINEAR PROGRAMMING (7)
PROGRAM COMPILERS (7)
REAL-TIME SYSTEMS (7)
RESOURCE MANAGEMENT (7)
ANALYTICAL MODELS (6)
CLASSIFICATION ALGORITHMS (6)
COMPUTER LANGUAGES (6)
DRIVER CIRCUITS (6)
ELECTRONICS PACKAGING (6)
IMAGE PROCESSING (6)
MATHEMATICAL MODEL (6)
NVIDIA GPU (6)
OBJECT ORIENTED MODELING (6)
OPENACC (6)
OPERATING SYSTEMS (COMPUTERS) (6)
OPTIMISATION (6)
PARALLEL COMPUTING (6)
PIPELINE PROCESSING (6)
RECONFIGURABLE ARCHITECTURES (6)
SCHEDULES (6)
SECURITY (6)
SEMANTICS (6)
SEMIDEFINITE PROGRAMMING (6)
SYSTEM-ON-CHIP (6)

INFONA - science communication portal
