Search results

Items from 1 to 17 out of 17 results

chapter

cudaCR: An In-Kernel Application-Level Checkpoint/Restart Scheme for CUDA-Enabled GPUs

Behnam Pourghassemi, Aparna Chandramowlishwaran

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 725 - 732

Fault-tolerance is becoming increasingly important as we enter the era of exascale computing. Increasing the number of cores results in a smaller mean time between failures, and consequently, higher probability of errors. Among the different software fault tolerance techniques, checkpoint/restart is the most commonly used method in supercomputers, the de-facto standard for large-scale systems. Although...

chapter

Offloading Communication Control Logic in GPU Accelerated Applications

Elena Agostini, Davide Rossetti, Sreeram Potluri

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) > 248 - 257

NVIDIA GPUDirect is a family of technologies aimed at optimizing data movement among GPUs (P2P) or between GPUs and third-party devices (RDMA). GPUDirect Async, introduced in CUDA 8.0, is a new addition which allows direct synchronization between GPU and third-party devices. For example, Async allows an NVIDIA GPU to directly trigger and poll for completion of communication operations queued to an InfiniBand...

chapter

Exploiting Parallelism in Linear Algebra Kernels through Dataflow Execution

Brunno F. Goldstein, Felipe M.G. Franca, Leandro A.J. Marzulo, Tiago A.O. Alves

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) > 103 - 108

Linear algebra kernels have an important role in many petroleum reservoir simulators, extensively used by the industry. The growth in problem size, especially in pre-salt exploration, has caused an increase in execution time of those kernels, thus requiring parallel programming to improve performance and make the simulation viable. On the other hand, exploiting parallelism in systems with an ever increasing...

chapter

Well-formed control flow for critical sections in RTFM-core

Per Lindgren, Marcus Lindner, Andreas Lindner, David Pereira, more

2015 IEEE 13th International Conference on Industrial Informatics (INDIN) > 1438 - 1445

The mainstream of embedded software development as of today is dominated by C programming. To aid the development, hardware abstractions, libraries, kernels and lightweight operating systems are commonplace. Such kernels and operating systems typically impose a thread-based abstraction to concurrency. However, in general thread-based programming is hard, plagued by race conditions and deadlocks....

chapter

Sparse matrix computations on clusters with GPGPUs

Valeria Cardellini, Alessandro Fanfarillo, Salvatore Filippone

2014 International Conference on High Performance Computing & Simulation (HPCS) > 23 - 30

Hybrid nodes containing GPUs are rapidly becoming the norm in parallel machines. We have conducted some experiments regarding how to plug GPU-enabled computational kernels into PSBLAS, an MPI-based library specifically geared towards sparse matrix computations. In this paper, we present our findings on which strategies are more promising in the quest for the optimal compromise among raw performance,...

chapter

Auto-parallelization of data structure operations for GPUs

Rupesh Nasre

2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES) > 1 - 10

We present an auto-parallelization technique for generating GPU implementation of data-structure operations from a sequential specification. The technique partitions the data-structure operations into barrier-separated phases such that each phase executes only homogeneous operations. Homogeneity is dictated by the method type, which is derived from the specification. Two key aspects of our technique...

chapter

A Cross-Domain System Architecture for Embedded Hard Real-Time Many-Core Systems

Christian Bradatsch, Florian Kluge, Theo Ungerer

2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing > 2034 - 2041

2013 IEEE International Conference on High Performance Computing and Communications (HPCC) & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (EUC)

The EC project parMERASA investigates techniques for the parallelization of industrial real-time applications from automotive, avionic, and construction machinery domains. The aim is to execute such applications on many-core processors with up to 64 cores. The system software plays a key role in the deployment of applications. However, requirements of application domains differ widely, and thus no...

chapter

Design, Implementation and Evaluation of Built-in Functions on Parallel Programming Model in SMYLE OpenCL

Noriko Etani, Takuji Hieda, Hiroyuki Tomiyama

2013 IEEE 7th International Symposium on Embedded Multicore Socs > 113 - 118

2013 IEEE 7th International Symposium on Embedded Multicore Socs (MCSoC)

In this paper, we propose built-in functions on parallel programming model in SMYLE OpenCL to extend the original OpenCL semantics giving our system's original limitation and interpretation for embedded many-core architecture. On a platform using FPGA to evaluate embedded many-core architecture SMYLEref, data parallel and task parallel programming models supported by the OpenCL execution model are...

chapter

TM-dietlibc: A TM-aware Real-World System Library

Vesna Smiljkovic, Martin Nowack, Neboja Miletic, Timothy Harris, more

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 1266 - 1274

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

The simplicity of concurrent programming with Transactional Memory (TM) and its recent implementation in mainstream processors greatly motivates researchers and industry to investigate this field and propose new implementations and optimizations. However, there is still no standard C system library which a wide range of TM developers can adopt. TM application developers have been forced to avoid library...

chapter

An OpenCL Runtime Library for Embedded Multi-Core Accelerator

Ryuichi Sakamoto, Mikiko Sato, Yusuke Koizumi, Hideharu Amano, more

2012 IEEE International Conference on Embedded and Real-Time Computing Systems and Applications > 419 - 422

2012 IEEE 18th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2012)

In recent years, improving energy efficiency and computational performance has become a major issue as smartphones and tablets have become popular. To achieve high performance, multi-core accelerators consisting of general-purpose processors and accelerators are often used. But to use these multi-core accelerators efficiently, programmers have to consider synchronization and data transfer between...

chapter

Parallel simulation of mixed-abstraction SystemC models on GPUs and multicore CPUs

Rohit Sinha, Aayush Prakash, Hiren D. Patel

17th Asia and South Pacific Design Automation Conference > 455 - 460

2012 17th Asia and South Pacific Design Automation Conference (ASP-DAC)

This work presents a methodology that parallelizes the simulation of mixed-abstraction level SystemC models across multicore CPUs, and graphics processing units (GPUs) for improved simulation performance. Given a SystemC model, we partition it into processes suitable for GPU execution and CPU execution. We convert the processes identified for GPU execution into GPU kernels with additional SystemC...

chapter

Formal heterogeneous system modeling with SystemC

Seyed Hosein Attarzadeh Niaki, Mikkel Koefoed Jakobsen, Tero Sulonen, Ingo Sander

Proceeding of the 2012 Forum on Specification and Design Languages > 160 - 167

2012 Forum on Specification & Design Languages (FDL)

Electronic System Level (ESL) design of embedded systems proposes raising the abstraction level of the design entry to cope with the increasing complexity of such systems. To exploit the benefits of ESL, design languages should allow specification of models which are a) heterogeneous, to describe different aspects of systems; b) formally defined, for application of analysis and synthesis methods;...

chapter

Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures

R Dietrich, T Ilsche, G Juckeland

2010 39th International Conference on Parallel Processing Workshops > 135 - 143

2010 39th International Conference on Parallel Processing Workshops (ICPPW)

New high performance computing (HPC) applications recently have to face scalability over an increasing number of nodes and the programming of special accelerator hardware. Hybrid composition of large computing systems leads to a new dimension in complexity of software development. This paper presents a novel approach to gain insight into accelerator interaction and utilization without any changes...

chapter

Exporting kernel page caching for efficient user-level I/O

Richard P Spillane, Sagar Dixit, Shrikar Archak, Saumitra Bhanage, more

2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST) > 1 - 13

2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST 2010)

The modern file system is still implemented in the kernel, and is statically linked with other kernel components. This architecture has brought performance and efficient integration with memory management. However, kernel development is slow and modern storage systems must support an array of features, including distribution across a network, tagging, searching, deduplication, checksumming, snapshotting,...

chapter

The Research and Design on Key Issues for Threads Packages

Wang Chengjun

2009 Pacific-Asia Conference on Knowledge Engineering and Software Engineering > 63 - 66

2009 Pacific-Asia Conference on Knowledge Engineering and Software Engineering (KESE 2009)

A threads package is a set of primitives available to the user relating to threads. In this paper we will consider some of the issues concerned with the architecture and functionality of threads packages and consider how to implement threads packages.

chapter

CuPP - A framework for easy CUDA integration

J. Breitbart

2009 IEEE International Symposium on Parallel & Distributed Processing > 1 - 8

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

This paper reports on CuPP, our newly developed C++ framework designed to ease integration of NVIDIA's GPGPU system CUDA into existing C++ applications. CuPP provides interfaces to reoccurring tasks that are easier to use than the standard CUDA interfaces. In this paper we concentrate on memory management and related data structures. CuPP offers both a low-level interface - mostly consisting of smart pointers...

chapter

A Cosimulation Framework for a Distributed System of Systems

B. Muller-Rathgeber, H. Rauchfuss

2008 IEEE 68th Vehicular Technology Conference > 1 - 5

2008 IEEE 68th Vehicular Technology Conference (VTC 2008-Fall)

In this paper, we present a simple but powerful solution to combine two different simulation environments - SystemC and OMNeT++ - enabling a cosimulation framework for modeling and simulating a distributed system of systems. It is therefore possible to utilize the strengths and preliminary work of OMNeT++ in the field of communication networks and SystemC in modeling hardware entities. We use a socket...

Filter options

Content availability:
Available
Data set:
ieee
Keywords:
SYNCHRONIZATION
KERNEL
LIBRARIES
