Heterogeneous systems, which marry CPUs and GPUs in a range of configurations, are quickly becoming the design paradigm for today's platforms because of their impressive parallel processing capabilities. However, in many existing heterogeneous systems the GPU is treated only as an accelerator, working as a slave to the CPU master. But recently we are starting to see the introduction...
Current technology trends for efficient use of infrastructures dictate that storage converge with computation by placing storage devices, such as NVM-based cards and drives, in the servers themselves. With converged storage, the role of the interconnect among servers becomes more important for achieving high I/O throughput. Given that Ethernet is emerging as the dominant technology for datacenters,...
Virtualization is one of the main technologies currently used to deploy computing systems because of the high reliability and rapid crash recovery it offers compared to physical nodes. These features are mainly achieved by continuously producing snapshots of the state of running virtual machines. In earlier works, the snapshot of each individual VM is taken independently, ignoring the memory...
Data mining, bioinformatics, knowledge discovery, and social network analysis are emerging irregular applications that exploit pointer-based data structures such as graphs, unbalanced trees, and unstructured grids. These applications are characterized by unpredictable memory accesses and are generally memory-bandwidth bound, but they also present large amounts of inherent dynamic parallelism...
The simplicity of concurrent programming with Transactional Memory (TM) and its recent implementation in mainstream processors greatly motivate researchers and industry to investigate this field and propose new implementations and optimizations. However, there is still no standard C system library that a wide range of TM developers can adopt. TM application developers have been forced to avoid library...
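The library problem this abstract names is easy to demonstrate with GCC's TM support (-fgnu-tm), which follows the draft C++ TM specification: ordinary library calls such as printf are not transaction-safe and are rejected inside atomic blocks, forcing developers to keep them outside. A minimal sketch, assuming a GCC toolchain:

#include <stdio.h>

static long counter;

void bump(void) {
    __transaction_atomic {        /* runs atomically w.r.t. other transactions */
        counter += 1;
        /* printf("inside\n");       rejected: printf is not transaction-safe */
    }
}

int main(void) {
    bump();
    printf("counter = %ld\n", counter);   /* I/O kept outside the transaction */
    return 0;
}

/* build: gcc -fgnu-tm tm_demo.c */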
This paper presents a method to map and implement the 1-D FFT on a GPGPU and extends the method to the 2-D FFT. Two approaches are used to maximize performance: one localizes data inside the caches of the GPGPU, and the other assigns threads and blocks appropriately to reach higher performance. The results show that our implementation is 3.62 times faster at performing a 32M-point 1-D FFT and 4...
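The "localize data in on-chip memory" idea can be sketched independently of the paper's actual FFT: below, each thread block stages one signal into shared memory and each thread then computes one output bin, so all N reads per output hit on-chip storage rather than DRAM. This is a naive O(N^2) DFT used purely to illustrate the staging and the thread/block assignment, not the paper's optimized FFT:

#include <cuda_runtime.h>
#include <math.h>

#define N 256   /* points per transform; one block per transform (illustrative) */

__global__ void batched_dft(const float2 *in, float2 *out) {
    __shared__ float2 s[N];                   /* on-chip copy of one signal */
    s[threadIdx.x] = in[blockIdx.x * N + threadIdx.x];   /* coalesced load */
    __syncthreads();

    int k = threadIdx.x;                      /* this thread owns output bin k */
    float2 acc = make_float2(0.0f, 0.0f);
    for (int n = 0; n < N; ++n) {             /* all N reads served from shared memory */
        float ang = -2.0f * 3.14159265f * k * n / N;
        float c = cosf(ang), sn = sinf(ang);
        acc.x += s[n].x * c - s[n].y * sn;    /* complex multiply-accumulate */
        acc.y += s[n].x * sn + s[n].y * c;
    }
    out[blockIdx.x * N + k] = acc;
}

/* launch: batched_dft<<<numSignals, N>>>(d_in, d_out); */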
Graph500 is a new benchmark for supercomputers based on large-scale graph analysis, which is becoming an important form of analysis in many real-world applications. Graph algorithms run well on supercomputers with shared memory. For the Linpack-based supercomputer rankings, TOP500 reports that heterogeneous and distributed-memory supercomputers with large numbers of GPGPUs are becoming dominant....
Modern General Purpose Graphics Processing Units (GPGPUs) provide high degrees of parallelism in computation and memory access, making them suitable for data-parallel applications such as those using the elastic MapReduce model. Yet designing a MapReduce framework for GPUs faces significant challenges brought by their multi-level memory hierarchy. Due to the absence of atomic operations in the earlier...
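Without atomics, a common way to let many GPU threads emit a variable number of outputs is a two-pass scheme: count, prefix-sum the counts into write offsets, then write into disjoint slots with no synchronization. A hedged sketch with illustrative names (filtering even integers stands in for an arbitrary map function):

#include <thrust/device_vector.h>
#include <thrust/scan.h>

__global__ void map_count(const int *in, int n, int *counts) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) counts[i] = (in[i] % 2 == 0) ? 1 : 0;   /* pass 1: outputs per thread */
}

__global__ void map_emit(const int *in, int n, const int *offsets, int *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] % 2 == 0) out[offsets[i]] = in[i];  /* pass 2: race-free slot */
}

void run_map(const thrust::device_vector<int> &in) {
    int n = (int)in.size();
    thrust::device_vector<int> counts(n), offsets(n);
    map_count<<<(n + 255) / 256, 256>>>(in.data().get(), n, counts.data().get());
    thrust::exclusive_scan(counts.begin(), counts.end(), offsets.begin());
    int total = offsets.back() + counts.back();        /* total number of outputs */
    thrust::device_vector<int> out(total);
    map_emit<<<(n + 255) / 256, 256>>>(in.data().get(), n,
                                       offsets.data().get(), out.data().get());
}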
GPGPU is becoming more and more important, but when using CUDA-enabled GPUs the special characteristics of NVIDIA's SIMT architecture have to be considered. In particular, it is not possible to run functions concurrently, although NVIDIA's GPUs consist of many processing units. Therefore, the processing power of GPUs cannot be shared among processes, and for an efficient use of the GPU, it has to be...
The modern file system is still implemented in the kernel and statically linked with other kernel components. This architecture has brought performance and efficient integration with memory management. However, kernel development is slow, and modern storage systems must support an array of features, including distribution across a network, tagging, searching, deduplication, checksumming, snapshotting,...
Writing software for distributed systems is a complex task that gets even harder when shared data is replicated among nodes. Transactional memory is a promising technology for dealing with both synchronization and data consistency issues. Rainbow OS, a distributed operating system for PC clusters, employs this concept in a distributed fashion, providing a cluster-wide transactional distributed memory...
Embedded appliance designers rely on heterogeneous multi-core systems-on-chip (HMC-SoCs) to provide the computing power required by modern applications. Due to the inherent complexity of this kind of platform, the development of specific system architectures is not considered an option for providing low-level services to an application. Hence, the software is built either from scratch - when the software's...
This paper reports on CuPP, our newly developed C++ framework designed to ease integration of NVIDIA's GPGPU system CUDA into existing C++ applications. CuPP provides interfaces to recurring tasks that are easier to use than the standard CUDA interfaces. In this paper we concentrate on memory management and related data structures. CuPP offers both a low-level interface - mostly consisting of smart pointers...
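The abstract does not show CuPP's actual API, but the smart-pointer idea it names is easy to sketch: a small RAII owner for device memory, so cudaFree can never be forgotten and double-frees are ruled out. The names below are hypothetical, not CuPP's:

#include <cuda_runtime.h>
#include <cstddef>
#include <stdexcept>

template <typename T>
class device_ptr {
    T *p_ = nullptr;
    std::size_t n_ = 0;
public:
    explicit device_ptr(std::size_t n) : n_(n) {
        if (cudaMalloc(&p_, n * sizeof(T)) != cudaSuccess)
            throw std::runtime_error("cudaMalloc failed");
    }
    ~device_ptr() { cudaFree(p_); }             /* released automatically */
    device_ptr(const device_ptr &) = delete;    /* no accidental double-free */
    device_ptr &operator=(const device_ptr &) = delete;
    T *get() const { return p_; }
    void upload(const T *host)   { cudaMemcpy(p_, host, n_ * sizeof(T), cudaMemcpyHostToDevice); }
    void download(T *host) const { cudaMemcpy(host, p_, n_ * sizeof(T), cudaMemcpyDeviceToHost); }
};

/* usage: device_ptr<float> buf(1024); buf.upload(h); kernel<<<g, b>>>(buf.get()); */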
This paper describes a design and implementation of the Smith-Waterman algorithm accelerated on the graphics processing unit (GPU). Our method is implemented using the compute unified device architecture (CUDA), which is available on NVIDIA GPUs. The method efficiently uses on-chip shared memory to reduce the amount of data transferred between off-chip memory and the processing elements in the GPU....
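The parallelization that makes Smith-Waterman fit a GPU is the anti-diagonal one: every cell on an anti-diagonal of the scoring matrix depends only on the two previous anti-diagonals, so all of its cells can be computed by concurrent threads. A hedged sketch of that dependency structure with a linear gap penalty (the paper's shared-memory tiling, which keeps several diagonals on-chip, is elided here; constants and names are illustrative):

#include <cuda_runtime.h>
#include <algorithm>

#define MATCH 2
#define MISMATCH (-1)
#define GAP 1

/* Compute all cells of anti-diagonal d of the H matrix. prev1/prev2 hold the
   two previous diagonals, indexed by row i; out-of-range reads are treated as
   the zero boundary of local alignment. */
__global__ void sw_diag(const char *a, const char *b, int m, int n, int d,
                        const int *prev2, const int *prev1, int *curr) {
    int lo = max(1, d - n);                       /* rows on this diagonal */
    int hi = min(m, d - 1);
    int i  = lo + blockIdx.x * blockDim.x + threadIdx.x;
    if (i > hi) return;
    int j = d - i;                                /* cell (i, j), 1-based */
    int sub  = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
    int diag = (i > 1 && j > 1) ? prev2[i - 1] : 0;   /* H(i-1, j-1) */
    int up   = (i > 1) ? prev1[i - 1] : 0;            /* H(i-1, j)   */
    int left = (j > 1) ? prev1[i]     : 0;            /* H(i,   j-1) */
    curr[i] = max(max(diag + sub, 0), max(up - GAP, left - GAP));
}

/* Host loop: three rotating buffers of m+1 ints, zero-initialized with
   cudaMemset. Tracking the global maximum score (e.g. via atomicMax) is
   omitted for brevity. */
void sw_score(const char *d_a, const char *d_b, int m, int n,
              int *prev2, int *prev1, int *curr) {
    for (int d = 2; d <= m + n; ++d) {
        int cells = std::min(m, d - 1) - std::max(1, d - n) + 1;
        sw_diag<<<(cells + 255) / 256, 256>>>(d_a, d_b, m, n, d, prev2, prev1, curr);
        int *t = prev2; prev2 = prev1; prev1 = curr; curr = t;   /* rotate */
    }
}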
A key obstacle to large-scale network simulation over PC clusters is the memory balancing problem, where a memory-overloaded machine can slow down an entire simulation due to disk I/O overhead. Memory balancing is complicated by (i) the difficulty of estimating the peak memory consumption of a group of nodes during network partitioning (a consequence of per-node peak memory not being synchronized) and...
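The "not synchronized" point is worth making concrete: a group's true peak is the maximum over time of the summed instantaneous usage, while the naive sum of per-node peaks is only a (possibly loose) upper bound when peaks occur at different times. A toy host-side illustration with hypothetical names:

#include <vector>
#include <algorithm>
#include <cstdio>

/* usage[node][t] = memory used by `node` at timestep t */
long true_group_peak(const std::vector<std::vector<long>> &usage) {
    std::size_t T = usage[0].size();
    long peak = 0;
    for (std::size_t t = 0; t < T; ++t) {
        long sum = 0;
        for (const auto &node : usage) sum += node[t];
        peak = std::max(peak, sum);          /* max over time of summed usage */
    }
    return peak;
}

long sum_of_node_peaks(const std::vector<std::vector<long>> &usage) {
    long total = 0;
    for (const auto &node : usage)
        total += *std::max_element(node.begin(), node.end());
    return total;                            /* loose bound when peaks don't align */
}

int main() {
    /* two nodes whose peaks occur at different timesteps */
    std::vector<std::vector<long>> usage = {{10, 50, 10}, {40, 10, 10}};
    std::printf("true peak: %ld, sum of peaks: %ld\n",
                true_group_peak(usage), sum_of_node_peaks(usage));  /* 60 vs 90 */
    return 0;
}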