Multi-GPU nodes are becoming the platform of choice for graph processing. However, in a multi-GPU environment there are two main challenges in designing a graph processing system. First, the system suffers from high communication overhead: GPUs and CPUs are connected through PCIe, whose bandwidth is far smaller than that of GPU memory. Second, the system is developed based on BSP (Bulk Synchronous...
Compute-intensive GPU architectures allow the use of high-order 3D stencils for better computational accuracy. These stencils are usually compute-bound. While current state-of-the-art register allocators are satisfactory for most applications, they are unable to effectively manage register pressure for such complex high-order stencils, resulting in sub-optimal code with a large number of register...
A 19.2 Gb/s per lane link with IBM's latest POWER8 processor module has been analyzed. This paper presents an overview of the high-speed link design from the signal integrity point of view. Design approaches in the package and printed circuit board (PCB) to support the target data rate have been discussed. The end-to-end communication bus is modeled from the extracted post-route design with a 3-D full-wave...
Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors — including NVIDIA, Intel, AMD and IBM — have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these...
The massive parallelism and high memory bandwidth of GPUs are particularly well matched with the exigencies of Big Data analytics applications, for which many independent computations and high data throughput are prevalent. These applications often produce (intermediate or final) results in the form of key-value (KV) pairs, and hash tables are particularly well-suited for storing these KV pairs in...
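The KV-storage pattern the abstract refers to can be sketched with a minimal open-addressing hash table using linear probing — the flat-array layout commonly chosen for GPU hash tables because it avoids pointer chasing. This is an illustrative host-side sketch, not the paper's implementation; on a GPU, each slot would typically be claimed with an atomic compare-and-swap, while here a plain Python list stands in for device memory.

```python
# Minimal open-addressing hash table with linear probing, sketching
# the flat-array KV storage pattern used by GPU hash tables.
# Illustrative only: a real GPU version would claim slots with
# atomic compare-and-swap; a Python list stands in for device memory.

EMPTY = object()  # sentinel marking an unused slot

class ProbingHashTable:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.slots = [EMPTY] * capacity  # holds (key, value) pairs

    def _probe(self, key):
        # Linear probing: scan from the hashed slot until we find
        # either the key itself or an empty slot.
        start = hash(key) % self.capacity
        for i in range(self.capacity):
            idx = (start + i) % self.capacity
            slot = self.slots[idx]
            if slot is EMPTY or slot[0] == key:
                return idx
        raise RuntimeError("table full")

    def put(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key, default=None):
        slot = self.slots[self._probe(key)]
        return default if slot is EMPTY else slot[1]
```

The flat-array layout is what makes this structure GPU-friendly: probes are coalesced memory reads rather than linked-list traversals.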
It has been shown that a newly proposed micro-modeling method for deriving a concise passive circuit of a large-scale EM problem is highly suitable for GPU parallel computation. However, due to the GPU's memory bandwidth limit, GPU utilization falls far short of peak performance because more than 97% of the processing time is occupied by frequent data transactions. This paper proposes an effective...
Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even higher demand on fast convolution. The high computation throughput and memory bandwidth of graphics processing units (GPUs) make GPUs a natural choice for accelerating...
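The operation in question can be made concrete with a direct ("naive") single-channel 2D convolution — the baseline that GPU kernels accelerate with tiling, shared memory, and algorithmic variants such as im2col or Winograd. This is a generic sketch of the operation itself, not any specific paper's method; `conv2d` and its "valid"-padding convention are illustrative choices.

```python
# Direct 2D convolution (cross-correlation convention) over a single
# channel with "valid" padding: the output shrinks by kernel size - 1.
# This is the baseline computation that GPU kernels accelerate.
def conv2d(image, kernel):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1  # "valid" output size
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for ky in range(kh):          # slide the kernel window
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out
```

The four nested loops expose the abundant independent work per output element that makes the operation such a natural fit for GPUs.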
MeteoSwiss, the Swiss national weather forecast institute, has selected densely populated accelerator servers as their primary system to compute weather forecast simulations. Servers with multiple accelerator devices that are primarily connected by a PCI-Express (PCIe) network achieve a significantly higher energy efficiency. Memory transfers between accelerators in such a system are subjected to PCIe...
Today, accelerator cards like GPUs are an important constituent of HPC clusters. For certain GPU-intense applications, the trend is shifting toward multi-GPU systems with four or more GPUs per compute node. This can increase the performance per dollar and the performance per watt. The Linpack benchmark is the standard tool for measuring the compute performance of supercomputers. Its standard implementation,...
The memory wall problem is one of the major obstacles to realizing extremely fast, large-scale simulations. Stencil computations, which are important kernels for CFD simulations, have achieved high speed on GPU clusters thanks to the high memory bandwidth and computation speed of accelerators. However, their problem scales have been limited by the small capacity of GPU device memory...
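The memory-bound access pattern of a stencil kernel can be illustrated with a single Jacobi sweep of a 3-point 1D stencil: each output point reads only a small fixed neighborhood of the input grid, so performance is dominated by memory bandwidth rather than arithmetic. This is a generic 1D sketch for illustration; CFD codes like those in the abstract typically use higher-order 3D stencils.

```python
# One Jacobi sweep of a 3-point 1D stencil: each interior point is
# replaced by the average of its neighborhood. The tiny fixed ratio
# of arithmetic to memory traffic is why stencils are memory-bound.
def jacobi_step(u):
    v = list(u)  # boundary values are kept fixed
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v
```

Repeating such sweeps over a grid larger than device memory is exactly the situation where the capacity limit mentioned above starts to bite.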
This work introduces an alternative architecture of a GNSS signal simulator, where the multiple GNSS services in the full GNSS bandwidth from L5 to L1 are generated and mixed in digital form. The digital-to-analog conversion and up-conversion to L-band is then applied to the single compounded wideband digital signal. The digital signal generation and mixing is implemented on a pair of strong GPUs...
Accelerated computing has become pervasive for increasing the computational power and energy efficiency in terms of GFLOPs/Watt. For application areas with highest demands, for instance high performance computing, data warehousing and high performance analytics, accelerators like GPUs or Intel’s MICs are distributed throughout the cluster. Since current analyses and predictions show that data movement...
Manycore architecture systems include a large number of processing elements to improve performance while respecting power constraints. Accelerating heterogeneous manycore computing elements involves a huge amount of memory copying, computation, and thread management. Applications of manycore architectures range from desktop computers to warehouse-scale computers. In this paper, the state-of-the-art trends...
Many GPU applications perform data transfers to and from GPU memory at regular intervals, for example because the data does not fit into GPU memory, or because of inter-node communication at the end of each time step. Overlapping GPU computation with CPU-GPU communication can reduce the cost of moving data. Several different techniques exist for transferring data to and from GPU memory and for overlapping...
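The overlap pattern can be sketched host-side as a two-stage pipeline: while chunk i is being processed, chunk i+1 is already in flight. This is a conceptual sketch, not one of the paper's techniques; on a real GPU the same shape maps to pinned host buffers, `cudaMemcpyAsync`, and one CUDA stream per buffer, whereas here a one-worker thread pool stands in for the asynchronous copy engine, and `transfer`/`compute` are hypothetical stand-ins.

```python
# Two-stage pipeline sketch of copy/compute overlap: the "transfer"
# of the next chunk runs on a worker thread while the current chunk
# is processed. On a GPU this maps to pinned buffers, cudaMemcpyAsync,
# and per-buffer CUDA streams.
from concurrent.futures import ThreadPoolExecutor

def transfer(chunk):   # stand-in for a host-to-device copy
    return list(chunk)

def compute(chunk):    # stand-in for the GPU kernel
    return [x * 2 for x in chunk]

def pipelined(chunks):
    if not chunks:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(transfer, chunks[0])
        for nxt in chunks[1:]:
            ready = pending.result()                 # wait for current copy
            pending = copier.submit(transfer, nxt)   # start next copy early
            results.extend(compute(ready))           # overlaps with the copy
        results.extend(compute(pending.result()))    # drain the last chunk
    return results
```

The key line is the `submit` issued before `compute`: the next transfer is launched first, so copy and compute proceed concurrently instead of strictly alternating.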
Neural network simulators that take into account the spiking behavior of neurons are useful for studying brain mechanisms and for engineering applications. Spiking neural network (SNN) simulators have been traditionally simulated on large-scale clusters, super-computers, or on dedicated hardware architectures. Alternatively, graphics processing units (GPUs) can provide a low-cost, programmable, and...
This article consists of a collection of slides from the author's conference presentation on NVIDIA's GeForce 8800 GTX family of products. Some of the specific topics discussed include: the special features, system specifications, and system design for these products; GPU computing capabilities; system architectures; applications for use; platforms supported; processing capabilities; memory capabilities;...