In this paper we propose a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel on the Intel Xeon Phi manycore processor. The architectural differences of such processors compared to their multicore counterparts overly expose inherent structural weaknesses of different sparse matrices, intensifying performance issues beyond the traditionally reported memory bandwidth...
In this paper, we address cloud VoIP scheduling strategies that provide appropriate levels of quality of service to users and of cost to VoIP service providers. This bi-objective focus is reasonable and representative for real installations and applications. We conduct comprehensive simulations on real data of twenty-three on-line non-clairvoyant scheduling strategies with a fixed utilization threshold...
Heterogeneous chip-multiprocessors with integrated CPU and GPU cores on the same die allow sharing of critical memory system resources among the applications executing on the two types of cores. In this paper, we explore memory system management driven by the quality of service (QoS) requirements of the GPU applications executing simultaneously with CPU applications in such heterogeneous platforms. Our...
In this paper we present D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe), an innovative and energy-efficient High Performance Computing cluster designed by E4 Computer Engineering for PRACE (Partnership for Advanced Computing in Europe). D.A.V.I.D.E. is built using best-in-class components (IBM’s POWER8-NVLink CPUs, NVIDIA TESLA P100 GPUs, Mellanox InfiniBand EDR 100...
When floating-point arithmetic is executed on a processor, round-off and truncation errors occur in every calculation. These errors cause precision issues in large simulations that require a great number of calculations. Therefore, we have developed quadruple-precision basic linear algebra subprograms (QPBLAS) based on Bailey's double-double arithmetic. The multiplication operation of...
Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors — including NVIDIA, Intel, AMD and IBM — have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these...
Determining key characteristics of High Performance Computing machines that allow users to predict their performance is an old and recurrent dream. This was, for example, the rationale behind the design of the LogP model that later evolved into many variants (LogGP, LogGPS, LoGPS, …) to cope with the evolution and complexity of network technology. Although the network has received a lot of attention,...
Hardware accelerators have become a de facto standard for achieving high performance on current supercomputers, and there are indications that this trend will continue in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional...
Today's supercomputers are moving towards deployment of many-core processors like Intel Xeon Phi Knights Landing (KNL), to deliver high compute and memory capacity. Applications executing on such many-core platforms with improved vectorization require high memory bandwidth. To improve performance, architectures like Knights Landing include a high bandwidth and low capacity in-package high bandwidth...
Programming accelerators today usually requires managing separate virtual and physical memories, such as allocating space in and copying data between host and device memories. The OpenACC API provides data directives and clauses to control this behavior where it is required. This paper describes how the data model is supported in current OpenACC implementations, ranging from research compilers (OpenUH...