2017 46th International Conference on Parallel Processing (ICPP)

chapter

Simple and Fast Parallel Algorithms for the Voronoi Map and the Euclidean Distance Map, with GPU Implementations

Takumi Honda, Shinnosuke Yamamoto, Hiroaki Honda, Koji Nakano, et al.

2017 46th International Conference on Parallel Processing (ICPP), pp. 362-371

The complete Voronoi map of a binary image with black and white pixels is a matrix of the same size in which each element is the black pixel closest to the corresponding pixel. The complete Voronoi map visualizes the influence region of each black pixel. However, a region may be disconnected because of exclave pixels. The connected Voronoi map is a modification of the complete Voronoi map so that...
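The definition above can be checked against a brute-force reference. The sketch below uses a hypothetical helper `voronoi_and_edm` (not the authors' parallel algorithm) that compares every pixel with every black pixel, costing O(n·b) for n pixels and b black pixels. It also computes the Euclidean distance map named in the title, taken here as the distance from each pixel to its closest black pixel. The encoding (1 = black) and the row-major tie-breaking rule are assumptions of this sketch.

```python
import math
from typing import List, Optional, Tuple

Pixel = Tuple[int, int]


def voronoi_and_edm(image: List[List[int]]):
    """Brute-force complete Voronoi map and Euclidean distance map.

    image[i][j] == 1 marks a black pixel (assumed encoding). Ties are broken
    in favour of the first black pixel found in row-major order.
    """
    blacks = [(i, j) for i, row in enumerate(image)
              for j, v in enumerate(row) if v == 1]
    voronoi: List[List[Optional[Pixel]]] = []
    edm: List[List[float]] = []
    for i, row in enumerate(image):
        vrow: List[Optional[Pixel]] = []
        drow: List[float] = []
        for j in range(len(row)):
            best: Optional[Pixel] = None
            best_d2: Optional[int] = None
            for bi, bj in blacks:
                # Compare squared distances to avoid needless square roots.
                d2 = (i - bi) ** 2 + (j - bj) ** 2
                if best_d2 is None or d2 < best_d2:
                    best, best_d2 = (bi, bj), d2
            vrow.append(best)  # None if the image has no black pixel
            drow.append(math.inf if best_d2 is None else math.sqrt(best_d2))
        voronoi.append(vrow)
        edm.append(drow)
    return voronoi, edm


img = [[1, 0, 0],
       [0, 0, 0],
       [0, 0, 1]]
vmap, dmap = voronoi_and_edm(img)
print(vmap[1][1], vmap[2][1], dmap[1][1])  # (0, 0) (2, 2) 1.4142135623730951
```

Pixel (1, 1) is equidistant from both black pixels, so the tie-breaking rule decides its entry; a different rule would yield an equally valid Voronoi map.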

chapter

A Dynamic Resource Controller for a Lambda Architecture

MohammadReza HoseinyFarahabady, Javid Taheri, Zahir Tari, Albert Y. Zomaya

2017 46th International Conference on Parallel Processing (ICPP), pp. 332-341

Lambda architecture is a novel event-driven serverless paradigm that allows companies to build scalable and reliable enterprise applications. As an attractive alternative to traditional service-oriented architecture (SOA), Lambda architecture can be used in many use cases including BI tools, in-memory graph databases, OLAP, and streaming data processing. In practice, an important aim of Lambda's service...

chapter

Fading-Resistant Link Scheduling in Wireless Networks

Chenxi Qiu, Haiying Shen

2017 46th International Conference on Parallel Processing (ICPP), pp. 312-321

In this paper, we study the link scheduling problem under the fluctuating effect of fading on transmissions. We extend the previous deterministic physical interference model to the Rayleigh-fading model, which uses stochastic propagation to address fading effects. Based on this model, we formulate the Fading-Resistant Link Scheduling (Fading-R-LS) problem, which aims to maximize...

chapter

Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning

Jiawen Sun, Hans Vandierendonck, Dimitrios S. Nikolopoulos

2017 46th International Conference on Parallel Processing (ICPP), pp. 181-190

This paper investigates how to improve the memory locality of graph-structured analytics on large-scale shared memory systems. We demonstrate that a graph partitioning where all in-edges for a vertex are placed in the same partition improves memory locality. However, realising performance improvement through such graph partitioning poses several challenges and requires rethinking the classification...
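The partitioning property described above, that all in-edges of a vertex are placed in the same partition, can be obtained by assigning each edge according to its destination vertex. The sketch below assumes contiguous destination ranges and uses a simple pull-style sum as the per-partition computation; `partition_by_destination` and `pull_step` are hypothetical names, and the paper's actual partitioner and locality optimizations are not reproduced here.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (source, destination)


def partition_by_destination(edges: List[Edge], num_vertices: int,
                             num_parts: int) -> List[List[Edge]]:
    """Place every edge in the partition that owns its destination vertex.

    Destinations are split into contiguous ranges (an assumption of this
    sketch), so all in-edges of a vertex end up in a single partition.
    """
    per_part = -(-num_vertices // num_parts)  # ceiling division
    parts: List[List[Edge]] = [[] for _ in range(num_parts)]
    for src, dst in edges:
        parts[dst // per_part].append((src, dst))
    return parts


def pull_step(parts: List[List[Edge]], values: List[float],
              num_vertices: int) -> List[float]:
    """One pull-style sweep: each vertex sums the values of its in-neighbours.

    Each partition writes only to the destinations it owns, so partitions
    could be processed by different threads without atomic updates.
    """
    new = [0.0] * num_vertices
    for part in parts:
        for src, dst in part:
            new[dst] += values[src]
    return new


edges = [(0, 1), (2, 1), (1, 3), (3, 0), (0, 2)]
parts = partition_by_destination(edges, num_vertices=4, num_parts=2)
print(parts)  # [[(0, 1), (2, 1), (3, 0)], [(1, 3), (0, 2)]]
print(pull_step(parts, [1.0, 2.0, 3.0, 4.0], 4))  # [4.0, 4.0, 1.0, 2.0]
```

Because the writes of each partition touch a disjoint range of the output array, that range can stay resident in the cache of the thread processing the partition, which is one plausible reading of the locality benefit the abstract refers to.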

chapter

ES2: Aiming at an Optimal Virtual I/O Event Path

Xiaokang Hu, Wang Zhang, Jian Li, Ruhui Ma, et al.

2017 46th International Conference on Parallel Processing (ICPP), pp. 141-150

Improving the performance of I/O virtualization is a key issue for cloud and datacenter infrastructures, especially with the rapid increase of network interconnection speeds. Previous efforts have made the performance overhead associated with the virtual I/O data path largely negligible. The remaining bottlenecks mainly lie in the event path: hypervisor interventions trigger costly virtual machine...

chapter

Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning

Ching-Hsiang Chu, Xiaoyi Lu, Ammar A. Awan, Hari Subramoni, et al.

2017 46th International Conference on Parallel Processing (ICPP), pp. 161-170

Broadcast operations (e.g., MPI_Bcast) have been widely used in deep learning applications to exchange large amounts of data among multiple graphics processing units (GPUs). Recent studies have shown that leveraging the InfiniBand hardware-based multicast (IB-MCAST) protocol can enhance the scalability of GPU-based broadcast operations. However, these initial designs with IB-MCAST are not optimized for...

chapter

HyPPI NoC: Bringing Hybrid Plasmonics to an Opto-Electronic Network-on-Chip

Vikram K. Narayana, Shuai Sun, Armin Mehrabian, Volker J. Sorger, et al.

2017 46th International Conference on Parallel Processing (ICPP), pp. 131-140

As we move towards an era of hundreds of cores, the research community has witnessed the emergence of optoelectronic network-on-chip designs based on nanophotonics, which aim to achieve higher network throughput, lower latencies, and lower dynamic power. However, traditional nanophotonics options face limitations such as large device footprints compared with electronics, higher static power due to...

chapter

Constrained Tensor Factorization with Accelerated AO-ADMM

Shaden Smith, Alec Beri, George Karypis

2017 46th International Conference on Parallel Processing (ICPP), pp. 111-120

Low-rank sparse tensor factorization is a popular tool for analyzing multi-way data and is used in domains such as recommender systems, precision healthcare, and cybersecurity. Imposing constraints on a factorization, such as non-negativity or sparsity, is a natural way of encoding prior knowledge of the multi-way data. While constrained factorizations are useful for practitioners, they can greatly increase factorization...

chapter

Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning

Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Orti

2017 46th International Conference on Parallel Processing (ICPP), pp. 91-100

We present a set of new batched CUDA kernels for the LU factorization of a large collection of independent problems of different sizes, and for the subsequent triangular solves. All kernels heavily exploit the registers of the graphics processing unit (GPU) to deliver high performance for small problems. The development of these kernels is motivated by the need to tackle this embarrassingly parallel...
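For reference, the computation these kernels batch can be written sequentially: an LU factorization with partial pivoting of each small matrix, followed by forward and backward triangular solves. The sketch below is a plain Python reference with hypothetical names (`lu_factor`, `lu_solve`, `batched_solve`), not the register-based CUDA kernels the paper describes; the loop in `batched_solve` marks where the independent, variable-size problems could be processed in parallel.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_factor(a: Matrix) -> Tuple[Matrix, List[int]]:
    """LU with partial pivoting on a copy of a, so that P*A = L*U.

    Returns the combined LU matrix (unit-lower L strictly below the
    diagonal, U on and above it) and the row permutation.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu: Matrix, perm: List[int], b: List[float]) -> List[float]:
    n = len(lu)
    # Forward substitution with unit-lower L on the permuted right-hand side.
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution with U.
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def batched_solve(mats: List[Matrix], rhs: List[List[float]]) -> List[List[float]]:
    # Every problem is independent (embarrassingly parallel); a GPU version
    # would map each problem to its own group of threads instead of looping.
    return [lu_solve(*lu_factor(a), b) for a, b in zip(mats, rhs)]


# A batch of systems with different sizes (1x1, 2x2, 2x2).
mats = [[[2.0]], [[0.0, 1.0], [1.0, 0.0]], [[4.0, 3.0], [6.0, 3.0]]]
rhs = [[4.0], [2.0, 3.0], [10.0, 12.0]]
solutions = batched_solve(mats, rhs)
print(solutions)
```

The second matrix has a zero in its leading position, which is why pivoting is included: without the row swap the factorization would divide by zero.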

chapter

MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling

Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Davide Rosetti, et al.

2017 46th International Conference on Parallel Processing (ICPP), pp. 151-160

While GPUs are becoming common in HPC systems, the CPU is still responsible for managing both GPU-side and CPU-side compute, communication, and synchronization operations. For instance, if a result from a GPU-side computation is to be transferred to a remote destination, then the CPU must synchronize on GPU compute completion before issuing a communication operation. Both CPU cycles and energy are consumed...

chapter

Parallel Reconstruction of Three Dimensional Magnetohydrodynamic Equilibria in Plasma Confinement Devices

Sudip K. Seal, Mark R. Cianciosa, Steven P. Hirshman, Andreas Wingen, et al.

2017 46th International Conference on Parallel Processing (ICPP), pp. 282-291

Fast, accurate three-dimensional reconstructions of plasma equilibria, crucial for the physics interpretation of fusion data generated within confinement devices such as stellarators and tokamaks, are computationally very expensive and routinely require days, even weeks, to complete using serial approaches. Here, we present a parallel implementation of the three-dimensional plasma reconstruction code, V3FIT...

chapter

Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores

Minyoung Jung, Jinwoo Park, Johann Blieberger, Bernd Burgstaller

2017 46th International Conference on Parallel Processing (ICPP), pp. 271-281

String pattern matching with finite automata (FAs) is a well-established method across many areas in computer science. Until now, data dependencies inherent in the pattern matching algorithm have hampered effective parallelization. To overcome the dependency-constraint between subsequent matching steps, simultaneous deterministic finite automata (SFAs) have been recently introduced. Although an SFA...
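The idea that lets SFAs break the dependency between subsequent matching steps can be illustrated without constructing an SFA: split the input into chunks, run the DFA over each chunk from every state simultaneously to obtain a state-to-state mapping, then compose the mappings in order. Each such mapping corresponds to the SFA state reached on that chunk. The sketch below computes these mappings on the fly, whereas the paper concerns constructing the SFA itself; all names and the example DFA are hypothetical, and Python threads show the structure rather than deliver speedup because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Delta = Dict[Tuple[int, str], int]  # (state, symbol) -> next state


def run_dfa(delta: Delta, start: int, text: str) -> int:
    """Ordinary sequential DFA run, used as the reference result."""
    q = start
    for c in text:
        q = delta[(q, c)]
    return q


def chunk_mapping(delta: Delta, num_states: int, chunk: str) -> List[int]:
    """Run the DFA from every state at once over one chunk.

    mapping[q] is the state reached from q after reading the chunk; this
    vector plays the role of a single SFA state.
    """
    mapping = list(range(num_states))
    for c in chunk:
        mapping = [delta[(q, c)] for q in mapping]
    return mapping


def chunked_match(delta: Delta, num_states: int, start: int, text: str,
                  num_chunks: int = 4) -> int:
    size = max(1, -(-len(text) // num_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # Chunks are independent, so their mappings can be computed concurrently.
    with ThreadPoolExecutor() as pool:
        mappings = list(pool.map(
            lambda ch: chunk_mapping(delta, num_states, ch), chunks))
    # Compose the per-chunk mappings left to right, starting from `start`.
    q = start
    for m in mappings:
        q = m[q]
    return q


# Example DFA (hypothetical): counts 1s modulo 3 over the alphabet {0, 1}.
delta = {(q, "0"): q for q in range(3)}
delta.update({(q, "1"): (q + 1) % 3 for q in range(3)})
text = "1101011"
print(chunked_match(delta, 3, 0, text), run_dfa(delta, 0, text))  # 2 2
```

The price of removing the dependency is that each chunk is processed once per DFA state; precomputing the SFA turns that per-symbol work back into a single table lookup, which is what makes its construction cost worth parallelizing.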

chapter

Favorable Block First: A Comprehensive Cache Scheme to Accelerate Partial Stripe Recovery of Triple Disk Failure Tolerant Arrays

Luyu Li, Houxiang Ji, Chentao Wu, Jie Li, et al.

2017 46th International Conference on Parallel Processing (ICPP), pp. 221-230

With the development of cloud computing, disk arrays tolerating triple disk failures (3DFTs) are receiving more attention nowadays because they can provide high data reliability with low monetary cost. However, a challenging issue in these arrays is how to efficiently reconstruct the lost data, especially for partial stripe errors (e.g., sector and chunk errors). It is one of the most significant...

chapter

Practical Experience with Transactional Lock Elision

Tingzhe Zhou, Pantea Zardoshti, Michael Spear

2017 46th International Conference on Parallel Processing (ICPP), pp. 81-90

Transactional Memory (TM) promises both to provide a scalable mechanism for synchronization in concurrent programs, and to offer ease-of-use benefits to programmers. The most straightforward use of TM in real-world programs is in the form of Transactional Lock Elision (TLE). In TLE, critical sections are attempted as transactions, with a fall-back to a lock if conflicts manifest. Thus TLE expects...

chapter

GCN: GPU-Based Cube CNN Framework for Hyperspectral Image Classification

Han Dong, Tao Li, Jiabing Leng, Lingyan Kong, et al.

2017 46th International Conference on Parallel Processing (ICPP), pp. 41-49

Hyperspectral image classification has proven significant in the remote sensing field. Traditional classification methods have met bottlenecks due to a lack of remote sensing background knowledge or to high dimensionality. Deep learning based methods, such as deep convolutional neural networks (CNNs), can effectively extract high-level features from raw data. However, training a deep CNN is rather...

chapter

Greed Is Good: Parallel Algorithms for Bipartite-Graph Partial Coloring on Multicore Architectures

Mustafa Kemal Tas, Kamer Kaya, Erik Saule

2017 46th International Conference on Parallel Processing (ICPP), pp. 503-512

In parallel computing, a valid graph coloring enables lock-free processing of the colored tasks, data points, etc., without expensive synchronization mechanisms. However, coloring is not free and the overhead can be significant. In particular, for the bipartite-graph partial coloring (BGPC) and distance-2 graph coloring (D2GC) problems, which have various use-cases within the scientific computing...
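As background for the BGPC problem named above: given a bipartite graph with vertex sets V_A and V_B, a partial coloring colors only V_A, so that any two V_A vertices sharing a neighbor in V_B receive different colors. Vertices with the same color can then be processed together without synchronizing on their shared V_B neighbors. The sketch below is the classic sequential first-fit greedy heuristic with hypothetical names; it serves as a baseline for illustration, not as the parallel algorithms the paper proposes.

```python
from typing import List


def greedy_bgpc(a_nets: List[List[int]], num_b: int) -> List[int]:
    """Sequential first-fit greedy bipartite-graph partial coloring.

    a_nets[a] lists the V_B neighbours of vertex a in V_A. Returns color[a]
    such that two V_A vertices sharing a V_B neighbour get different colors.
    """
    # Reverse adjacency: for each b in V_B, the V_A vertices adjacent to it.
    b_verts: List[List[int]] = [[] for _ in range(num_b)]
    for a, nets in enumerate(a_nets):
        for b in nets:
            b_verts[b].append(a)

    color = [-1] * len(a_nets)  # -1 means "not yet colored"
    for a, nets in enumerate(a_nets):
        # Colors of already-colored distance-2 neighbours are forbidden.
        forbidden = {color[other] for b in nets for other in b_verts[b]
                     if other != a and color[other] >= 0}
        c = 0
        while c in forbidden:  # first fit: smallest color not forbidden
            c += 1
        color[a] = c
    return color


def is_valid_bgpc(a_nets: List[List[int]], num_b: int, color: List[int]) -> bool:
    """Valid iff the V_A neighbours of every b in V_B have distinct colors."""
    b_verts: List[List[int]] = [[] for _ in range(num_b)]
    for a, nets in enumerate(a_nets):
        for b in nets:
            b_verts[b].append(a)
    return all(len({color[a] for a in vs}) == len(vs) for vs in b_verts)


a_nets = [[0], [0, 1], [1], []]  # V_A = {0, 1, 2, 3}, V_B = {0, 1}
colors = greedy_bgpc(a_nets, 2)
print(colors, is_valid_bgpc(a_nets, 2, colors))  # [0, 1, 0, 0] True
```

The result depends on the visiting order of V_A; the loop over `a` is inherently sequential here because each vertex reads colors chosen by earlier ones, which is exactly the dependency a parallel algorithm has to manage.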

chapter

Resilience for Stencil Computations with Latent Errors

Aiman Fang, Aurelien Cavelan, Yves Robert, Andrew A. Chien

2017 46th International Conference on Parallel Processing (ICPP), pp. 581-590

Projections and measurements of error rates in near-exascale and exascale systems suggest dramatic growth, due to extreme scale (10^9 cores), concurrency, software complexity, and deep submicron transistor scaling. Such growth makes resilience a critical concern, and may increase the incidence of errors that "escape", silently corrupting application state. Such errors can often be revealed...

chapter

Network Aware Multi-User Computation Partitioning in Mobile Edge Clouds

Lei Yang, Jiannong Cao, Zhenyu Wang, Weigang Wu

2017 46th International Conference on Parallel Processing (ICPP), pp. 302-311

Mobile edge clouds have attracted increasing attention from researchers because they are closer to mobile users than the traditional cloud on the Internet. Offloading computations from mobile devices to a nearby edge cloud is an effective technique to accelerate applications and/or save energy on the mobile devices. However, the mobile edge cloud usually has limited computation resources and constrained...

chapter

[Copyright notice]

2017 46th International Conference on Parallel Processing (ICPP), p. iv

Presents the copyright information for the conference. May include reprint permission information.

chapter

[Title page iii]

2017 46th International Conference on Parallel Processing (ICPP), p. iii

Presents the title page of the proceedings record.
