2017 46th International Conference on Parallel Processing (ICPP)

IEEE

chapter

A Machine Learning Approach for Efficient Parallel Simulation of Beam Dynamics on GPUs

Kamesh Arumugam, Desh Ranjan, Mohammad Zubair, Balsa Terzic, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 462 - 471

Parallel computing architectures like GPUs have traditionally been used to accelerate applications with dense and highly-structured workloads; however, many important applications in science and engineering are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Numerical simulation of charged particle beam dynamics is one such application where the distribution...

chapter

WA-Dataspaces: Exploring the Data Staging Abstractions for Wide-Area Distributed Scientific Workflows

Mehmet Fatih Aktas, Javier Diaz-Montes, Ivan Rodero, Manish Parashar

2017 46th International Conference on Parallel Processing (ICPP) > 251 - 260

Data staging has been shown to be very effective for supporting data intensive in-situ workflows and coupling of applications. Experimental sciences are increasingly becoming collaborative among geographically distributed teams, and include experimental instruments and HPC facilities. This new way of doing science poses new challenges due to data sizes, complexity of computation, and the use of wide...

chapter

OptiMatch: Enabling an Optimal Match between Green Power and Various Workloads for Renewable-Energy Powered Storage Systems

Xiaoyang Qu, Jiguang Wan, Fengguang Song, Xiaozhao Zhuang, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 211 - 220

To reduce energy consumption and carbon emissions, many data centers have deployed (or plan to build) their own renewable-energy power plants. However, renewable energy (such as wind, tidal, and solar energy) suffers from serious intermittency and variability, which prevent green energy from being utilized effectively in practice. To cope with these issues, new power-supply management...

chapter

High Performance Query Processing for Web Scale RDF Data using BSP Style Communication and Balanced Distribution

Minho Bae, Junho Eum, Donghoon Kim, Sangyoon Oh

2017 46th International Conference on Parallel Processing (ICPP) > 201 - 210

To overcome scalability and performance issues in processing queries over web-scale RDF data, various studies have proposed parallel RDF SPARQL query processing algorithms. However, it is hard to resolve the scalability and performance issues together because the problem of communication overhead between nodes is closely related to the data distribution for parallel processing...

chapter

Non-Sequential Striping for Distributed Storage Systems with Different Redundancy Schemes

Yanwen Xie, Dan Feng, Fang Wang

2017 46th International Conference on Parallel Processing (ICPP) > 231 - 240

Modern distributed storage systems often store redundant data using either multiple replication or erasure coding, depending on access frequency. The multiple-replication scheme performs well for hot data, while the erasure-coding scheme is storage-efficient for warm and cold data. When hot data turn cold, an encoding procedure starts to do the conversion. However, due to sequential striping, current...

chapter

Reviewers

2017 46th International Conference on Parallel Processing (ICPP) > xxi

The conference offers a note of thanks and lists its reviewers.

chapter

Large-Scale Parallelization of Smoothed Particle Hydrodynamics Method on Heterogeneous Cluster

Yingrui Wang, Leisheng Li, Rong Tian

2017 46th International Conference on Parallel Processing (ICPP) > 21 - 30

This paper implements a Smoothed Particle Hydrodynamics simulation code and distributes it on a heterogeneous cluster. The theoretical analysis results show that treating the GPU as an equivalent peer of the CPU, rather than as an assistant or a substitute, is the most efficient way of using a CPU+GPU compute node. However, it raises complex challenges of heterogeneous cooperation. Our strategies of hybrid-level...

chapter

GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations

Adrian Castello, Sangmin Seo, Rafael Mayo, Pavan Balaji, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 60 - 69

OpenMP is the de facto standard application programming interface (API) for on-node parallelism. The most popular OpenMP runtimes rely on POSIX threads (pthreads) implementations that offer excellent performance for coarse-grained parallelism and match perfectly with the current hardware. However, a recent trend in runtimes/applications points in the direction of leveraging massive on-node parallelism...

chapter

Towards Highly Efficient DGEMM on the Emerging SW26010 Many-Core Processor

Lijuan Jiang, Chao Yang, Yulong Ao, Wanwang Yin, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 422 - 431

The matrix-matrix multiplication is an essential building block that can be found in various scientific and engineering applications. High-performance implementations of the matrix-matrix multiplication on state-of-the-art processors may be of great importance for both the vendors and the users. In this paper, we present a detailed methodology of implementing and optimizing the double-precision general...

chapter

Runtime Data Layout Scheduling for Machine Learning Dataset

Yang You, James Demmel

2017 46th International Conference on Parallel Processing (ICPP) > 452 - 461

Machine Learning (ML) approaches are widely used classification/regression methods for data mining applications. However, the time-consuming training process greatly limits the efficiency of ML approaches. We use the example of SVM (traditional ML algorithm) and DNN (state-of-the-art ML algorithm) to illustrate the idea in this paper. For SVM, a major performance bottleneck of current tools is that...

chapter

PDS: An I/O-Efficient Scaling Scheme for Parity Declustered Data Layout

Zhipeng Li, Yinlong Xu, Yongkun Li, Chengjin Tian, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 402 - 411

Parity declustering is widely deployed in erasure-coded storage systems to provide fast recovery and high data availability. However, to perform scaling on such RAIDs, it is necessary to preserve the parity declustered data layout so as to guarantee the RAID performance after scaling. Unfortunately, existing scaling algorithms fail to achieve this goal, so they cannot be applied for scaling...

chapter

An Efficient, Distributed Stochastic Gradient Descent Algorithm for Deep-Learning Applications

Guojing Cong, Onkar Bhardwaj, Minwei Feng

2017 46th International Conference on Parallel Processing (ICPP) > 11 - 20

Parallel and distributed processing is employed to accelerate training for many deep-learning applications with large models and inputs. As it reduces synchronization and communication overhead by tolerating stale gradient updates, asynchronous stochastic gradient descent (ASGD), derived from stochastic gradient descent (SGD), is widely used. Recent theoretical analyses show ASGD converges with linear...

chapter

Boosting the Efficiency of HPCG and Graph500 with Near-Data Processing

Erik Vermij, Leandro Fiorin, Christoph Hagleitner, Koen Bertels

2017 46th International Conference on Parallel Processing (ICPP) > 31 - 40

HPCG and Graph500 can be regarded as the two most relevant benchmarks for high-performance computing systems. Existing supercomputer designs, however, tend to focus on floating-point peak performance, a metric less relevant for these two benchmarks, leaving resources underutilized and resulting in little performance improvement on these benchmarks over time. In this work, we analyze the implementation...

chapter

Locality-Aware Dynamic Task Graph Scheduling

Jordyn Maglalang, Sriram Krishnamoorthy, Kunal Agrawal

2017 46th International Conference on Parallel Processing (ICPP) > 70 - 80

Dynamic task graph schedulers automatically balance work across processor cores by scheduling tasks among available threads while preserving dependences. In this paper, we design NABBITC, a provably efficient dynamic task graph scheduler that accounts for data locality on NUMA systems. NABBITC allows users to assign a color to each task representing the location (e.g., a processor core) that has the...

chapter

[Publisher's information]

2017 46th International Conference on Parallel Processing (ICPP) > 604

Provides a listing of current committee members and society officers.

chapter

Application-Aware Power Coordination on Power Bounded NUMA Multicore Systems

Rong Ge, Pengfei Zou, Xizhou Feng

2017 46th International Conference on Parallel Processing (ICPP) > 591 - 600

Power is a critical factor that limits the performance and scalability of modern high performance computer systems. Considering power as a first-order constraint and a scarce system resource, power-bounded computing represents a new perspective to address the power challenge in HPC. In this work we present an application-aware, multi-dimensional power allocation framework to support power-bounded parallel...

chapter

Author index

2017 46th International Conference on Parallel Processing (ICPP) > 601 - 603

Presents an index of the authors whose articles are published in the conference proceedings record.

chapter

A Scalable Hierarchical Semi-Separable Library for Heterogeneous Clusters

Isuru Dilanka Fernando, Sanath Jayasena, Milinda Fernando, Hari Sundar

2017 46th International Conference on Parallel Processing (ICPP) > 513 - 522

We present a scalable distributed memory library for generating structured dense matrices, such as those produced by boundary integral equation formulations, and for performing computations with them. Such matrices are dense, but have special structure that can be exploited to obtain efficient storage and matrix-vector product evaluations and consequently the fast solution of linear systems. At the core of the methods...

chapter

E-Storm: Replication-Based State Management in Distributed Stream Processing Systems

Xunyun Liu, Aaron Harwood, Shanika Karunasekera, Benjamin Rubinstein, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 571 - 580

Apache Storm is a fault-tolerant, distributed in-memory computation system for processing large volumes of high-velocity data in real-time. As an integral part of the fault-tolerance mechanism, Storm's state management is achieved by a checkpointing framework, which commits states regularly and recovers lost states from the latest checkpoint. However, this method involves a remote data store for state...
