This paper proposes a novel hybrid transactional memory scheme based on both abort prediction and an adaptive retry policy. First, the proposed scheme can predict not only conflicts between concurrently running transactions, but also capacity and other aborts, by collecting information about previously executed transactions. Second, the proposed scheme can provide an adaptive retry...
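The abstract above is truncated, but the general idea of abort-cause-aware retry can be sketched. The toy policy below is my own illustration, not the paper's actual scheme (all class and method names are hypothetical): it keeps retrying conflict aborts, which contention may resolve, while giving up immediately on capacity aborts, which a plain hardware retry cannot fix.

```python
# Hypothetical sketch of an adaptive retry policy for hybrid TM.
# Abort causes and retry budgets are illustrative, not from the paper.

CONFLICT, CAPACITY, OTHER = "conflict", "capacity", "other"

class AdaptiveRetryPolicy:
    def __init__(self, max_retries=8):
        self.max_retries = max_retries
        self.history = {}  # transaction id -> list of observed abort causes

    def record_abort(self, txn_id, cause):
        self.history.setdefault(txn_id, []).append(cause)

    def predict_cause(self, txn_id):
        # Predict the most frequent abort cause seen for this transaction.
        causes = self.history.get(txn_id)
        if not causes:
            return None
        return max(set(causes), key=causes.count)

    def retry_budget(self, txn_id):
        cause = self.predict_cause(txn_id)
        if cause == CAPACITY:
            return 0                 # capacity aborts will not succeed on retry
        if cause == CONFLICT:
            return self.max_retries  # contention may resolve; keep retrying
        return self.max_retries // 2  # unknown or mixed: a middle ground
```

A runtime would consult `retry_budget` after a hardware abort and fall back to a software path (e.g. a global lock) once the budget is exhausted.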
As the volume of data stored by big data and cloud services continues to grow, both academia and industry are seeking high-performance storage systems. With recent advances in write-optimized indexes (WOI), WOI-based file systems can now outperform conventional file systems by orders of magnitude on random writes, metadata updates, and small file creation. Based on the B-tree structure,...
Enhancing the performance of turbulent flow simulations is important, as the size of simulations grows with increasing Reynolds number. We discuss the performance of our in-house turbulent flow simulation solver, named DNS-TBL (Direct Numerical Simulation: Turbulent Boundary Layer), on Intel Xeon Phi™ manycore processors. With bootable Knights Landing processors, the DNS-TBL solver shows excellent...
Parallelization on a GPU (graphics processing unit) cluster is an effective approach to reducing the huge computation time of backprojection, which is the most accurate SAR (synthetic aperture radar) imaging algorithm for reconstructing images with no errors caused by the platform motion. To obtain accurate imagery in real-time, we developed a distributed parallel backprojection algorithm for stripmap...
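For readers unfamiliar with the algorithm, time-domain backprojection can be sketched as follows. This is a deliberately simplified, real-valued toy (no phase correction, no interpolation, and certainly not the paper's distributed stripmap GPU implementation); all names and parameters are illustrative.

```python
import math

def backproject(pulses, platform_y, pixels, range_res):
    """Toy time-domain backprojection.

    For every image pixel, accumulate the range-profile sample that each
    pulse recorded at that pixel's pixel-to-platform range. A real SAR BP
    would use complex samples and apply the phase correction
    exp(+j*4*pi*r/lambda); this sketch only sums magnitudes."""
    img = {}
    for px in pixels:
        acc = 0.0
        for profile, py in zip(pulses, platform_y):
            r = math.hypot(px[0], px[1] - py)   # pixel-to-platform range
            b = int(round(r / range_res))       # nearest range bin
            if 0 <= b < len(profile):
                acc += profile[b]
        img[px] = acc
    return img
```

The per-pixel loop is embarrassingly parallel, which is why backprojection maps well onto GPUs and GPU clusters: pixels (or image tiles) can be distributed across devices with pulses streamed to each.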
Social media networks as well as online graph analytics operate on large-scale graphs with millions of vertices, and even billions in some cases. Low-latency access is essential, but caching suffers from the mostly irregular access patterns of these application domains. Hence, distributed in-memory systems have been proposed that keep all data in memory at all times. However, the sheer amount of small data...
Large-scale graph processing is attracting more and more attention and has been widely applied in many application domains. FPGAs are a promising platform for implementing graph processing algorithms with high power efficiency and parallelism. In this paper, we propose OmniGraph, a scalable hardware accelerator for graph processing. OmniGraph can process graphs of different sizes adaptively and is adaptable...
In the US alone, data centers consumed around 200 TWh of electricity (roughly $20 billion) in 2016, and this amount doubles every five years. Data storage alone is estimated to be responsible for about 25% to 35% of data-center power consumption. Servers in data centers generally include multiple HDDs or SSDs, commonly arranged in a RAID level for better performance, reliability, and availability...
Due to the rapidly increasing use of big data, machines are under pressure to provide more computing power at higher energy efficiency while maintaining simpler and more scalable computing paradigms. Transactional Memory (TM) is one such technique: it can be used for synchronization instead of the conventional locks used in critical sections, since it has simpler paradigms, is scalable, and has better energy...
Today, artificial neural networks (ANNs) are widely used in a variety of applications, including speech recognition, face detection, disease diagnosis, etc. Among emerging ANN variants, Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture with complex computational logic. To achieve high accuracy, researchers often build large-scale LSTM networks, which are time-consuming...
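To see why LSTM inference is computationally heavy, note that a single cell step already evaluates four gates per state element. A scalar toy version of the standard LSTM cell equations (the textbook formulation, not this paper's accelerator design; the weight layout is my own convention for illustration) might look like:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, W):
    """One step of a standard LSTM cell, scalar version.

    W maps gate name -> (w_x, w_h, b); the usual formulation replaces
    these scalars with matrices and the products with matrix multiplies."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])    # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])    # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])  # candidate
    c = f * c_prev + i * g      # new cell state
    h = o * math.tanh(c)        # new hidden state
    return h, c
```

In a real network each of the four gates is a dense matrix-vector product over the concatenated input and hidden state, so cost grows quadratically with hidden size, which is what makes large-scale LSTMs a common acceleration target.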
Convolutional neural networks (CNNs) have been widely applied in various applications. However, the computation-intensive convolutional layers and memory-intensive fully connected layers pose many challenges to the implementation of CNNs on embedded platforms. To overcome these problems, this work proposes a power-efficient accelerator for CNNs, and different methods are applied to optimize the...
Platform as a Service (PaaS) clouds abstract large parts of the hardware/software stack from their tenant clients and provide it as a service. In this paper, we highlight the lack of scientific literature on the effects of Garbage Collection (GC) on Service Level Objective (SLO) satisfaction in clouds. To this end, we propose and implement CloudGC, a configurable PaaS application framework...
We discuss the feasibility of an in-house Schrödinger equation solver on the Intel Broadwell Xeon processor with a built-in FPGA, with a particular focus on the performance of large-scale sparse matrix-vector multiplication (SpMV) that is the core numerical operation of electronic structure simulations for multi-million atomic systems. The double-precision SpMV section in our solver is offloaded to...
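As context for the core kernel named above, SpMV for a matrix stored in the common CSR (compressed sparse row) format can be sketched as below. This is the textbook sequential kernel, not the authors' FPGA-offloaded double-precision implementation.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a sparse matrix A in CSR form.

    values  - nonzero entries, row by row
    col_idx - column index of each nonzero
    row_ptr - row i's nonzeros occupy values[row_ptr[i]:row_ptr[i+1]]"""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y
```

The indirect access `x[col_idx[k]]` is what makes SpMV memory-bound and irregular, which is why it is a popular target for offload to accelerators with custom memory pipelines.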
A new page swap protocol is proposed for a user-level remote memory paging system to accelerate out-of-core processing with multi-threaded user programs and libraries written in OpenMP and pthreads. The original swap protocol has a bottleneck in efficient page swapping when swaps are requested by multiple threads in a user program, because all MPI communications to memory servers and page...
The interconnect is a crucial component of any HPC machine, and its performance is one of the contributing factors to the overall performance of an HPC system. The most popular interface for connecting a Network Interface Card (NIC) to a CPU is PCI Express (PCIe). With denser core counts in compute servers and increasingly mature fabric interconnect speeds, there is a need to maximize the packet data movement...
Among high-radix, low-diameter networks, the fat-tree topology is commonly used in HPC and datacenter systems. Resource and job management is critically important to mitigate application interference in order to achieve high system performance and utilization. Preliminary studies have shown the effect of job placement on the performance of parallel scientific applications. In this work we study interference...
The Dragonfly network is widely used in modern high-performance computing systems. On this network, however, interference caused by network sharing can lead to significant congestion and degraded performance. In this work, we present a comparative analysis of intra-application interference for applications with nearest-neighbor communication, considering various placement strategies. Our results...
Applications in computer network security, social media analysis, and other areas rely on analyzing a changing environment. The data is rich in relationships and lends itself to graph analysis. Traditional static graph analysis cannot keep pace with network security applications analyzing nearly one million events per second and social networks like Facebook collecting 500 thousand comments per second...
A high-performing distributed hash is critical for achieving performance in many applications and system software using extreme-scale systems. It is also a central part of many big-data frameworks, including Memcached, file systems, and job schedulers. However, there is a lack of high-performing distributed hash implementations. In this work, we propose, design, and implement SharP Hash, a high-performing,...
As the US Department of Energy (DOE) invests in exascale computing, scalable performance modeling of physics codes on CPUs remains a hard challenge in computational codesign due to advanced design features of processors such as the memory hierarchy, instruction pipelining, and speculative execution. Reuse distance is a powerful (but unscalable) characteristic that helps to predict cache hit-rates...
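As background, reuse distance (also called LRU stack distance) counts the number of distinct addresses touched between consecutive accesses to the same address. A naive stack-based sketch follows; its quadratic cost is exactly the kind of unscalability the abstract refers to, and production tools use tree structures to bring it down to O(N log M).

```python
def reuse_distances(trace):
    """Reuse distance of each access in a memory trace.

    Distance = number of distinct addresses touched since the previous
    access to the same address; infinity on a first access (cold miss)."""
    stack = []   # LRU stack: most recently used address at the end
    out = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            out.append(len(stack) - pos - 1)  # distinct addrs above it
            stack.pop(pos)
        else:
            out.append(float("inf"))
        stack.append(addr)                    # addr becomes most recent
    return out
```

The connection to cache modeling: an access hits in a fully associative LRU cache of size C exactly when its reuse distance is less than C, so the histogram of reuse distances directly predicts hit rates across all cache sizes at once.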
Branch prediction is crucial to improving the throughput of microprocessors. It reduces branching stalls in the pipeline, which helps to maintain the instruction execution flow. Among branch instructions, conditional branches are non-trivial in determining microprocessor performance and throughput. Modern microprocessors accurately predict branches using advanced branch prediction techniques....
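As a concrete baseline for the techniques mentioned above, the classic 2-bit saturating-counter predictor (a textbook scheme, not necessarily the one studied in this paper; table size and indexing are illustrative) can be sketched as:

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters, indexed by program counter.

    Counter states: 0-1 predict not-taken, 2-3 predict taken. Saturation
    gives hysteresis: a single mispredicted branch does not immediately
    flip a strongly established prediction."""
    def __init__(self, table_size=1024):
        self.table = [1] * table_size  # start weakly not-taken
        self.size = table_size

    def predict(self, pc):
        return self.table[pc % self.size] >= 2

    def update(self, pc, taken):
        i = pc % self.size
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

Modern predictors layer history correlation (e.g. two-level and TAGE-style schemes) on top of this idea, but the saturating counter remains the basic building block.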