2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

chapter

Enabling One-Sided Communication Semantics on ARM

Pavel Shamis, M. Graham Lopez, Gilad Shainer

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 805 - 813

In this paper, we present our work to enable optimized one-sided communication operations on the ARM v8 architecture using a high-performance InfiniBand network interconnect, as well as an evaluation of our implementation. For this study, we started with an OpenSHMEM implementation based on Open MPI/SHMEM, and combined it with the UCX framework and the XPMEM kernel extension for shared memory communication...

chapter

A Laboratory Based Course on GPU Programming: Methods, Practices, and Lessons

Jawwad Ahmed Shamsi

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 367 - 374

Technological advancements have created a need for effective teaching of GPU computing. This need is driven by the growing use of parallel computing and of GPUs for computationally intensive tasks. This paper addresses that need. The paper describes a semester-long course on CUDA programming. The course has significant...

chapter

Graph Analytics: Complexity, Scalability, and Architectures

Peter M. Kogge

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1039 - 1047

Big Data expressed as "Big Graphs" is growing in importance. Looking forward, there is also increasing interest in streaming versions of the associated analytics. This paper develops an initial template for the relationship between "traditional" batch graph problems and their streaming forms. Variations of streaming problems are discussed, along with their relationship to existing...

chapter

Comparative Performance and Optimization of Chapel in Modern Manycore Architectures

Engin Kayraklioglu, Wo Chang, Tarek El-Ghazawi

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1105 - 1114

Chapel is an emerging scalable, productive parallel programming language. In this work, we analyze Chapel's performance using The Parallel Research Kernels on two different manycore architectures including a state-of-the-art Intel Knights Landing processor. We discuss implementation techniques in Chapel and their relation to the OpenMP implementations of the PRK. We also suggest and prototype several...

chapter

Applications of Ear Decomposition to Efficient Heterogeneous Algorithms for Shortest Path/Cycle Problems

Debarshi Dutta, Meher Chaitanya, Kishore Kothapalli, Debajyoti Bera

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 864 - 873

Graph algorithms play an important role in several fields of science and engineering. Prominent among them are the All-Pairs-Shortest-Paths (APSP) and related problems. Indeed, there are several efficient implementations for such problems on a variety of modern multi- and many-core architectures. It can be observed that for several graph problems, parallelism offers only limited success as current...

chapter

ReEP: A Toolset for Generation and Programming of Reconfigurable Datapaths for Event Processing

Philip Gottschling, Christian Hochberger

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 141 - 149

Reconfigurable datapaths can be used to implement multiple applications on the same hardware. Switching between applications can be realized by loading new configuration information into the datapath. In this contribution, we want to use such datapaths for high frequency event processing. We have developed the toolset ReEP, which takes multiple problem descriptions and superposes them into one reconfigurable...

chapter

Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

Nitin A. Gawande, Joshua B. Landwehr, Jeff A. Daily, Nathan R. Tallent, et al.

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 399 - 408

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors — including NVIDIA, Intel, AMD and IBM — have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these...

chapter

Out-of-Order Execution of Buffered Function Units in Exposed Data Path Architectures

Tripti Jain, Klaus Schneider, Frederik Walk

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 229 - 234

Some of the newer processor architectures are no longer based on registers in order to increase their potential of instruction-level parallelism. Instead, they expose their data paths to the compiler so that the program is able to directly move data values between function units using suitable instructions. Some of these architectures require a synchronous transfer of data values while others use...

chapter

Exploiting FPGAs from Higher Level Languages: A Signal Analysis Case Study

L. Stornaiuolo, A. Parravicini, G. Durelli, M. D. Santambrogio

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 132 - 140

Field Programmable Gate Arrays (FPGAs) are usually perceived as difficult to exploit due to the high level of expertise required to program them. In recent years, the major FPGA vendors have produced different High Level Synthesis (HLS) tools to help programmers accelerate their algorithms through the hardware architecture. However, these tools often use languages considered...

chapter

Exploiting Decoupled OpenCL Work-Items with Data Dependencies on FPGAs: A Case Study

Javier Alejandro Varela, Norbert Wehn, Qian Liang, Songyin Tang

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 124 - 131

In the field of high performance heterogeneous computing systems, field programmable gate arrays (FPGAs) have shown great advantages in terms of acceleration and energy efficiency. And with the inclusion of the OpenCL framework for parallel programming, the design complexity has been greatly reduced. However, the parallel implementation of applications containing data-dependent branches usually experiences...

chapter

Automating Compiler-Directed Autotuning for Phased Performance Behavior

Tharindu Rusira, Mary Hall, Protonu Basu

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1362 - 1371

We describe an integration of the CHiLL compiler with OpenTuner to reduce the programmer burden in using autotuning. We use as a case study optimizing the smooth operator and its associated stencil computations in the context of Geometric Multigrid (GMG), a hierarchical linear solver that operates in multiple grid resolutions (levels). Smooth is the most performance-critical operation that runs multiple...

chapter

Dynamic Dual Fixed-Point CORDIC Implementation

Andres Jacoby, Daniel Llamocca

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 235 - 240

We introduce Dynamic Dual Fixed Point (DDFX) CORDIC, which relies on run-time alteration of the numerical format of the Dual Fixed Point (DFX) CORDIC hardware. This allows for enhanced dynamic range and accuracy. Fixed Point, Dual Fixed Point, Floating Point, and Dynamic Dual Fixed Point CORDIC units are compared in terms of resources and accuracy. Results show that the hardware/software approach achieves...

chapter

Feasibility Study of Real-Time Spiking Neural Network Simulations on a Swarm Intelligence Based Digital Architecture

Francesca Palumbo, Carlo Sau, Danilo Pani, Paolo Meloni, et al.

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 247 - 250

Nature has proved to be a source of inspiration for engineering solutions. Spiking Neural Networks are exemplary from this perspective, due to the possibility to exploit them not only to simulate the biological networks of neurons but also to effectively work as classifiers and artificial intelligence systems. Another interesting nature-inspired paradigm is Swarm Intelligence, mainly applied to optimization...

chapter

Characterizing the Performance of Modern Architectures Through Opaque Benchmarks: Pitfalls Learned the Hard Way

Luka Stanisic, Lucas Mello Schnorr, Augustin Degomme, Franz C. Heinrich, et al.

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1588 - 1597

Determining key characteristics of High Performance Computing machines that allow users to predict their performance is an old and recurrent dream. This was, for example, the rationale behind the design of the LogP model that later evolved into many variants (LogGP, LogGPS, LoGPS, ...) to cope with the evolution and complexity of network technology. Although the network has received a lot of attention,...

chapter

Simultaneously Solving Swarms of Small Sparse Systems on SIMD Silicon

Bryce Adelstein Lelbach, Hans Johansen, Samuel Williams

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1128 - 1137

A number of computational science algorithms lead to discretizations that require a large number of independent small matrix solves. Examples include small non-linear coupled chemistry and flow systems, one-dimensional sub-systems in climate and diffusion simulations and semi-implicit time integrators, among others. We introduce an approach for solving large quantities of independent banded matrix...

chapter

Exploring Translation of OpenMP to OpenACC 2.5: Lessons Learned

Sergio Pino, Lori Pollock, Sunita Chandrasekaran

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 673 - 682

Scientists who want to exploit the computing power of the latest parallel architectures are faced with a diverse set of architectures and a number of programming languages, models and approaches. Among these programming techniques are the directive-based programming models OpenMP and OpenACC. This paper explores the similarities and the functionality gaps between the two models and presents insights...

chapter

A Memory Heterogeneity-Aware Runtime System for Bandwidth-Sensitive HPC Applications

Kavitha Chandrasekar, Xiang Ni, Laxmikant V. Kale

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1293 - 1300

Today's supercomputers are moving towards deployment of many-core processors like Intel Xeon Phi Knights Landing (KNL), to deliver high compute and memory capacity. Applications executing on such many-core platforms with improved vectorization require high memory bandwidth. To improve performance, architectures like Knights Landing include a high bandwidth and low capacity in-package high bandwidth...

chapter

Power Analysis of HLS-Designed Customized Instruction Set Architectures

Tejaswini Ananthanarayana, Sonia Lopez, Marcin Lukowiak

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 207 - 212

Performance and power consumption are key features for evaluating any processor design. In this paper, we pay close attention to the impact on power and energy consumption of customized Instruction Set Architectures (ISAs) designed by means of High Level Synthesis (HLS) tools. We compare these results against a full ISA soft processor, MicroBlaze. Our customized ISA processors greatly reduce the...

chapter

FAReP: Fragmentation-Aware Replacement Policy for Task Reuse on Reconfigurable FPGAs

Godwin Enemali, Adewale Adetomi, Tughrul Arslan

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 202 - 206

The use of reconfigurable chips such as FPGAs in embedded systems for many runtime applications is limited by long reconfiguration times. Techniques to circumvent this limitation rely on hardware task reuse, which preserves certain circuits on the chip. However, the frequent addition and removal of circuits while preserving others on the chip will inevitably lead to fragmentation of its area, in an...

chapter

Pearson Correlation Coefficient Acceleration for Modeling and Mapping of Neural Interconnections

Enrico Reggiani, Eleonora D'Arnese, Andrea Purgato, Marco D. Santambrogio

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 223 - 228

Thanks to the availability of new biomedical technologies and analysis methodologies, the quality of clinical exams and medical research is increasing. These improvements have given the opportunity to analyze large amounts of data with a higher level of accuracy. Therefore, processors able to handle compute-intensive algorithms and large datasets are needed, and the use of homogeneous processors is...
