2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)

chapter

Leveraging GPUs in Ab Initio Nuclear Physics Calculations

Dossay Oryspayev, Hugh Potter, Pieter Maris, Masha Sosonkina, et al.

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 1365 - 1372

This paper describes initial steps to leverage accelerators, such as GPUs, in ab initio nuclear physics calculations. Specifically, parallel nuclear structure calculations performed by the MFDn package are considered, with selected stages adapted for GPUs. This paper outlines the necessary steps to make MFDn utilize GPUs in its matrix construction stage. Experiments are presented to compare the...

chapter

GPU Peer-to-Peer Techniques Applied to a Cluster Interconnect

Roberto Ammendola, Massimo Bernaschi, Andrea Biagioni, Mauro Bisson, et al.

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 806 - 815

Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer...

chapter

Acceleration of a High Order Finite-Difference WENO Scheme for Large-Scale Cosmological Simulations on GPU

Chen Meng, Long Wang, Zongyan Cao, Xianfeng Ye, et al.

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 2071 - 2078

In this work, we present our implementation of a three-dimensional fifth-order finite-difference weighted essentially non-oscillatory (WENO) scheme in double precision on CPU/GPU clusters, which targets large-scale cosmological hydrodynamic flow simulations involving both shocks and complicated smooth solution structures. At the level of MPI parallelization, we subdivided the domain along each of...

chapter

Toward a Generic Hybrid CPU-GPU Parallelization of Divide-and-Conquer Algorithms

Alejandro Lopez-Ortiz, Alejandro Salinger, Robert Suderman

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 601 - 610

The increasing power and decreasing cost of Graphics Processing Units (GPUs), together with the development of programming languages for General Purpose Computing on GPUs (GPGPU), have led to the development and implementation of fast parallel algorithms for this architecture for a large spectrum of applications. Given the streaming-processing characteristics of GPUs, most practical applications so far...

chapter

A Generic Vectorization Scheme and a GPU Kernel for the Phylogenetic Likelihood Library

Fernando Izquierdo-Carrasco, Nikolaos Alachiotis, Simon Berger, Tomas Flouri, et al.

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 530 - 538

Highly optimized library implementations for important scientific kernels can improve scientific productivity. To this end, we are currently developing the Phylogenetic Likelihood Library (PLL) that implements functions to compute and optimize the phylogenetic likelihood score on evolutionary trees. Here, we focus on novel techniques to orchestrate likelihood computations on large vector-like processors...

chapter

The Hierarchical Memory Machine Model for GPUs

Koji Nakano

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 591 - 600

The Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM) are theoretical parallel computing models that capture the essence of the shared memory access and the global memory access of GPUs. The main contribution of this paper is to introduce the Hierarchical Memory Machine (HMM), which consists of multiple DMMs and a single UMM. The HMM is a more practical parallel computing model which...

chapter

Toward Automatic Optimized Code Generation for Multiprecision Modular Exponentiation on a GPU

Niall Emmart, Charles Weems

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 1700 - 1707

Multiprecision modular exponentiation has a variety of uses, including cryptography, prime testing and computational number theory. It is also a very costly operation to compute. GPU parallelism can be used to accelerate these computations, but to use the GPU efficiently, a problem must involve a significant number of simultaneous exponentiation operations. Handling a large number of TLS/SSL encrypted...

chapter

Adding GPU Computing to Computer Organization Courses

David Bunde, Karen L. Karavanic, Jens Mache, Christopher T. Mitchell

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 1275 - 1282

How can parallel computing topics be incorporated into core courses that are taken by the majority of undergraduate students? This paper reports our experiences adding GPU computing with CUDA into the core undergraduate computer organization course at two different colleges. We have found that even though programming in CUDA is not necessarily easy, programmer control and performance impact seem to...

chapter

On the Optimality and Speed of the Deep Greedy Switching Algorithm for Linear Assignment Problems

Amgad Naiem, Mohammed El-Beltagy

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 1828 - 1837

The Deep Greedy Switching algorithm is a fast heuristic for solving large instances of the linear sum assignment problem whilst sacrificing very little in terms of optimality. In this paper we explore the worst case performance aspects of the algorithm. We prove that the algorithm is finite and analyze its computational complexity. We also discuss a number of simplified variations of the algorithm...

chapter

High Throughput Parallel Implementation of Aho-Corasick Algorithm on a GPU

Nhat-Phuong Tran, Myungho Lee, Sugwon Hong, Jaeyoung Choi

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 1807 - 1816

Pattern matching is an important operation in various applications such as computer and network security, bioinformatics, and image processing, among many others. The Aho-Corasick (AC) algorithm is a multiple-pattern matching algorithm commonly used for such applications. In order to meet the highly demanding performance requirements imposed on these applications, achieving high performance for the AC algorithm...

chapter

Performance and Power Simulation for Versatile GPGPU Global Memory

Bin Wang, Weikuan Yu

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum > 2254 - 2257

Many-core architectural computing devices such as Graphics Processing Units (GPUs) are becoming increasingly popular in scientific computing because of their performance advantages. The scaling trends of processor and memory technologies demand more innovations in memory and processor architectures, e.g., the need for new architectural techniques to leverage Non-Volatile Random Access Memories (NVRAM)...
