Search results

Items from 1 to 15 out of 15 results

chapter

GPU-Accelerated Solution of Activated Sludge Model's System of ODEs with a High Degree of Stiffness

Jamal Alikhani, Arash Massoudieh, Ujjal K. Bhowmik

2016 International Conference on Computational Science and Computational Intelligence (CSCI) > 555 - 560

Simulation of activated sludge model (ASM) including detailed biokinetic reaction network often requires the solution of a large system of ordinary differential equations (ODEs) at each time frame, which requires long computing times. In this study, an adaptive time step backward differentiation formula (BDF) is proposed to solve the ASM's system of ODEs that mainly contains a high degree of stiffness...

chapter

OpenACC Programs Examined: A Performance Analysis Approach

Robert Dietrich, Guido Juckeland, Michael Wolfe

2015 44th International Conference on Parallel Processing > 310 - 319

2015 44th International Conference on Parallel Processing (ICPP)

The OpenACC standard has been developed to simplify parallel programming of heterogeneous systems. Based on a set of high-level compiler directives it allows application developers to offload code regions from a host CPU to an accelerator without the need for low-level programming with CUDA or OpenCL. Details are implicit in the programming model and managed by OpenACC API-enabled compilers and...

chapter

Parallel implementation of low light level image enhancement using CUDA

Peiyi Shen, Liang Zhang, Juan Song, Xilu Peng, more

2015 IEEE International Conference on Information and Automation > 673 - 677

2015 IEEE International Conference on Information and Automation (ICIA)

Enhancement algorithms can make low light level images have a clear visual effect like the one captured during the daytime, but due to high complexity and generous computational cost, low light level image enhancement algorithms are usually difficult to meet real-time requirements which make it difficult to be widely used in practical application. For this situation, a parallel optimization algorithm...

chapter

Hardware-Based and Hybrid L1 Data Cache Bypassing to Improve GPU Performance

Yijie Huangfu, Wei Zhang

2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems > 972 - 976

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS) and 2015 IEEE 12th International Conf on Embedded Software and Systems (ICESS)

Intelligent GPU cache bypassing can improve the efficiency of using GPU memory bandwidth, which can benefit GPU performance. In this paper, we study a pure hardware-based GPU cache bypassing method that can be applied to GPU applications without having to recompile the programs. Moreover, we introduce a hybrid method that can exploit profiling information to further enhance the hardware-based bypassing...

chapter

Scalable Critical Path Analysis for Hybrid MPI-CUDA Applications

Felix Schmitt, Robert Dietrich, Guido Juckeland

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 908 - 915

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Utilizing accelerators in heterogeneous systems is an established approach for designing peta-scale applications. Today, CUDA offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both CPU and GPU. From this increasing program complexity emerges the need for sophisticated performance tools. This work contributes by analyzing...

article

GMRace: Detecting Data Races in GPU Programs via a Low-Overhead Scheme

Mai Zheng, Vignesh T. Ravi, Feng Qin, Gagan Agrawal

IEEE Transactions on Parallel and Distributed Systems > 2014 > 25 > 1 > 104 - 115

In recent years, GPUs have emerged as an extremely cost-effective means for achieving high performance. While languages like CUDA and OpenCL have eased GPU programming for nongraphical applications, they are still explicitly parallel languages. All parallel programmers, particularly the novices, need tools that can help ensuring the correctness of their programs. Like any multithreaded environment,...

chapter

Extending OpenSHMEM for GPU Computing

S. Potluri, D. Bureddy, H. Wang, H. Subramoni, more

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 1001 - 1012

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Graphics Processing Units (GPUs) are becoming an integral part of modern supercomputer architectures due to their high compute density and performance per watt. In order to maximize utilization, it is imperative that applications running on these clusters have low synchronization and communication overheads. Partitioned Global Address Space (PGAS) models provide an attractive approach for developing...

chapter

Phase-Based Profiling in GPGPU Kernels

Robert Dietrich, Felix Schmitt, Rene Widera, Michael Bussmann

2012 41st International Conference on Parallel Processing Workshops > 414 - 423

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

More and more computationally intensive scientific applications make use of hardware accelerators like general purpose graphics processing units (GPGPUs). Compared to software development for typical multi-core processors their programming is fairly complex and needs hardware specific optimizations to utilize the full computing power. To achieve high performance, critical parts of a program have to...

chapter

Directive-based Programming for GPUs: A Comparative Study

Ruymán Reyes, Ivan Lopez, Juan J. Fumero, Francisco de Sande

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 410 - 417

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

GPUs and other accelerators are available on many different devices, while GPGPU has been massively adopted by the HPC research community. Although a plethora of libraries and applications providing GPU support are available, the need of implementing new algorithms from scratch, or adapting sequential programs to accelerators, will always exist. Writing CUDA or OpenCL codes, although an easier task...

chapter

Evaluation of GPU-based Seed Generation for Computational Genomics Using Burrows-Wheeler Transform

Yongchao Liu, Bertil Schmidt

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 684 - 690

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Unprecedented production of short reads from the new high-throughput sequencers has posed challenges to align short reads to reference genomes with high sensitivity and high speed. Many CPU-based short read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between...

chapter

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant V. Kale

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2404 - 2413

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism...

chapter

Automatic Offloading C++ Expression Templates to CUDA Enabled GPUs

Jie Chen, Balint Joo, William Watson III, Robert Edwards

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2359 - 2368

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In the last few years, many scientific applications have been developed for powerful graphics processing units (GPUs) and have achieved remarkable speedups. This success can be partially attributed to high performance host callable GPU library routines that are offloaded to GPUs at runtime. These library routines are based on C/C++-like programming toolkits such as CUDA from NVIDIA and have the same...

chapter

Patterns of Inefficient Performance Behavior in GPU Applications

D Eschweiler, D Becker, F Wolf

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 262 - 266

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

Writing efficient software for heterogeneous architectures equipped with modern accelerator devices presents a serious challenge to programmer productivity, creating a need for powerful performance-analysis tools to adequately support the software development process. To guide the design of such tools, we describe typical patterns of inefficient runtime behavior that may adversely affect the performance...

chapter

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters

José Duato, Antonio J Peña, F Silla, R Mayo, more

2010 International Conference on High Performance Computing & Simulation > 224 - 231

2010 International Conference on High Performance Computing & Simulation (HPCS 2010)

The increasing computing requirements for GPUs (Graphics Processing Units) have favoured the design and marketing of commodity devices that nowadays can also be used to accelerate general purpose computing. Therefore, future high performance clusters intended for HPC (High Performance Computing) will likely include such devices. However, high-end GPU-based accelerators used in HPC feature a considerable...

chapter

Speculative execution on multi-GPU systems

Gregory Diamos, Sudhakar Yalamanchili

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) > 1 - 12

The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit them. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to accelerators in order...
