Search results

Items from 1 to 20 out of 547 results

chapter

Optimizing memory efficiency for convolution kernels on Kepler GPUs

Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)

Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even higher demand on fast convolution. The high computation throughput and memory bandwidth of graphics processing units (GPUs) make GPUs a natural choice for accelerating...

chapter

Satellite image processing on parallel computing: A technical review

Snehal B. Buche, Shweta A. Dhondse, Anand N. Khobragade

2016 Online International Conference on Green Engineering and Technologies (IC-GET) > 1 - 9

2016 Online International Conference on Green Engineering and Technologies (IC-GET)

Image classification is one of the important processing tasks performed on satellite images. Many algorithms have been proposed for such classification, of which Support Vector Machine (SVM) is the most widely used. Many variants and approaches of SVM have been proposed, of which GA-based classifiers show better prospects. But the increasing size, spectrum and multiple dimensions of remote sensing data have made the image processing problem more...

chapter

A GPU Based SVM Method with Accelerated Kernel Matrix Calculation

Bo Yan, Yitian Ren, Zijiang Yang

2015 IEEE International Congress on Big Data > 41 - 46

2015 IEEE International Congress on Big Data (BigData Congress)

Support vector machine (SVM) is a popular classifier dealing with small-scale datasets. It has outstanding performance compared to other classifiers. However the execution time is extremely long when training Big Data. The Graphics Processing Unit (GPU) is a massively parallel device which performs very well as a co-processor. NVIDIA proposed a programming platform, CUDA, in 2006, which makes it much...

chapter

Analysis and realization of Relaxed Consistency Memory model for multi-core CPU or GPU

Ramanarayan Mohanty, Dipti Prakash Behera, Aurobinda Routray

2014 5th International Conference - Confluence The Next Generation Information Technology Summit (Confluence) > 866 - 870

2014 5th International Conference- Confluence The Next Generation Information Technology Summit

Parallel and distributed systems that support the shared memory paradigm are becoming widely accepted in many areas of computing. The memory consistency model of a shared-memory multiprocessor system influences both the performance and the programmability of the system. Under optimal condition it is found that multithreading contributes to more than 50 percent of performance improvement, while the...

chapter

An effective beamforming algorithm for a GPU-based ultrasound imaging system

Jiwon Kwon, Jae Hee Song, Sua Bae, Tai-kyoung Song, et al.

2012 IEEE International Ultrasonics Symposium > 619 - 622

2012 International Ultrasonics Symposium

In this paper, four beamforming algorithms (i.e., interpolation and phase rotation with pre- and post-filtering, IBF-PRE, IBF-POST, PRBF-PRE and PRBF-POST, respectively) implemented on a high-performance graphics-processing unit (GPU) were presented. Each beamforming method was divided into two kernels consisting of various beamforming and mid-processing blocks and efficiently implemented on a NVIDIA's...

chapter

Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments

John Jenkins, James Dinan, Pavan Balaji, Nagiza F. Samatova, et al.

2012 IEEE International Conference on Cluster Computing > 468 - 476

2012 IEEE International Conference on Cluster Computing (CLUSTER)

Lack of efficient and transparent interaction with GPU data in hybrid MPI+GPU environments challenges GPU acceleration of large-scale scientific computations. A particular challenge is the transfer of noncontiguous data to and from GPU memory. MPI implementations currently do not provide an efficient means of utilizing data types for noncontiguous communication of data in GPU memory. To address this...

chapter

Effective Kernel Mapping for OpenCL Applications in Heterogeneous Platforms

Omer Erdil Albayrak, Ismail Akturk, Ozcan Ozturk

2012 41st International Conference on Parallel Processing Workshops > 81 - 88

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

Many-core accelerators are being deployed in many systems to improve processing capabilities. In such systems, application mapping needs to be enhanced to maximize the utilization of the underlying architecture. Especially in GPUs, mapping becomes critical for multi-kernel applications, as kernels may exhibit different characteristics. While some of the kernels run faster on GPU, others may refer...

chapter

Using 1000+ GPUs and 10000+ CPUs for Sedimentary Basin Simulations

Mei Wen, Huayou Su, Wenjie Wei, Nan Wu, et al.

2012 IEEE International Conference on Cluster Computing > 27 - 35

2012 IEEE International Conference on Cluster Computing (CLUSTER)

In cutting-edge CPU/GPU hybrid clusters, such as Tianhe-1A, the aggregate CPU computing capability may amount to up to 1/3 of the aggregate GPU computing capability. It thus goes without saying that the CPUs and GPUs should jointly carry out the computational work. However, to effectively and simultaneously use both the hardware components requires great care when developing the parallel implementations...

chapter

MRF Satellite Image Classification on GPU

Pedro Valero-Lara

2012 41st International Conference on Parallel Processing Workshops > 149 - 156

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

One of the stages of the analysis of satellite images is given by a classification based on the Markov Random Fields (MRF) method. It is possible to find in literature several packages to carry out this analysis, and of course the classification tasks. One of them is the Orfeo Tool Box (OTB). The analysis of satellite images is an expensive computational task requiring real time execution or automatization...

chapter

Cross-Platform OpenCL Code and Performance Portability Investigated with a Climate and Weather Physics Model

Han Dong, Dibyajyoti Ghosh, Fahad Zafar, Shujia Zhou

2012 41st International Conference on Parallel Processing Workshops > 126 - 134

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

Current-generation multicore computing platforms are vastly different. Sustenance of many-core applications across heterogeneous platforms is a daunting task, more so when the dynamic nature of the application is factored in. Open Computing Language (OpenCL) was created to address this issue. Designed to run on CPUs, GPUs, FPGAs and other platforms, OpenCL is becoming a standard for cross-platform parallel...

chapter

Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications

Ziming Zhong, Vladimir Rychkov, Alexey Lastovetsky

2012 IEEE International Conference on Cluster Computing > 191 - 199

2012 IEEE International Conference on Cluster Computing (CLUSTER)

Transition to hybrid CPU/GPU platforms in high performance computing is challenging in the aspect of efficient utilisation of the heterogeneous hardware and existing optimised software. During recent years, scientific software has been ported to multicore and GPU architectures and now should be reused on hybrid platforms. In this paper, we model the performance of such scientific applications in order...

chapter

Phase-Based Profiling in GPGPU Kernels

Robert Dietrich, Felix Schmitt, Rene Widera, Michael Bussmann

2012 41st International Conference on Parallel Processing Workshops > 414 - 423

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

More and more computationally intensive scientific applications make use of hardware accelerators like general purpose graphics processing units (GPGPUs). Compared to software development for typical multi-core processors their programming is fairly complex and needs hardware specific optimizations to utilize the full computing power. To achieve high performance, critical parts of a program have to...

chapter

Lost in Translation: Challenges in Automating CUDA-to-OpenCL Translation

Paul Sathre, Mark Gardner, Wu-Chun Feng

2012 41st International Conference on Parallel Processing Workshops > 89 - 96

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

The use of accelerators in high-performance computing is increasing. The most commonly used accelerator is the graphics processing unit (GPU) because of its low cost and massively parallel performance. The two most common programming environments for GPU accelerators are CUDA and OpenCL. While CUDA runs natively only on NVIDIA GPUs, OpenCL is an open standard that can run on a variety of hardware...

chapter

EMA: Turning Multiple Address Spaces Transparent to CUDA Programming

Kun Tang, Yulong Yu, Yuxin Wang, Yong Zhou, et al.

2012 Seventh ChinaGrid Annual Conference > 170 - 175

2012 Seventh ChinaGrid Annual Conference (ChinaGrid)

CUDA performs general purpose parallel computing using GPGPU, which has been applied to various computing fields. However, the multi-address-space architecture in CUDA makes memory management complicated. NVIDIA introduced UVA, Unified Virtual Addressing, into CUDA Toolkit 4.0 to address this issue. However, UVA has platform limitations and even performance loss under certain circumstances. We propose...

chapter

Accelerating Boosting-Based Face Detection on GPUs

David Oro, Carles Fernández, Carlos Segura, Xavier Martorell, et al.

2012 41st International Conference on Parallel Processing > 309 - 318

2012 41st International Conference on Parallel Processing (ICPP)

The goal of face detection is to determine the presence of faces in arbitrary images, along with their locations and dimensions. As it happens with any graphics workloads, these algorithms benefit from data-level parallelism. Existing parallelization efforts strictly focus on mapping different divide and conquer strategies into multicore CPUs and GPUs. However, even the most advanced single-chip many-core...

chapter

Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs

Yi Yang, Ping Xiang, Mike Mantor, Huiyang Zhou

2012 41st International Conference on Parallel Processing > 329 - 339

2012 41st International Conference on Parallel Processing (ICPP)

Given the extraordinary computational power of modern graphics processing units (GPUs), general purpose computation on GPUs (GPGPU) has become an increasingly important platform for high performance computing. To better understand how well the GPU resource has been utilized by application developers and then to facilitate them to develop high performance GPGPU code, we conduct an empirical study on...

chapter

A Dynamic Accelerator-Cluster Architecture

Sebastian Rinke, Daniel Becker, Thomas Lippert, Suraj Prabhakaran, et al.

2012 41st International Conference on Parallel Processing Workshops > 357 - 366

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

Accelerators such as graphics processing units (GPUs) provide an inexpensive way of improving the performance of cluster systems. In such an arrangement, the individual nodes of the cluster are directly connected to one or more accelerator devices via PCI Express. This results in a static mapping of accelerators onto compute nodes, where each accelerator can only be accessed from exactly one compute...

chapter

Adapting Irregular Computations to Large CPU-GPU Clusters in the MADNESS Framework

Vlad Slavici, Raghu Varier, Gene Cooperman, Robert J. Harrison

2012 IEEE International Conference on Cluster Computing > 1 - 9

2012 IEEE International Conference on Cluster Computing (CLUSTER)

Graphics Processing Units (GPUs) are becoming the workhorse of scalable computations. MADNESS is a scientific framework used especially for computational chemistry. Most MADNESS applications use operators that involve many small tensor computations, resulting in a less regular organization of computations on GPUs. A single GPU kernel may have to multiply by hundreds of small square matrices (with...

chapter

A GPU-accelerated Branch-and-Bound Algorithm for the Flow-Shop Scheduling Problem

N. Melab, I. Chakroun, M. Mezmaz, D. Tuyttens

2012 IEEE International Conference on Cluster Computing > 10 - 17

2012 IEEE International Conference on Cluster Computing (CLUSTER)

Branch-and-Bound (B&B) algorithms are time-intensive tree-based exploration methods for solving to optimality combinatorial optimization problems. In this paper, we investigate the use of GPU computing as a major complementary way to speed up those methods. The focus is put on the bounding mechanism of B&B algorithms, which is the most time consuming part of their exploration process...

chapter

Autotuning Stencil-Based Computations on GPUs

Azamat Mametjanov, Daniel Lowell, Ching-Chen Ma, Boyana Norris

2012 IEEE International Conference on Cluster Computing > 266 - 274

2012 IEEE International Conference on Cluster Computing (CLUSTER)

Finite-difference, stencil-based discretization approaches are widely used in the solution of partial differential equations describing physical phenomena. Newton-Krylov iterative methods commonly used in stencil-based solutions generate matrices that exhibit diagonal sparsity patterns. To exploit these structures on modern GPUs, we extend the standard diagonal sparse matrix representation and define...

Keywords:
KERNEL
GRAPHICS PROCESSING UNIT

Content availability

Available (546)
None (1)

Keywords

INSTRUCTION SETS (291)
GPU (180)
COPROCESSORS (158)
CUDA (138)
COMPUTER GRAPHIC EQUIPMENT (133)
COMPUTATIONAL MODELING (101)
PARALLEL PROCESSING (98)
COMPUTER ARCHITECTURE (97)
GPGPU (71)
OPTIMIZATION (68)
HARDWARE (60)
ARRAYS (58)
PROGRAMMING (55)
PERFORMANCE EVALUATION (46)
MEMORY MANAGEMENT (45)
ACCELERATION (41)
MATHEMATICAL MODEL (40)
ALGORITHM DESIGN AND ANALYSIS (37)
GRAPHICS PROCESSING UNITS (37)
OPENCL (35)
COMPUTE UNIFIED DEVICE ARCHITECTURE (34)
LIBRARIES (33)
SYNCHRONIZATION (33)
PARALLEL ARCHITECTURES (32)
VECTORS (32)
REGISTERS (31)
CENTRAL PROCESSING UNIT (29)
COMPUTER GRAPHICS (29)
SPARSE MATRICES (29)
INDEXES (28)
PIXEL (28)
EQUATIONS (25)
MULTIPROCESSING SYSTEMS (25)
BANDWIDTH (24)
BENCHMARK TESTING (24)
PARALLEL ALGORITHMS (24)
PARALLEL PROGRAMMING (24)
PARALLEL COMPUTING (22)
HIGH PERFORMANCE COMPUTING (19)
MULTICORE PROCESSING (19)
OPTIMISATION (19)
RUNTIME (19)
CONVOLUTION (18)
YARN (18)
GRAPHICS (17)
THROUGHPUT (17)
IMAGE PROCESSING (16)
REAL TIME SYSTEMS (16)
FIELD PROGRAMMABLE GATE ARRAYS (15)
OPENMP (15)
THREE DIMENSIONAL DISPLAYS (15)
CPU (14)
GENETIC ALGORITHMS (14)
ENCODING (13)
FEATURE EXTRACTION (13)
GPU COMPUTING (13)
GRAPHIC PROCESSING UNIT (13)
RANDOM ACCESS MEMORY (13)
ACCURACY (12)
DATABASES (12)
IMAGE COLOR ANALYSIS (12)
MEDICAL IMAGE PROCESSING (12)
MPI (12)
TILES (12)
CONTEXT (11)
EDUCATIONAL INSTITUTIONS (11)
IMAGE RECONSTRUCTION (11)
ITERATIVE METHODS (11)
JACOBIAN MATRICES (11)
LAYOUT (11)
MATRIX MULTIPLICATION (11)
SERVERS (11)
BIOINFORMATICS (10)
CLUSTERING ALGORITHMS (10)
DATA STRUCTURES (10)
INTERPOLATION (10)
LATTICES (10)
LINEAR ALGEBRA (10)
MESSAGE SYSTEMS (10)
NVIDIA (10)
PERFORMANCE (10)
TRAINING (10)
ULTRASONIC IMAGING (10)
APPLICATION PROGRAM INTERFACES (9)
CLOCKS (9)
ENERGY CONSUMPTION (9)
EVOLUTIONARY COMPUTATION (9)
PIPELINES (9)
POLYNOMIALS (9)
PROTEINS (9)
BIOLOGY COMPUTING (8)
COMPUTERS (8)
DECODING (8)
ENERGY EFFICIENCY (8)
FAST FOURIER TRANSFORMS (8)
GENERATORS (8)
GPUS (8)
HETEROGENEOUS COMPUTING (8)

INFONA - science communication portal