Search results

Items from 141 to 160 out of 433 results


chapter

Importance of GPGPUs in efficiency improvement of real world applications

Shreyas Bhatia, Minal Tolpadi, Akhtar Rasool

2014 IEEE Students' Conference on Electrical, Electronics and Computer Science > 1 - 6

2014 IEEE Students' Conference on Electrical, Electronics and Computer Science (SCEECS)

Changing requirements have caused a revolution in the field of parallel computing. The emergence of parallel computing as a necessity has boosted the use of GPGPUs for this purpose, and with it has come a drastic improvement in many real-world applications of GPGPUs. In this paper we discuss GPGPUs, their evolution, and their contribution to many...

chapter

GPU parallel implementation of the approximate K-SVD algorithm using OpenCL

Paul Irofti, Bogdan Dumitrescu

2014 22nd European Signal Processing Conference (EUSIPCO) > 271 - 275

2014 22nd European Signal Processing Conference (EUSIPCO)

Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and to the complexity of the training algorithms. We investigate a parallel version of the approximate K-SVD algorithm, where multiple atoms are updated simultaneously, and implement it using OpenCL, for execution on graphics processing units (GPU). This not only allows reducing the...

chapter

Code generation from a domain-specific language for C-based HLS of hardware accelerators

Oliver Reiche, Moritz Schmid, Frank Hannig, Richard Membarth, more

2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) > 1 - 10

2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)

As today's computer architectures become more and more heterogeneous, a plethora of options including CPUs, GPUs, DSPs, reconfigurable logic (FPGAs), and other application-specific processors come into consideration for close-to-sensor processing. Especially in the domain of image processing on mobile devices, among numerous design challenges, a very stringent energy budget is of utmost importance,...

chapter

Task mapping in heterogeneous embedded systems for fast completion time

Husheng Zhou, Cong Liu

2014 International Conference on Embedded Software (EMSOFT) > 1 - 10

2014 International Conference on Embedded Software (EMSOFT)

Graphics processing units are widely used in embedded systems because they can achieve high performance and energy efficiency. In such systems, mapping computation and data for multiple applications while minimizing the completion time is quite challenging due to the large size of the policy space, including heterogeneous application characteristics, complex application structure, data...

chapter

VAST: The illusion of a large memory space for GPUs

Janghaeng Lee, Mehrzad Samadi, Scott Mahlke

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) > 443 - 454

2014 23rd International Conference on Parallel Architecture and Compilation (PACT)

Heterogeneous systems equipped with traditional processors (CPUs) and graphics processing units (GPUs) have enabled processing large data sets. With new programming models, such as OpenCL and CUDA, programmers are encouraged to offload data parallel workloads to GPUs as much as possible in order to fully utilize the available resources. Unfortunately, offloading work is strictly limited by the size...

chapter

Data-reuse optimizations for pipelined tiling with parametric tile sizes

Alexandre Isoard

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) > 509 - 510

2014 23rd International Conference on Parallel Architecture and Compilation (PACT)

Today's hardware diversity exacerbates the need for optimizing compilers. A problem that arises when exploiting hardware accelerators (FPGA, GPU, dedicated boards) is how to automatically perform kernel/function offloading or outlining (as opposed to function inlining). The principle is to outsource part of the computation (the kernel to be performed on the accelerator) to a more efficient but more...

chapter

Nuclear Fusion Simulation Code Optimization on GPU Clusters

Norihisa Fujita, Hideo Nuga, Taisuke Boku, Yasuhiro Idomura

2013 International Conference on Parallel and Distributed Systems > 420 - 421

2013 International Conference on Parallel and Distributed Systems (ICPADS)

GT5D is a nuclear fusion simulation program that aims to analyze turbulence phenomena in tokamak plasma. In this research, we optimize it for GPU clusters with multiple GPUs per node. Based on profiling results of GT5D on a CPU node, we decided to offload the entire time-development part of the program, except MPI communication, to the GPUs. We achieved 3.37 times faster performance at maximum...

chapter

Parallel distributed breadth first search on GPU

Koji Ueno, Toyotaro Suzumura

20th Annual International Conference on High Performance Computing > 314 - 323

2013 20th International Conference on High Performance Computing (HiPC)

In this paper we propose a highly optimized parallel and distributed BFS on GPU for the Graph500 benchmark. We evaluate the performance of our implementation using the TSUBAME2.0 supercomputer. We achieve 317 GTEPS (billion traversed edges per second) with scale 35 (a large graph with 34.4 billion vertices and 550 billion edges) using 1366 nodes and 4096 GPUs. With this score, the TSUBAME2.0 supercomputer is...
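For orientation, the level-synchronous (frontier-by-frontier) BFS that Graph500-style kernels are commonly organized around can be sketched on a single CPU as below. This is not the authors' distributed GPU implementation: the adjacency-dict input and the names `bfs_parents` and `teps` are illustrative assumptions. The search returns a parent map with the root as its own parent, and TEPS is taken here simply as traversed edges divided by elapsed seconds (divide by 1e9 for GTEPS).

```python
def bfs_parents(adj: dict[int, list[int]], root: int) -> dict[int, int]:
    """Level-synchronous BFS; returns a parent map where the root maps to itself."""
    parent = {root: root}
    frontier = [root]
    while frontier:
        next_frontier = []
        # Expand every vertex of the current level before starting the next one.
        for u in frontier:
            for v in adj.get(u, []):
                if v not in parent:  # first discovery claims the parent slot
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent


def teps(traversed_edges: int, seconds: float) -> float:
    """Traversed edges per second (divide by 1e9 to express as GTEPS)."""
    return traversed_edges / seconds


if __name__ == "__main__":
    square = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    print(bfs_parents(square, 0))
```

In a distributed GPU version, each level's frontier expansion is the step that is partitioned across devices and nodes, with frontier exchange between levels.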

chapter

On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms

Konstantinos Krommydas, Muhsen Owaida, Christos D. Antonopoulos, Nikolaos Bellas, more

2013 International Conference on Parallel and Distributed Systems > 432 - 433

2013 International Conference on Parallel and Distributed Systems (ICPADS)

The proliferation of heterogeneous computing systems presents the parallel computing community with the challenge of porting legacy and emerging applications to multiple processors with diverse programming abstractions. OpenCL is a vendor-agnostic and industry-supported programming model that offers code portability on heterogeneous platforms, allowing applications to be developed once and deployed...

chapter

Online Performance Projection for Clusters with Heterogeneous GPUs

Lokendra S. Panwar, Ashwin M. Aji, Jiayuan Meng, Pavan Balaji, more

2013 International Conference on Parallel and Distributed Systems > 283 - 290

2013 International Conference on Parallel and Distributed Systems (ICPADS)

We present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which GPU would run faster for a given kernel. Usage cases...

chapter

Achieving TeraCUPS on Longest Common Subsequence Problem Using GPGPUs

Adnan Ozsoy, Arun Chauhan, Martin Swany

2013 International Conference on Parallel and Distributed Systems > 69 - 77

2013 International Conference on Parallel and Distributed Systems (ICPADS)

In this paper, we describe a novel technique to optimize the longest common subsequence (LCS) algorithm for the one-to-many matching problem on GPUs by transforming the computation into bit-wise operations and a post-processing step. The former can be highly optimized and achieves more than a trillion operations (cell updates) per second (CUPS), a first for LCS algorithms. The latter is more efficiently done...
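The abstract does not spell out the authors' encoding, so the following is only a generic illustration of how an LCS length can be computed with bit-wise operations: a well-known bit-vector recurrence in which each bit of V stands for one position of the first string and, for each character c of the second string, U = V & M[c] and V = (V + U) | (V - U). The function name is hypothetical, Python integers stand in for GPU machine words, and the paper's one-to-many batching and post-processing step are not shown.

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """LCS length of a and b via a bit-vector over the positions of a."""
    m = len(a)
    if m == 0 or not b:
        return 0
    mask = (1 << m) - 1
    # Match masks: bit i of pm[c] is set when a[i] == c.
    pm: dict[str, int] = {}
    for i, ch in enumerate(a):
        pm[ch] = pm.get(ch, 0) | (1 << i)
    v = mask  # all ones: no matches recorded yet
    for ch in b:
        u = v & pm.get(ch, 0)
        # One row of the DP table is updated with a few word-level operations.
        v = ((v + u) | (v - u)) & mask
    # Each zero bit left in v corresponds to one unit of LCS length.
    return m - bin(v).count("1")


if __name__ == "__main__":
    print(lcs_length_bitparallel("abcbdab", "bdcaba"))
```

The appeal for GPUs is that a whole DP row becomes a handful of integer operations, so each thread can process many cells per instruction.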

chapter

The Study of Parallel Ortho-rectification Method of Line-Array Image Based on GPU

Yuxia Yang, Zhaohua Liu, Jingyu Yang

2013 International Conference on Computer Sciences and Applications > 615 - 618

2013 International Conference on Computer Sciences and Applications (CSA)

This paper first briefly introduces the principle of ortho-rectification of line-array images, then designs a GPU-based parallel processing method and proposes a shared-memory optimization strategy for POS data to avoid the performance bottleneck caused by frequent accesses to global memory. Finally, a system experiment using an ADS40 image on a Tesla C2050 GPU validates the parallel processing...

chapter

Defend GPUs against DoS attacks

Wei Zhang

2013 IEEE 32nd International Performance Computing and Communications Conference (IPCCC) > 1 - 2

2013 IEEE 32nd International Performance Computing and Communications Conference (IPCCC)

Graphics Processing Units (GPUs) have become a popular choice for general-purpose high-performance computing. Encryption and decryption algorithms such as the Advanced Encryption Standard (AES) have been implemented on GPUs to gain significant speedup. However, the security of the GPU architecture is not well studied, making it potentially risky to offload sensitive computation to GPUs. In this paper,...

chapter

High throughput low latency LDPC decoding on GPU for SDR systems

Guohui Wang, Michael Wu, Bei Yin, Joseph R. Cavallaro

2013 IEEE Global Conference on Signal and Information Processing > 1258 - 1261

2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

In this paper, we present a high throughput and low latency LDPC (low-density parity-check) decoder implementation on GPUs (graphics processing units). The existing GPU-based LDPC decoder implementations suffer from low throughput and long latency, which prevent them from being used in practical SDR (software-defined radio) systems. To overcome this problem, we present optimization techniques for...

chapter

A Fast Runtime Visualization of a GPU-Based 3D-FDTD Electromagnetic Simulation

Kota Aoki, Keisuke Dohi, Yuichiro Shibata, Kiyoshi Oguri, more

2013 First International Symposium on Computing and Networking > 30 - 37

2013 First International Symposium on Computing and Networking (CANDAR)

In this paper, we present the design and implementation of a fast runtime visualizer for a GPU-based 3D-FDTD electromagnetic simulation. We focus on improving the productivity of simulator development without compromising simulation performance. To maintain portability, we implemented the visualizer with the MVC model, in which the simulation kernels and the visualization process were completely separated...

chapter

Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone

Guohui Wang, Blaine Rister, Joseph R. Cavallaro

2013 IEEE Global Conference on Signal and Information Processing > 759 - 762

2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

Feature detection and extraction are essential in computer vision applications such as image matching and object recognition. The Scale-Invariant Feature Transform (SIFT) algorithm is one of the most robust approaches to detect and extract distinctive invariant features from images. However, high computational complexity makes it difficult to apply the SIFT algorithm to mobile applications. Recent...

chapter

Method to accelerate prediction of membrane protein types by CUDA

Yukun Zhong, Liao Gang, M A LongFei, Zeng Yu

2013 IEEE International Conference on Bioinformatics and Biomedicine > 27 - 32

2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

This paper introduces a parallel computing method to improve the efficiency of predicting membrane protein types with SVM. Early hardware limitations of the GPU (lack of synchronization primitives and limited memory caching mechanisms) can make GPU-based computation inefficient. We present this efficient method for prediction of membrane protein types for Intel(R) Core(TM) i3-3110M quad-core and...

chapter

CLSIFT: An Optimization Study of the Scale Invariance Feature Transform on GPUs

Weiyan Wang, Yunquan Zhang, Long Guoping, Shengen Yan, more

2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing > 93 - 100

2013 IEEE International Conference on High Performance Computing and Communications (HPCC) & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (EUC)

Scale-Invariant Feature Transform (SIFT) is well suited to image matching because of its invariance to image scaling, rotation, and slight changes in illumination or viewpoint. However, its high computational complexity makes it technically challenging to deploy SIFT in real-time applications. To address this problem, we propose CLSIFT, an OpenCL-based highly speeded up and performance...

chapter

GPU-Accelerated Parallel 3D Image Thinning

Bingfeng Hu, Xuan Yang

2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing > 149 - 152

2013 IEEE International Conference on High Performance Computing and Communications (HPCC) & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (EUC)

The skeletons of objects in 3D images can be extracted using 3D image thinning. The application of 3D image thinning to image analysis is hampered by its considerable computation time. By employing the graphics processing unit (GPU), which offers tremendous computing power at an unmatched performance-to-cost ratio, 3D image thinning can be accelerated. In this paper,...

chapter

Breaking through memory limitation in GPU parallel processing using Strassen Algorithm

Pujianto Yugopuspito, Sutrisno, Robertus Hudi

2013 International Conference on Computer, Control, Informatics and Its Applications (IC3INA) > 201 - 205

2013 International Conference on Computer, Control, Informatics and Its Applications (IC3INA)

Matrix multiplication is one of the basic linear-algebra operations most widely used in computer science. For a long time it was computed with the naive algorithm, whose complexity is O(n³). Much research has sought more efficient and effective algorithms for this operation, and Strassen eventually found one that overcomes the naive algorithm's complexity with only...
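To make the complexity claim concrete: Strassen's method replaces the eight half-size block products of the naive recursive scheme with seven, so T(n) = 7T(n/2) + O(n²), which gives O(n^log2(7)), roughly O(n^2.81). The sketch below assumes square matrices whose size is a power of two, stored as plain nested lists; the function names and the `cutoff` parameter are illustrative, and it does not reproduce the authors' GPU implementation or their memory-limitation workaround.

```python
Matrix = list[list[float]]


def naive_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Standard O(n^3) triple-loop product, used as the recursion base case."""
    n, k, p = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(p)] for i in range(n)]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m: Matrix):
    """Return the four quadrants (11, 12, 21, 22) of a square matrix."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a: Matrix, b: Matrix, cutoff: int = 64) -> Matrix:
    """Strassen product for n x n matrices with n a power of two."""
    if len(a) <= cutoff:
        return naive_matmul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven recursive products instead of eight.
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    # Stitch the quadrants back into one matrix.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

The cutoff reflects a common practical choice: below some size, the extra additions and temporaries outweigh the saved multiplication, so the recursion falls back to the naive kernel.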


Keywords:
KERNEL
GPU

Publication date


Content availability

Available (431)
None (2)

Keywords

GRAPHICS PROCESSING UNITS (213)
INSTRUCTION SETS (204)
GRAPHICS PROCESSING UNIT (180)
CUDA (142)
COPROCESSORS (86)
COMPUTER ARCHITECTURE (83)
PARALLEL PROCESSING (82)
COMPUTER GRAPHIC EQUIPMENT (70)
COMPUTATIONAL MODELING (69)
HARDWARE (57)
OPTIMIZATION (56)
OPENCL (55)
PROGRAMMING (51)
ARRAYS (50)
ALGORITHM DESIGN AND ANALYSIS (49)
MEMORY MANAGEMENT (42)
ACCELERATION (41)
REGISTERS (31)
PERFORMANCE EVALUATION (30)
SPARSE MATRICES (27)
YARN (26)
PARALLEL COMPUTING (25)
PIXEL (25)
VECTORS (25)
GPGPU (24)
MATHEMATICAL MODEL (24)
BANDWIDTH (23)
COMPUTER GRAPHICS (22)
LIBRARIES (22)
THROUGHPUT (21)
COMPUTE UNIFIED DEVICE ARCHITECTURE (20)
BENCHMARK TESTING (19)
RUNTIME (19)
GRAPHICS (18)
PARALLEL ALGORITHMS (18)
CPU (17)
CENTRAL PROCESSING UNIT (16)
FIELD PROGRAMMABLE GATE ARRAYS (16)
PARALLEL (16)
EQUATIONS (15)
FPGA (15)
IMAGE PROCESSING (15)
INDEXES (15)
FEATURE EXTRACTION (13)
PARALLEL PROGRAMMING (13)
PERFORMANCE (13)
TRAINING (13)
OPENMP (12)
PARALLEL ARCHITECTURES (12)
CONVOLUTION (11)
HIGH PERFORMANCE COMPUTING (11)
SUPPORT VECTOR MACHINES (11)
CONTEXT (10)
GRAPHIC PROCESSING UNIT (10)
MULTICORE PROCESSING (10)
RANDOM ACCESS MEMORY (10)
RENDERING (COMPUTER GRAPHICS) (10)
IMAGE RECONSTRUCTION (9)
JACOBIAN MATRICES (9)
MATRIX MULTIPLICATION (9)
REAL-TIME SYSTEMS (9)
RESOURCE MANAGEMENT (9)
THREE DIMENSIONAL DISPLAYS (9)
VIDEO CODING (9)
ANALYTICAL MODELS (8)
CONFERENCES (8)
DATA MINING (8)
DATA STRUCTURES (8)
DATABASES (8)
ENCODING (8)
ENERGY EFFICIENCY (8)
LINEAR ALGEBRA (8)
MOTION ESTIMATION (8)
MULTIPROCESSING SYSTEMS (8)
NVIDIA (8)
PARALLEL ALGORITHM (8)
PROGRAM PROCESSORS (8)
SPMV (8)
SYNCHRONIZATION (8)
TILES (8)
TUNING (8)
ACCURACY (7)
APPROXIMATION ALGORITHMS (7)
COMPUTER VISION (7)
DECODING (7)
EDUCATIONAL INSTITUTIONS (7)
HIGH DEFINITION VIDEO (7)
HISTOGRAMS (7)
IMAGE COLOR ANALYSIS (7)
IMAGE SEGMENTATION (7)
ITERATIVE METHODS (7)
MPI (7)
OPTIMISATION (7)
PARTITIONING ALGORITHMS (7)
PIPELINES (7)
RADIATION DETECTORS (7)
SHAPE (7)
SIMD (7)

INFONA - science communication portal
