Search results

Items from 41 to 60 out of 433 results

chapter

Analysis of OpenCL Work-Group Reduce for Intel GPUs

Grigore Lupescu, Emil-Ioan Slusanschi, Nicolae Tapus

2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) > 417 - 423

2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)

As hardware becomes more flexible in terms of programming, software APIs must expose hardware features in a portable way. Additions in the OpenCL 2.0 API expose thread communication through the newly defined work-group functions. In this paper we focus on two implementations of the work-group functions in the OpenCL compiler backend for Intel's GPUs. We first describe the particularities of Intel's GEN...

chapter

An improved GPGPU-Accelerated parallelization for rotation invariant thinning algorithm

Weiguang Yang, Qi Jia, Hui Liu, Yihao Wu, et al.

2016 IEEE International Conference on Image Processing (ICIP) > 1784 - 1788

2016 IEEE International Conference on Image Processing (ICIP)

Document is unavailable: This DOI was registered to an article that was not presented by the author(s) at this conference. As per section 8.2.1.B.13 of IEEE's "Publication Services and Products Board Operations Manual," IEEE has chosen to exclude this article from distribution. We regret any inconvenience.

chapter

Performance optimization for CPU-GPU heterogeneous parallel system

Yanhua Wang, Jianzhong Qiao, Shukuan Lin, Tinglei Zhao

2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) > 1259 - 1266

2016 12th International Conference on Natural Computation and 13th Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)

With the GPU (Graphics Processing Unit) taking part in general-purpose computing, a heterogeneous system usually achieves higher performance and efficiency. There are many studies on how to improve the performance of a heterogeneous system, among which a number aim to achieve this goal by allocating workload to processors with different strategies. In this paper, we implement a task allocation...

chapter

Two Parallel Implementations of Ehrlich-Aberth Algorithm for Root-Finding of Polynomials on Multiple GPUs with OpenMP and MPI

Kahina Ghidouche, Abderrahmane Sider, Lilia Ziane Khodja, Raphael Couturier

2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES) > 270 - 277

Finding the roots of polynomials is a very important part of solving real-life problems, but the higher the degree of the polynomial, the harder it becomes. In this paper, we present two different parallel algorithms of the Ehrlich-Aberth method to find roots of sparse and fully defined polynomials of high degrees. Both algorithms are based on CUDA technology to be implemented on multi-GPU computing...

chapter

Basic k-mer operations using massive parallel processing on heterogeneus architectures

Nelson Enrique Vera-Parra, Cristian Alejandro Rojas-Quintero, Jose Nelson Perez-Castillo

2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS) > 193 - 196

2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS)

This article presents and assesses a massively parallel processing model for basic operations on k-mers from genomic sequences, based on functions defined in terms of N-dimensional spaces. The model is implemented using a set of OpenCL cores available at github.com/bioinfud/k-merscl and assessed using a heterogeneous CPU/GPU platform and a dataset of randomly generated k-mers. The results...

chapter

A Statistical-Feature ML Approach to IP Traffic Classification Based on CUDA

Zhengyang Chen, Renjie Chen, Yu Zhang, Jianzhong Zhang, et al.

2016 IEEE Trustcom/BigDataSE/ISPA > 2235 - 2239

2016 IEEE Trustcom/BigDataSE/ISPA

Modern networks carry many applications that generate different types of network traffic. To improve the performance of network management, it is important to identify and classify internet traffic. Machine learning (ML) techniques based on per-flow statistics have been widely used in traffic classification. Different from traditional classification methods,...

chapter

Parallel adaptive sparsity-constrained NMF algorithm for hyperspectral unmixing

Wenhong Wang, Yuntao Qian

2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) > 6137 - 6140

IGARSS 2016 - 2016 IEEE International Geoscience and Remote Sensing Symposium

Sparsity-constrained nonnegative matrix factorization (NMF) has proven to be an effective method for hyperspectral unmixing. However, the optimization procedure of sparsity-constrained NMF is computationally demanding, which may limit its application in time-constrained conditions. In this paper, a parallel L_1/2 sparsity-constrained NMF unmixing method on Graphics Processing Units (GPUs) is proposed,...

chapter

Atomic-free optimization on GPU based SAR raw data simulation

Xiaojie Yao, Chen Hu, Fan Zhang, Wei Hu, et al.

2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) > 645 - 648

IGARSS 2016 - 2016 IEEE International Geoscience and Remote Sensing Symposium

Synthetic Aperture Radar (SAR) has been widely used in airborne remote sensing and satellite ocean observation to reduce the effect of weather conditions and sun illumination. As technology has developed, swath and resolution requirements in terrain have increased, which results in a huge increase in echo data and simulation time [1]. With the development of the graphics processing unit (GPU), it can reduce...

chapter

A new method to parallel implementation for batching vast small-scale computing tasks based on GPU

Jun Zhu, Haifeng Yao, Tao Yang, Qiaomei Zhou, et al.

2016 IEEE 11th Conference on Industrial Electronics and Applications (ICIEA) > 2092 - 2095

2016 IEEE 11th Conference on Industrial Electronics and Applications (ICIEA)

Calculations on small-scale data are common in scientific computing and application domains, and an efficient method for small calculations can unlock the potential of many computations and applications. In this paper, a novel self-adaptive parallel computing method based on the graphics processing unit (GPU) architecture is proposed for batches of small-scale computing tasks...

chapter

GPU accelerated high-quality video/image super-resolution

Zhangzong Zhao, Li Song, Rong Xie, Xiaokang Yang

2016 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB) > 1 - 4

2016 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)

This paper presents several novel GPU optimization technologies to accelerate the SRCNN (Super-Resolution Convolutional Neural Network), one of the best super-resolution algorithms. We first directly parallelize and implement the SRCNN, then accelerate the convolution by making use of the hierarchical feature of GPU memory. We explore different optimization methods on each convolution and select the...

chapter

GPUShare: Fair-Sharing Middleware for GPU Clouds

Anshuman Goswami, Jeffrey Young, Karsten Schwan, Naila Farooqui, et al.

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1769 - 1776

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Many new cloud-focused applications such as deep learning and graph analytics have started to rely on the high computing throughput of GPUs, but cloud providers cannot currently support fine-grained time-sharing on GPUs to enable multi-tenancy for these types of applications. Currently, scheduling is performed by the GPU driver in combination with a hardware thread dispatcher to maximize utilization. However,...

chapter

A GPU Based Maximum Common Subgraph Algorithm for Drug Discovery Applications

P. B. Jayaraj, K. Rahamathulla, G. Gopakumar

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 580 - 588

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

The maximum common subgraph of two graphs, G1 and G2, is the largest subgraph in G1 that is isomorphic to a subgraph in G2. Finding the maximum common subgraph of two given graphs is known to be an NP-complete problem. An exact solution for the maximum common subgraph problem can be found by an algorithm that transforms the maximum common subgraph problem into a maximal clique enumeration problem....

chapter

Optimization of Block Sparse Matrix-Vector Multiplication on Shared-Memory Parallel Architectures

Ryan Eberhardt, Mark Hoemmen

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 663 - 672

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

We examine the implementation of block compressed row storage (BCSR) sparse matrix-vector multiplication (SpMV) for sparse matrices with dense block substructure, optimized for blocks with sizes from 2x2 to 32x32, on CPU, Intel many-integrated-core, and GPU architectures. Previous research on SpMV for matrices with dense block substructure has largely focused on the design of novel data structures...

chapter

Counting Triangles in Large Graphs on GPU

Adam Polak

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 740 - 746

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

The clustering coefficient and the transitivity ratio are concepts often used in network analysis, which creates a need for fast practical algorithms for counting triangles in large graphs. Previous research in this area focused on sequential algorithms, MapReduce parallelization, and fast approximations. In this paper we propose a parallel triangle counting algorithm for CUDA GPU. We describe the...

chapter

Performance of parallel ChaCha20 stream cipher

Radu Velea, Florina Gurzau, Laurentiu Margarit, Ion Bica, et al.

2016 IEEE 11th International Symposium on Applied Computational Intelligence and Informatics (SACI) > 391 - 396

2016 IEEE 11th International Symposium on Applied Computational Intelligence and Informatics (SACI)

ChaCha20 is an encryption cipher selected by Google to replace the now obsolete RC4 in the Chrome browser and Android devices. The current article discusses the performance implications of parallelizing ChaCha20 across multicore CPU and GPU. The serial implementation used to derive the parallel code is part of BoringSSL encryption library. We used OpenMP and OpenCL to accelerate the cipher and obtain...

chapter

Accelerating frequency-domain simulations using small shared-memory CPU/GPU cluster

Tomasz Topa, Artur Noga, Andrzej Karwowski

2016 21st International Conference on Microwave, Radar and Wireless Communications (MIKON) > 1 - 4

2016 21st International Conference on Microwave, Radar and Wireless Communications (MIKON)

A numerical approach to frequency response problems usually requires that the system's governing equation be solved repeatedly at many frequencies. The computational efficiency of the overall process can be increased by departing from the traditional sequential computing model in favor of utilizing the parallel processing capability commonly offered by modern hardware. In this paper, we consider a hybrid...

chapter

Real time ultrasound image denoising using NVIDIA CUDA

Amira Hadj Fredj, Jihene Malek

2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) > 136 - 140

2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)

Image filtering is a process of reducing noise which degrades the performance of image processing. In some applications such as segmentation or classification, denoising has been designed to smooth the homogeneous areas while keeping and enhancing the edges. In several applications such as video analysis, image-guided surgical interventions or visual servoing, real-time denoising is needed. The devoted...

chapter

GPU-Accelerated Texture Analysis Using Steerable Riesz Wavelets

Anamaria Vizitiu, Lucian Mihai Itu, Ranveer Joyseeree, Adrien Depeursinge, et al.

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 431 - 434

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)

Visual pattern recognition is a key research topic in the field of image processing and computer vision. Texture analysis based on steerable Riesz wavelets is powerful, but requires computing pixel-wise operations, resulting in a run time on the order of days when large volumes of data are processed. To overcome this limitation we propose a Graphics Processing Unit (GPU) based solution. A standard...

chapter

Microbenchmarks for GPU Characteristics: The Occupancy Roofline and the Pipeline Model

Jan Lemeire, Jan G. Cornelis, Laurent Segers

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 456 - 463

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)

In this paper we present microbenchmarks in OpenCL to measure the most important performance characteristics of GPUs. Microbenchmarks try to measure individual characteristics that influence the performance. First, performance, in operations or bytes per second, is measured with respect to the occupancy and as such provides an occupancy roofline curve. The curve shows at which occupancy level peak...

Keywords:
KERNEL
GPU

Content availability

Available (431)
None (2)

Keywords

GRAPHICS PROCESSING UNITS (213)
INSTRUCTION SETS (204)
GRAPHICS PROCESSING UNIT (180)
CUDA (142)
COPROCESSORS (86)
COMPUTER ARCHITECTURE (83)
PARALLEL PROCESSING (82)
COMPUTER GRAPHIC EQUIPMENT (70)
COMPUTATIONAL MODELING (69)
HARDWARE (57)
OPTIMIZATION (56)
OPENCL (55)
PROGRAMMING (51)
ARRAYS (50)
ALGORITHM DESIGN AND ANALYSIS (49)
MEMORY MANAGEMENT (42)
ACCELERATION (41)
REGISTERS (31)
PERFORMANCE EVALUATION (30)
SPARSE MATRICES (27)
YARN (26)
PARALLEL COMPUTING (25)
PIXEL (25)
VECTORS (25)
GPGPU (24)
MATHEMATICAL MODEL (24)
BANDWIDTH (23)
COMPUTER GRAPHICS (22)
LIBRARIES (22)
THROUGHPUT (21)
COMPUTE UNIFIED DEVICE ARCHITECTURE (20)
BENCHMARK TESTING (19)
RUNTIME (19)
GRAPHICS (18)
PARALLEL ALGORITHMS (18)
CPU (17)
CENTRAL PROCESSING UNIT (16)
FIELD PROGRAMMABLE GATE ARRAYS (16)
PARALLEL (16)
EQUATIONS (15)
FPGA (15)
IMAGE PROCESSING (15)
INDEXES (15)
FEATURE EXTRACTION (13)
PARALLEL PROGRAMMING (13)
PERFORMANCE (13)
TRAINING (13)
OPENMP (12)
PARALLEL ARCHITECTURES (12)
CONVOLUTION (11)
HIGH PERFORMANCE COMPUTING (11)
SUPPORT VECTOR MACHINES (11)
CONTEXT (10)
GRAPHIC PROCESSING UNIT (10)
MULTICORE PROCESSING (10)
RANDOM ACCESS MEMORY (10)
RENDERING (COMPUTER GRAPHICS) (10)
IMAGE RECONSTRUCTION (9)
JACOBIAN MATRICES (9)
MATRIX MULTIPLICATION (9)
REAL-TIME SYSTEMS (9)
RESOURCE MANAGEMENT (9)
THREE DIMENSIONAL DISPLAYS (9)
VIDEO CODING (9)
ANALYTICAL MODELS (8)
CONFERENCES (8)
DATA MINING (8)
DATA STRUCTURES (8)
DATABASES (8)
ENCODING (8)
ENERGY EFFICIENCY (8)
LINEAR ALGEBRA (8)
MOTION ESTIMATION (8)
MULTIPROCESSING SYSTEMS (8)
NVIDIA (8)
PARALLEL ALGORITHM (8)
PROGRAM PROCESSORS (8)
SPMV (8)
SYNCHRONIZATION (8)
TILES (8)
TUNING (8)
ACCURACY (7)
APPROXIMATION ALGORITHMS (7)
COMPUTER VISION (7)
DECODING (7)
EDUCATIONAL INSTITUTIONS (7)
HIGH DEFINITION VIDEO (7)
HISTOGRAMS (7)
IMAGE COLOR ANALYSIS (7)
IMAGE SEGMENTATION (7)
ITERATIVE METHODS (7)
MPI (7)
OPTIMISATION (7)
PARTITIONING ALGORITHMS (7)
PIPELINES (7)
RADIATION DETECTORS (7)
SHAPE (7)
SIMD (7)

INFONA - science communication portal
