Search results

Items from 21 to 40 out of 1,456 results

chapter

An Empirical Evaluation of Design Abstraction and Performance of Thrust Framework

Ajai V. George, Sankar Manoj, Sanket Rajan Gupte, Santonu Sarkar

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 233 - 242

High performance computing applications are far more difficult to write; therefore, practitioners expect well-tuned software to last long and provide optimized performance even when the hardware is upgraded. It may also be necessary to write software using sufficient abstraction over the hardware so that it is capable of running on heterogeneous architectures. Therefore, it is required to have a...

chapter

3D CUDA FDTD based method for analysis of microstrip antennas

R. C. M. Pimenta, M. V. Africano, R. Adriano, U. C. Resende

2017 SBMO/IEEE MTT-S International Microwave and Optoelectronics Conference (IMOC) > 1 - 5

This paper presents an FDTD CUDA-based implementation designed for microstrip antenna simulation. Aspects of geometry and memory transactions are considered in the formulation of the parallel algorithm. As a result, an improvement in computational cost is achieved using the proposed implementation. Two microstrip antennas, a narrow band patch antenna and a UWB antenna, are simulated to validate...

chapter

PIM: Parallelization of Ising Model for Genomics Data

Qiankun Dong, Chao Liu, Tao Li, Zhandong Liu

2017 3rd International Conference on Big Data Computing and Communications (BIGCOM) > 172 - 177

The Ising model was originally designed to address the interactions among atoms inside a magnetic field. As it can fit many biological problems where adjacent entities can interact with each other, the Ising model is getting more and more popular. With its help, people may gain a deeper and better understanding of associations between two related entities like genes and their products. However, it may...

chapter

Simple and Fast Parallel Algorithms for the Voronoi Map and the Euclidean Distance Map, with GPU Implementations

Takumi Honda, Shinnosuke Yamamoto, Hiroaki Honda, Koji Nakano, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 362 - 371

The complete Voronoi map of a binary image with black and white pixels is a matrix of the same size such that each element is the closest black pixel of the corresponding pixel. The complete Voronoi map visualizes the influence region of each black pixel. However, each region may not be connected due to exclave pixels. The connected Voronoi map is a modification of the complete Voronoi map so that...

chapter

A Comparative Performance Analysis of Remote GPU Virtualization over Three Generations of GPUs

Carlos Reano, Federico Silla

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 121 - 128

The use of Graphics Processing Units (GPUs) has become a very popular way to accelerate the execution of many applications. However, GPUs are not exempt from side effects. For instance, GPUs are expensive devices which additionally consume a non-negligible amount of energy even when they are not performing any computation. Furthermore, most applications present low GPU utilization. To address these...

chapter

Autotuning GPU Kernels via Static and Predictive Analysis

Robert Lim, Boyana Norris, Allen Malony

2017 46th International Conference on Parallel Processing (ICPP) > 523 - 532

Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting GPU models or performing code transformations. Although empirical...

chapter

Exploiting GPUs for Fast Force-Directed Visualization of Large-Scale Networks

Govert G. Brinkmann, Kristian F.D. Rietveld, Frank W. Takes

2017 46th International Conference on Parallel Processing (ICPP) > 382 - 391

Network analysis software relies on graph layout algorithms to enable users to visually explore network data. Nowadays, networks easily consist of millions of nodes and edges, resulting in hours of computation time to obtain a readable graph layout on a typical workstation. Although these machines usually do not have a very large number of CPU cores, they can easily be equipped with Graphics Processing...

chapter

Overlapping Data Transfers with Computation on GPU with Tiles

Burak Bastem, Didem Unat, Weiqun Zhang, Ann Almgren, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 171 - 180

GPUs are employed to accelerate scientific applications; however, they require much more programming effort from programmers, particularly because of the disjoint address spaces between the host and the device. OpenACC and OpenMP 4.0 provide directive-based programming solutions to alleviate the programming burden; however, synchronous data movement can create a performance bottleneck in fully taking...

chapter

Large-Scale Memory of Sequences Using Binary Sparse Neural Networks on GPU

Max Raphael Sobroza Marques, Ghouthi Boukli Hacene, Carlos Eduardo Rosar Kos Lassance, Pierre-Henri Horrein

2017 International Conference on High Performance Computing & Simulation (HPCS) > 553 - 559

Associative memories are models capable of storing and retrieving messages given only a part of their content. These systems have been used in several applications such as database engines, network routers, natural language processing and image recognition due to their error correction capability in pattern retrieval. Recently, Gripon and Berrou introduced a sparse associative memory based on cliques...

chapter

Implementation and Performance of a GPU-Based Monte-Carlo Framework for Determining Design Ice Load

Sara Ayubian, Shadi Alawneh, Martin Richard, Jan Thijssen

2017 International Conference on High Performance Computing & Simulation (HPCS) > 109 - 116

Modern Graphics Processing Units (GPUs) with a massive number of threads and a many-core architecture support both graphics and general-purpose computing. NVIDIA's Compute Unified Device Architecture (CUDA) takes advantage of parallel computing and utilizes the tremendous power of GPUs. The present study demonstrates a high performance computing (HPC) framework for a Monte-Carlo simulation to determine...

chapter

Extending OpenACC for Efficient Stencil Code Generation and Execution by Skeleton Frameworks

Alyson D. Pereira, Marcio Castro, Mario A. R. Dantas, Rodrigo C. O. Rocha, et al.

2017 International Conference on High Performance Computing & Simulation (HPCS) > 719 - 726

The OpenACC programming model simplifies the programming for accelerator devices such as GPUs. Its abstract accelerator model defines a least common denominator for accelerator devices, thus it cannot represent architectural specifics of these devices without losing portability. Therefore, this general-purpose approach delivers good performance on average, but it misses optimization opportunities...

chapter

A Fast CUDA-Based Implementation for the Euclidean Distance Transform

Francisco de Assis Zampirolli, Leonardo Filipe

2017 International Conference on High Performance Computing & Simulation (HPCS) > 815 - 818

In image processing, efficient algorithms are always pursued for applications that use the most advanced hardware architectures. The Distance Transform is a classic operation for blurring effects, skeletonizing, segmentation and various other purposes. This article presents two implementations of the Euclidean Distance Transform using CUDA (Compute Unified Device Architecture) in GPU (Graphics Process...

chapter

Reducing the Memory Footprint of an Eikonal Solver

Daniel Ganellari, Gundolf Haase

2017 International Conference on High Performance Computing & Simulation (HPCS) > 325 - 332

The numerical solution of the Eikonal equation follows the fast iterative method with its application for tetrahedral meshes. Therein the main operations in each discretization element $\tau$ contain various inner products in the M-metric as $\langle \vec{e}_{k,s}, \vec{e}_{s,\ell} \rangle_{M_\tau} = \vec{e}_{k,s}^{\,T} \cdot M_\tau \cdot \vec{e}_{s,\ell}$ with $\vec{e}_{s,\ell}$ as connecting edge between vertices $s$ and $\ell$ in element $\tau$. Instead of passing...

chapter

A CUDA-based parallel adaptive dynamic programming algorithm

Lu Li, Xin Chen, Wei Wang

2017 36th Chinese Control Conference (CCC) > 3510 - 3515

Adaptive Dynamic Programming (ADP) with critic-actor architecture is a useful way to achieve online learning control. The algorithm Gaussian-Kernel Adaptive Dynamic Programming (GK-ADP) that has been developed before has a kind of two-phase iteration, which not only approximates value function, but also optimizes hyper-parameters simultaneously. However, just like most iteration algorithms are applied...

chapter

OAM 3D vector field visualization with CFDTD on CUDA GPUs

Gary Junkin, Alan Tennant

2017 IEEE International Symposium on Antennas and Propagation & USNC/URSI National Radio Science Meeting > 2405 - 2406

This paper outlines an efficient technique for displaying 3D vector fields during conformal FDTD field updates on CUDA GPUs, while incurring only a small computational overhead and using a small configurable memory allocation. A 10GHz OAM phased array is presented as an example where 3D vector visualization shows the development of the OAM mode.

chapter

A GPU Based Parallel Clustering Method for Electric Power Big Data

Cong Ji, Zheng Xiong, Chao Fang, Hui LV, et al.

2017 4th International Conference on Information Science and Control Engineering (ICISCE) > 29 - 33

With the explosive growth of user load data in power consumption information collection and load control systems, traditional computing frameworks and methods are faced with tremendous computational pressure when dealing with massive user load clustering and carrying out load characteristic analysis. In this paper, with a view to increasing accuracy and computational power of graphic process unit...

chapter

An Efficient Transaction-Based GPU Implementation of Minimum Spanning Forest Algorithm

Shayan Manoochehri, Bahareh Goodarzi, Dhrubajyoti Goswami

2017 International Conference on High Performance Computing & Simulation (HPCS) > 643 - 650

General Purpose GPUs (GPGPUs) are ideal platforms for parallel execution of applications with regular shared memory access patterns. However, the majority of real-world multithreaded applications require access to shared memory with irregular patterns. The Minimum Spanning Forest (MSF) calculation arises in many real-world applications. Boruvka's algorithm for calculating MSF has the most expressed...

chapter

SYMPES technique encoded IP-based secure voice communication system

B. Siddik Yarman, Cem Ulger, A. Burak Aslan

2017 International Symposium on Signals, Circuits and Systems (ISSCS) > 1 - 3

The need for end-to-end secure voice communication under cyber security threats is increasing day by day. This paper describes a method of establishing a secure VoIP system in which the voice is encoded with the SYMPES [1] coding technique and encryption is set with an open standard encryption algorithm. Voice can be transmitted from point to point within a secure IP network. A Graphic Processing Unit (GPU)...

chapter

Enhanced CMT tracking algorithm with CUDA acceleration

Feiyang Tan, Song Xiao, Lei Li

2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) > 441 - 446

The online tracking algorithm based on keypoint detection, matching and tracking, which is called CMT (Clustering of the Static-Adaptive Correspondences for Deformable Object Tracking), is robust and accurate for deformable object tracking. However, its optical flow tracker is error-prone when active points run outside the scope of the target. Worse still, the computational complexity of CMT greatly...

chapter

GPU-based coevolutionary particle swarm optimization

Zhao Liang, Zhu Yanxing, Zhang Jianyu, Ye Zhencheng

2017 36th Chinese Control Conference (CCC) > 9883 - 9887

The coevolutionary particle swarm optimization (CPSO) algorithm has been investigated and widely applied in the real world. When tackling large-scale and complex real-time optimization problems, the running time of the CPSO algorithm is a barrier. In this paper, a Graphics Processing Unit (GPU) is introduced to provide speedup in order to meet the real-time requirements. The CPSO algorithm has been implemented...

Publication type:
book

Content availability

Available (1,446)
None (10)

Keywords

CUDA (1,456)
GPU (684)
GRAPHICS PROCESSING UNITS (480)
GRAPHICS PROCESSING UNIT (363)
INSTRUCTION SETS (346)
KERNEL (303)
GPGPU (253)
PARALLEL PROCESSING (210)
COMPUTATIONAL MODELING (189)
COPROCESSORS (187)
COMPUTER ARCHITECTURE (174)
ALGORITHM DESIGN AND ANALYSIS (131)
COMPUTER GRAPHIC EQUIPMENT (129)
PARALLEL COMPUTING (124)
ACCELERATION (106)
MATHEMATICAL MODEL (96)
PROGRAMMING (89)
OPTIMIZATION (87)
HARDWARE (81)
COMPUTE UNIFIED DEVICE ARCHITECTURE (80)
ARRAYS (79)
YARN (72)
COMPUTER GRAPHICS (66)
CENTRAL PROCESSING UNIT (63)
OPENMP (62)
PARALLEL ALGORITHMS (62)
PARALLEL ARCHITECTURES (62)
MEMORY MANAGEMENT (59)
OPENCL (56)
PARALLEL PROGRAMMING (53)
GPU COMPUTING (52)
MPI (48)
LIBRARIES (47)
PERFORMANCE EVALUATION (46)
REGISTERS (45)
EQUATIONS (42)
IMAGE PROCESSING (41)
INDEXES (41)
PIXEL (41)
GRAPHICS (40)
REAL-TIME SYSTEMS (38)
IMAGE RECONSTRUCTION (37)
VECTORS (37)
RUNTIME (34)
DATA MINING (33)
HIGH PERFORMANCE COMPUTING (32)
SPARSE MATRICES (32)
THROUGHPUT (31)
FEATURE EXTRACTION (30)
BENCHMARK TESTING (29)
BANDWIDTH (28)
COMPUTERS (28)
CPU (28)
RENDERING (COMPUTER GRAPHICS) (28)
SYNCHRONIZATION (28)
CLUSTERING ALGORITHMS (27)
NVIDIA (27)
RANDOM ACCESS MEMORY (27)
DECODING (26)
PARALLELIZATION (26)
THREE DIMENSIONAL DISPLAYS (26)
CAMERAS (25)
DATA STRUCTURES (25)
HEURISTIC ALGORITHMS (25)
IMAGE EDGE DETECTION (25)
DATABASES (24)
TRAINING (24)
BIOINFORMATICS (23)
IMAGE SEGMENTATION (23)
MULTICORE PROCESSING (23)
PARALLEL (23)
PARALLEL ALGORITHM (23)
GENETIC ALGORITHMS (22)
IMAGE COLOR ANALYSIS (22)
IMAGE RESOLUTION (22)
INTERPOLATION (21)
MEDICAL IMAGE PROCESSING (20)
HISTOGRAMS (19)
IMAGE CODING (19)
PARALLEL COMPUTATION (19)
SOLID MODELING (19)
CONCURRENT COMPUTING (18)
CRYPTOGRAPHY (18)
ENCODING (18)
ESTIMATION (18)
JACOBIAN MATRICES (18)
PROGRAM PROCESSORS (18)
HPC (17)
MATLAB (17)
MATRIX MULTIPLICATION (17)
MULTI-THREADING (17)
OPENACC (17)
REAL TIME SYSTEMS (17)
SHAPE (17)
VIRTUALIZATION (17)
COMPUTER VISION (16)
EDUCATIONAL INSTITUTIONS (16)
FINITE DIFFERENCE METHODS (16)
GRAPHIC PROCESSING UNIT (16)
MULTIPROCESSING SYSTEMS (16)

Data set

IEEE (1,372)
Springer (84)

INFONA - science communication portal
