Search results

Items from 1 to 20 out of 27 results

chapter

A Comparative Performance Analysis of Remote GPU Virtualization over Three Generations of GPUs

Carlos Reano, Federico Silla

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 121 - 128

2017 46th International Conference on Parallel Processing Workshops (ICPPW)

The use of Graphics Processing Units (GPUs) has become a very popular way to accelerate the execution of many applications. However, GPUs are not exempt from side effects. For instance, GPUs are expensive devices which additionally consume a non-negligible amount of energy even when they are not performing any computation. Furthermore, most applications present low GPU utilization. To address these...

chapter

Accelerating Levenshtein and Damerau edit distance algorithms using GPU with unified memory

Khaled Balhaf, Mohammad A. Alsmirat, Mahmoud Al-Ayyoub, Yaser Jararweh, more

2017 8th International Conference on Information and Communication Systems (ICICS) > 7 - 11

2017 8th International Conference on Information and Communication Systems (ICICS)

String matching problems such as sequence alignment are among the fundamental problems in many computer science fields such as natural language processing (NLP) and bioinformatics. Many algorithms have been proposed in the literature to address this problem. Some of these algorithms compute the edit distance between the two strings to perform the matching. However, these algorithms usually require long...

chapter

Parallelization of GST algorithm for source code similarity detection

Marko J. Misic, Dusan V. Nikolov, Jelica Z. Protic, Milo V. Tomasevic

2016 24th Telecommunications Forum (TELFOR) > 1 - 4

2016 24th Telecommunications Forum (TELFOR)

Source code is a frequent target for plagiarism in massive computing courses. Plagiarism detection requires a significant effort from the teaching staff, thus software tools have been used to detect similar source codes. This paper examines parallelization of source code similarity detection based on Greedy-String-Tiling and Karp-Rabin algorithms. CPU implementation is parallelized using Pthreads,...

chapter

Application of the CUDA® toolkit multi-GPU libraries to an out-of-core MoM solver

Alexander L. Saxerud, Jack P. Ferrell, Eric A. Dunn

2016 IEEE International Symposium on Antennas and Propagation (APSURSI) > 2013 - 2014

2016 IEEE International Symposium on Antennas and Propagation & USNC/URSI National Radio Science Meeting

In this work the latest multi-graphical-processing unit (multi-GPU) libraries included with the NVIDIA® CUDA® toolkit are used to accelerate the simulation of the radar cross section (RCS) of a target discretized to 125,913 unknowns. On a system with four NVIDIA Tesla K40 GPU cards, the total runtime of a full-wave out-of-core method of moments (MoM) solver was reduced from about 4 days to about 5...

chapter

Accelerating generation of stochastic cyclone routes with GPU programming

Yiran Chen, Zhou Huang

2015 23rd International Conference on Geoinformatics > 1 - 5

2015 23rd International Conference on Geoinformatics

Typhoons and cyclones are an important part of the risk assessment of natural catastrophes. To estimate probabilistic cyclone damage over a very long period (e.g. 1,000 years, 10,000 years), the first step of a catastrophe risk assessment model is to generate a stochastic set of routes derived from historical cyclone routes, which takes a long time. In order to accelerate the generation, we try to use GPU programming,...

chapter

Digital filter bank implementation and signal classification on the basis of CUDA

D. M. Klionskiy, D. I. Kaplun, A. S. Voznesenskiy, V. V. Gulvanskiy, more

2015 IEEE NW Russia Young Researchers in Electrical and Electronic Engineering Conference (EIConRusNW) > 93 - 97

2015 IEEE NW Russia Young Researchers in Electrical and Electronic Engineering Conference (EIConRusNW)

The present paper discusses radio monitoring tasks and their solution using DFT-modulated filter banks. Filter bank software-hardware implementations are studied on the basis of Central Processing Unit (CPU) and Compute Unified Device Architecture (CUDA) with the use of Graphics Processing Unit (GPU). It is shown that CUDA technology is efficient for processing large datasets and outperforms computational...

chapter

On the efficient design of a generic emulator module towards precise simulation of wireless sensor networks using XML

A. Filippou, D.A. Karras

2014 22nd Telecommunications Forum Telfor (TELFOR) > 1059 - 1062

2014 22nd Telecommunications Forum Telfor (TELFOR)

The use of emulators in the WSN design process offers the advantage of precise timing and cross layer simulation: the former because each instruction executes in specific machine cycles, and the latter because the emulated machine code contains both application and protocol stack code. Emulators used in sensor network simulation are bound to specific hardware. Adding different modules...

chapter

Development of GPU-accelerated localization system for autonomous mobile robot

Maxim N. Rud, Alexander R. Pantiykchin

2014 International Conference on Mechanical Engineering, Automation and Control Systems (MEACS) > 1 - 4

2014 International Conference on Mechanical Engineering, Automation and Control Systems (MEACS)

We present a localization system for an autonomous mobile robot that operates in a well-known environment. In our work we use particle-based Monte Carlo localization. This algorithm has many applications in mobile robotics, but it is computationally expensive. Due to the high level of parallelism in this algorithm, we had an opportunity to accelerate its execution on graphical processing...

chapter

Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations

Akihiko Kasagi, Koji Nakano, Yasuaki Ito

2014 43rd International Conference on Parallel Processing > 251 - 260

2014 43rd International Conference on Parallel Processing (ICPP)

The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computing on CUDA-enabled GPUs. The summed area table (SAT) of a matrix is a data structure frequently used in the area of computer vision which can be obtained by computing the column-wise prefix-sums and then the row-wise prefix-sums. The main contribution of this paper is to introduce the...

chapter

POSTER: Performance evaluation of a signal extraction algorithm for the Cherenkov Telescope Array's Real Time Analysis pipeline

Juan Jose Rodriguez-Vazquez, Jose Luis Vazquez-Poletti, Carlos Delgado, Andrea Bulgarelli, more

2014 IEEE International Conference on Cluster Computing (CLUSTER) > 292 - 293

2014 IEEE International Conference On Cluster Computing (CLUSTER)

In this paper, several versions of a signal extraction algorithm, pertaining to the entry stage of the Cherenkov Telescope Array's Real Time Analysis pipeline, were implemented and optimised using SSE2, POSIX threads and CUDA. Results of this proof of concept let us gain an insight into the suitability of each platform, and the performance each one can deliver, to carry out this particular task.

chapter

A Performance Prediction Model for Memory-Intensive GPU Kernels

Zhidan Hu, Guangming Liu, Zhidan Hu

2014 IEEE Symposium on Computer Applications and Communications > 14 - 18

2014 IEEE Symposium on Computer Applications and Communications (SCAC)

Commodity graphics processing units (GPUs) have rapidly evolved to become high performance accelerators for data-parallel computing through a large array of processing cores and the CUDA programming model with a C-like interface. However, optimizing an application for maximum performance based on the GPU architecture is not a trivial task because of the tremendous change from conventional multi-core to the...

chapter

Optimizing a GPU Algorithm through Hardware Profiling Analysis

Fernando G. Tinetti, Sergio M. Martin

2014 International Conference on Computational Science and Computational Intelligence > 1 > 45 - 51

2014 International Conference on Computational Science and Computational Intelligence (CSCI)

Usage of GPU-based architectures for scientific computing has been steadily increasing in recent years. This new paradigm for both programming and execution has been applied to solve several classic problems much faster than using the conventional multiprocessor and/or multicomputer approach. These architectures allow an increase in performance -- compared to conventional CPU processors -- for specific...

chapter

High performance prime field multiplication for GPU

Karl Leboeuf, Roberto Muscedere, Majid Ahmadi

2012 IEEE International Symposium on Circuits and Systems > 93 - 96

2012 IEEE International Symposium on Circuits and Systems - ISCAS 2012

This paper presents a high performance algorithm for modular multiplication on a graphics processing unit (GPU) implemented in assembler. The proposed algorithm carries out finite field multiplication over the NIST prime fields of size 192, 224, 256 and 384 bits. Included is a detailed explanation of our algorithm, an instruction count analysis, and a comparison to recently published work; compared...

chapter

Accelerating WIPL-D numerical EM kernel by using graphics processing units

Branko M. Kolundzija, Dragan I. Olcan, Dusan P. Zoric, Sladjana M. Maric

2011 10th International Conference on Telecommunication in Modern Satellite Cable and Broadcasting Services (TELSIKS) > 2 > 413 - 419

TELSIKS 2011 - 2011 10th International Conference on Telecommunication in Modern Satellite, Cable and Broadcasting Services

We present acceleration for numerical solving of electromagnetic (EM) problems by using method of moments (MoM) and NVIDIA graphics processing units (GPU). Three stages of MoM are accelerated: matrix fill, solution of complex linear equations and post-processing. The results show that GPUs can be efficiently used for EM simulations.

chapter

Energy consumption of Graphic Processing Units with respect to automotive use-cases

L Stolz, H Endt, M Vaaraniemi, D Zehe, more

2010 International Conference on Energy Aware Computing > 1 - 4

2010 International Conference on Energy Aware Computing (ICEAC 2010)

With the introduction of APIs like CUDA, Stream+ or OpenCL, modern Graphics Processing Units (GPUs) can be easily employed for general purpose computing. In addition, their comparatively low price per GFLOP makes them interesting candidates for coprocessors in future embedded Electronic Control Units (ECUs). Yet, as car manufacturers strive to reduce the Thermal Design Power (TDP) of each and every ECU...

chapter

GPU Based Spot Noise Parallel Algorithm for 2D Vector Field Visualization

Bo Qin, Fang Su, Zhanbin Wu, Jingjing Wang

2010 International Conference on Optoelectronics and Image Processing > 1 > 580 - 583

2010 International Conference on Optoelectronics and Image Processing (ICOIP 2010)

The Graphics Processing Unit (GPU) has evolved into a parallel computation platform thanks to its massively multithreaded architecture. Due to its high computational power, the GPU has been used to deal with many problems that can be easily parallelized. This paper presents a GPU based spot noise parallel algorithm for 2D vector field visualization. It uses spot noise method with GPU resources and compute unified...

chapter

An implementation and its evaluation of password cracking tool parallelized on GPGPU

T Murakami, R Kasahara, T Saito

2010 10th International Symposium on Communications and Information Technologies > 534 - 538

2010 10th International Symposium on Communications and Information Technologies (ISCIT 2010)

General-purpose computing on graphics processing units (GPGPU) is a popular computing technology used in various fields. In the paper, we parallelize the cryptographic hash processing of a password cracking tool, John the Ripper, by utilizing CUDA on GPGPU. We also evaluate our work by comparing the processing time of hash processing parallelized by GPU with that of the John the Ripper on a dual-core...

chapter

An Interior Point Optimization Solver for Real Time Inter-frame Collision Detection: Exploring Resource-Accuracy-Platform Tradeoffs

B Leung, Chih-Hung Wu, S O Memik, S Mehrotra

2010 International Conference on Field Programmable Logic and Applications > 113 - 118

2010 International Conference on Field Programmable Logic and Applications (FPL 2010)

We present and compare implementations of an affine interior-point algorithm for real-time collision detection on a GPGPU and an FPGA. This particular interior-point algorithm is distinguished from other collision detection methods by its ability to perform detection between pairs of objects undergoing fast rotational and translational movement. This enables inter-frame collision detection, i.e. collision...

chapter

Parallel spatial matching for object retrieval implemented on GPU

Wenying Wang, Dongming Zhang, Yongdong Zhang, Jintao Li, more

2010 IEEE International Conference on Multimedia and Expo > 890 - 895

2010 IEEE International Conference on Multimedia and Expo (ICME)

Spatial matching for object retrieval is often time-consuming and susceptible to viewpoint changes. To address this problem, we propose a novel spatial matching method and implement it on modern GPU in parallel. Unlike previous spatial matching methods, in which the affine transformation estimation is based on the gravity vector assumption, our method abandons this strong assumption by matching the...

chapter

A CUDA-Based Implementation of Stable Fluids in 3D with Internal and Moving Boundaries

G Amador, A Gomes

2010 International Conference on Computational Science and Its Applications > 118 - 128

2010 International Conference on Computational Science and Its Applications (ICCSA 2010)

Fluid simulation has been an active research field in computer graphics for the last 30 years. Stam's stable fluids method, among others, is used for solving the equations that govern fluids (i.e. Navier-Stokes equations). An implementation of stable fluids in 3D using NVIDIA Compute Unified Device Architecture (CUDA) is provided in this paper. This CUDA-based implementation also features the accurate...

Keywords:
RANDOM ACCESS MEMORY
Publication type:
book
