Search results

Items from 21 to 40 out of 303 results

chapter

Acceleration of finite element method for 3D DC resistivity modeling using multi-GPU

Hairil Anwar, Achmad Imam Kistijantoro

2016 International Conference on Information Technology Systems and Innovation (ICITSI) > 1 - 5

In this paper, the finite element method for 3D DC resistivity modeling is accelerated using multiple GPUs (Graphics Processing Units). Solving the large system of linear equations, the most expensive computation in the finite element method, is performed on GPUs to reduce the computational time. A conjugate gradient solver is used to solve the large system of linear equations. We developed kernel for conjugate gradient...

chapter

CUDA implementation of an optimal online Gaussian-Signal-in-Gaussian-Noise detector

Nir Nossenson, Ariel J. Jaffe

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

We address the computationally demanding task of real time optimal detection of a Gaussian Signal in Gaussian Noise. The mathematical principles of such a detector were formulated in 1965, but a full real-time implementation of these principles was not possible for decades mainly due to technological barriers. We present a CUDA based implementation of such an optimal detector and study its decision...

chapter

Performance evaluation of the parallel object tracking algorithm employing the particle filter

Grzegorz Szwoch

2016 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA) > 119 - 124

An algorithm based on particle filters is employed to track moving objects in video streams from fixed and non-fixed cameras. Particle weighting is based on color histograms computed in the iHLS color space. Particle computations are parallelized with CUDA framework. The algorithm was tested on various GPU devices: a desktop GPU card, a mobile chipset and two embedded GPU platforms. The processing...

chapter

Contrast and Analysis about the Characteristics of MPS and CDP in GPU Kepler Architecture

Peng Yikang, Huang Zhibin, Zhou Feng

2016 Third International Conference on Trustworthy Systems and their Applications (TSA) > 137 - 141

The new generation architecture of NVIDIA launched Multi-Process Services (MPS), which provides a context manager in the software layer to handle tasks from different processes. MPS can only be used on the Linux platform, and requires an NVIDIA GPU card with a computing capability of 5.0 or higher [1]. Although these constraints limit the applicability, it is a relatively inexpensive way to make multiple...

chapter

Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 377 - 384

The performance of a CUDA kernel often depends on the number of threads per thread-block (thread-block size), and the optimal configuration differs according to the graphics processing unit (GPU) hardware and the given data size to the kernel. In particular, in linear algebra libraries such as Basic Linear Algebra Subprograms (BLAS), most routines support a wide range of problem sizes and various...

chapter

Non-Equispaced FFT Computation with CUDA and GPU

Xiangwen Lyu, Jian-Min Zuo, Haiyong Xie

2016 International Conference on Virtual Reality and Visualization (ICVRV) > 227 - 234

Non-equispaced fast Fourier transform (NFFT) has attracted significant interest for its applications in tomography and remote sensing where visualization and image reconstruction require non-equispaced data. Here we present an efficient implementation of high accuracy NFFT on an NVidia GPU (Graphic Processing Unit). We focused on the convolution step in the computation of NFFT, since it is the most...

chapter

An empirical study of parallel solutions for GLCM calculation of diffraction images

John Dixon, Junhua Ding

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) > 3969 - 3972

Feature calculation for a large number of images is time-consuming. The GPU-based CUDA framework offers an affordable solution for calculating image features in parallel. The research focused on an empirical study of different implementations of a general-purpose GPU-based solution for calculating Gray-Level Co-occurrence Matrices (GLCM) and associated features of diffraction images of biological cells...

chapter

Optimization of parallel WAF for two-dimensional shallow water model with CUDA

Nugool Sataporn, Worasait Suwannik, Montri Maleewong

2016 11th International Conference on Computer Science & Education (ICCSE) > 155 - 159

This paper proposes the parallel implementation of finite volume method based on weighted average flux (WAF) to solve the shallow water equations on a graphic processing unit. We develop two parallel programs which are 1-dimension thread block and 2-dimension thread block, respectively. We compare the performance of these two versions with a sequential program. The numerical experiment is performed...

chapter

A Statistical-Feature ML Approach to IP Traffic Classification Based on CUDA

Zhengyang Chen, Renjie Chen, Yu Zhang, Jianzhong Zhang, more

2016 IEEE Trustcom/BigDataSE/ISPA > 2235 - 2239

In modern networks, there exist different applications which generate various different types of network traffic. In order to improve the performance of network management, it is important to identify and classify the internet traffic. The machine learning (ML) technique based on per-flow statistics has been widely used in traffic classification. Different from traditional classification methods,...

chapter

A GPU Based Maximum Common Subgraph Algorithm for Drug Discovery Applications

P. B. Jayaraj, K. Rahamathulla, G. Gopakumar

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 580 - 588

The maximum common subgraph of two graphs, G1 and G2, is the largest subgraph in G1 that is isomorphic to a subgraph in G2. Finding the maximum common subgraph of two given graphs is known to be an NP-complete problem. An exact solution for the maximum common subgraph problem can be found by an algorithm that transforms the maximum common subgraph problem into a maximal clique enumeration problem....

chapter

Alpaka -- An Abstraction Library for Parallel Kernel Acceleration

Erik Zenker, Benjamin Worpitz, Rene Widera, Axel Huebl, more

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 631 - 640

Porting applications to new hardware or programming models is a tedious and error prone process. Every help that eases these burdens is saving developer time that can then be invested into the advancement of the application itself instead of preserving the status-quo on a new platform. The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model exploits...

chapter

Counting Triangles in Large Graphs on GPU

Adam Polak

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 740 - 746

The clustering coefficient and the transitivity ratio are concepts often used in network analysis, which creates a need for fast practical algorithms for counting triangles in large graphs. Previous research in this area focused on sequential algorithms, MapReduce parallelization, and fast approximations. In this paper we propose a parallel triangle counting algorithm for CUDA GPU. We describe the...

chapter

Accelerating frequency-domain simulations using small shared-memory CPU/GPU cluster

Tomasz Topa, Artur Noga, Andrzej Karwowski

2016 21st International Conference on Microwave, Radar and Wireless Communications (MIKON) > 1 - 4

Numerical approach to frequency response problems usually requires that the system governing equation is solved repeatedly at many frequencies. The computational efficiency of the overall process can be increased by departing from traditional sequential computing model in favor of utilizing the parallel processing capability commonly offered by modern hardware. In this paper, we consider a hybrid...

chapter

Real time ultrasound image denoising using NVIDIA CUDA

Amira Hadj Fredj, Jihene Malek

2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) > 136 - 140

Image filtering is a process of reducing noise which degrades the performance of image processing. In some applications such as segmentation or classification, denoising has been designed to smooth the homogeneous areas while keeping and enhancing the edges. In several applications such as video analysis, image-guided surgical interventions or visual servoing, real-time denoising is needed. The devoted...

chapter

Parallel edge detection by SOBEL algorithm using CUDA C

Adhir Jain, Anand Namdev, Meenu Chawla

2016 IEEE Students' Conference on Electrical, Electronics and Computer Science (SCEECS) > 1 - 6

Edge detection is one of the most important paradigms of image processing. Images contain millions of pixels, and each pixel's information is independent of its neighbouring pixels. Hence this paper puts to the test the capability of the Graphics Processing Unit (GPU) to compute in parallel the millions of pixel calculations involved in image processing. Each pixel operation is independent from other thus...

chapter

A Simple BSP-based Model to Predict Execution Time in GPU Applications

Marcos Amaris, Daniel Cordeiro, Alfredo Goldman, Raphael Y. de Camargo

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) > 285 - 294

Models are useful to represent abstractions of software and hardware processes. The Bulk Synchronous Parallel (BSP) is a bridging model for parallel computation that allows algorithmic analysis of programs on parallel computers using performance modeling. The main idea of BSP model is the treatment of communication and computation as abstractions of a parallel system. Meanwhile, the use of GPU devices...

chapter

Implementation of edge-enhancement nonlinear anisotropic diffusion filtering using different CUDA memory models

M. H. Attia, S. A. Elshehaby, A. S. Elmaghraby

2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) > 501 - 504

Graphics Processing Units (GPUs) are used today as an affordable, energy-efficient method of acceleration for computationally exhaustive algorithms, decreasing execution time by exploiting the power of parallel programming techniques. In the field of medical imaging, GPUs have become a crucial acceleration method for computationally exhaustive algorithms. This paper presented the effect of memory optimization on...

chapter

High performance GPU Bayesian image synthesis

Miguel Carcamo, Fernando R. Rannou, Pablo E. Roman, Victor Moral, more

2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) > 264 - 268

ALMA is a revolutionary instrument in its scientific concept, its engineering design and its organisation as a global effort. ALMA and new incoming radio-telescopes deliver large amounts of data that are useful for sky image reconstruction. In this context, MEM is one of the most recognized reconstruction algorithms in radio-interferometry and is based on a Bayesian approach. Our results show that...

chapter

Evaluation of CUDA memory fence performance; Berlekamp-Massey case study

Hanan Ali, Zeinab Fayez, Ghada M. Fathy, Walaa Sheta

2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) > 586 - 590

Graphics Processing Unit (GPU) architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of general-purpose applications compared to contemporary general-purpose processors (CPUs). However, GPU architecture depends on multithreading, in which threads share data and resources and therefore face memory concurrency issues. Data races and deadlocks are the most...

chapter

Efficient Implementation of Genetic Algorithms on GP-GPU with Scheduled Persistent CUDA Threads

Nicola Capodieci, Paolo Burgio

2015 Seventh International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) > 6 - 12

In this paper we present a heavily exploration-oriented implementation of genetic algorithms to be executed on graphics processing units (GPUs) that is optimized with our novel mechanism for scheduling GPU-side synchronized jobs, which takes inspiration from the concept of persistent threads. Persistent threads allow an efficient distribution of workloads throughout the GPU so as to fully exploit the CUDA...

Keywords:
KERNEL
CUDA

