Search results

Items from 101 to 120 out of 303 results


chapter

Accelerating a novel particle-based fluid simulation on the GPU

Zhilu Chen, James Kingsley, Xinming Huang, Erkan Tuzel

2013 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 6

2013 IEEE High Performance Extreme Computing Conference (HPEC)

Stochastic Rotation Dynamics (SRD) is a novel particle-based simulation method that can be used to model complex fluids [1], [2], such as binary and ternary mixtures [3], and polymer solutions [4]-[6], in either two or three dimensions. Although SRD is efficient compared to traditional methods, it is still computationally expensive for large system sizes, e.g. when using a large array of particles...

chapter

Face detection algorithm using haar-like feature for GPU architecture

Dmitry Pertsau, Andrey Uvarov

2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS) > 2 > 726 - 730

2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS)

This article describes a parallel algorithm for face detection in images on GPU architectures. The algorithm extends one from the OpenCV library. A computational structure is presented for the developed algorithm, and a scheduling algorithm was developed to balance the workload among GPU threads.

chapter

A Checkpoint/Restart Scheme for CUDA Applications with Complex Memory Hierarchy

Xinyuan Guo, Hai Jiang, Kuan-Ching Li

2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing > 247 - 252

2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)

Checkpoint/restart has been an effective mechanism to achieve fault tolerance for many scientific applications. However, as GPUs take on a much bigger role in high performance computing, there is no effective checkpoint/restart scheme yet due to the GPU's batch-mode execution manner. The paper proposes an application-level checkpoint/restart scheme to save and restore GPU computation states. A precompiler...

chapter

Analysis of Sparse Matrix-Vector Multiplication Using Iterative Method in CUDA

Rashid Hassani, Amirreza Fazely, Riaz-Ul-Ahsan Choudhury, Peter Luksch

2013 IEEE Eighth International Conference on Networking, Architecture and Storage > 262 - 266

2013 IEEE 8th International Conference on Networking, Architecture, and Storage (NAS)

Scaling up the sparse matrix-vector multiplication has been at the heart of numerous studies in both academia and industry. The massive parallelism of graphics processing units offers tremendous performance in many high-performance computing applications. In this work, we discuss performance analysis for parallel implementation of sparse matrix-vector multiplication using the conjugate gradient algorithm...

chapter

Cloud detection in satellite imagery using graphics processing units

Ujwala M. Bhangale, Surya S. Durbha

2013 IEEE International Geoscience and Remote Sensing Symposium - IGARSS > 270 - 273

IGARSS 2013 - 2013 IEEE International Geoscience and Remote Sensing Symposium

Cloud detection and removal is an important requirement for change detection studies. Even a small amount of cloud cover may cause crucial information to be misinterpreted in disaster management applications. Although several cloud detection techniques exist, there is a critical need to apply these techniques in real time and obtain cloud-free images quickly to support real-time decisions.

chapter

Multicore and GPU algorithms for nussinov RNA folding

Junjie Li, Sanjay Ranka, Sartaj Sahni

2013 IEEE 3rd International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) > 1 - 2

2013 IEEE 3rd International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)

We develop cache efficient, multicore, and GPU algorithms for RNA folding using Nussinov's equations. Our cache efficient algorithm provides a speedup between 1.6 and 3.0 relative to a naive straightforward single core code. The multicore version of the cache efficient single core algorithm provides a speedup, relative to the naive single core algorithm, between 7.5 and 14.0 on a 6 core hyperthreaded...

chapter

Towards constructing application-level GPU computation states

Yulu Zhang, Xinyuan Guo, Hai Jiang, Kuan-Ching Li

2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS) > 161 - 166

2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS)

Computation state construction is an indispensable step to achieve fault tolerance and computation mobility for scientific applications by saving and restoring the state during program execution. However, there is no effective state construction scheme yet due to the GPU's batch-mode execution manner as the GPU takes on a larger role in high performance computing. The GPU's complex memory hierarchy...

chapter

The use of GPUs in image processing

Mirgita Frasheri, Betim Cico

2013 2nd Mediterranean Conference on Embedded Computing (MECO) > 124 - 127

2013 2nd Mediterranean Conference on Embedded Computing (MECO)

The analysis of climatic parameters, vegetation, humidity and pollution in the domain of time and space is done by processing a series of images of a geographic area taken by the satellite at certain times [1]. These images are subject to several computing schemes, with the aim of evaluating spatial and temporal variations of the mentioned parameters. One of the programs used to manipulate the images...

chapter

Guided Region-Based GPU Scheduling: Utilizing Multi-thread Parallelism to Hide Memory Latency

Jianmin Chen, Xi Tao, Zhen Yang, Jih-Kwon Peir, more

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 441 - 451

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Modern General-Purpose computation on Graphics Processing Units (GPGPU) architectures explore parallelism in applications by building massively parallel architectures and applying multithreading technology to hide instruction and memory latencies. Such architectures are becoming increasingly popular for parallel applications using the CUDA/OpenCL programming languages. In this paper, we investigate thread scheduling algorithms...

chapter

Parallel pattern mining on Graphics Processing Units

Krzysztof Hryniow

Proceedings of the 14th International Carpathian Control Conference (ICCC) > 134 - 139

2013 14th International Carpathian Control Conference (ICCC)

Frequent pattern mining is a field with many practical applications, where large computational power and speed are needed. Many state-of-the-art frequent pattern mining applications are inefficient solutions for both shared memory and multiprocessor systems due to problems with parallelism and memory. One possible solution to the problem is the use of a Graphics Processing Unit (GPU) in the system...

chapter

Extending OpenSHMEM for GPU Computing

S. Potluri, D. Bureddy, H. Wang, H. Subramoni, more

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 1001 - 1012

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Graphics Processing Units (GPUs) are becoming an integral part of modern supercomputer architectures due to their high compute density and performance per watt. In order to maximize utilization, it is imperative that applications running on these clusters have low synchronization and communication overheads. Partitioned Global Address Space (PGAS) models provide an attractive approach for developing...

chapter

High Performance FFT Based Poisson Solver on a CPU-GPU Heterogeneous Platform

Jing Wu, Joseph Jaja

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 115 - 125

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

We develop an optimized FFT based Poisson solver on a CPU-GPU heterogeneous platform for the case when the input is too large to fit on the GPU global memory. The solver involves memory bound computations such as 3D FFT in which the large 3D data may have to be transferred over the PCIe bus several times during the computation. We develop a new strategy to decompose and allocate the computation between...

chapter

Parallelization and performance analysis of the Simulated Annealing algorithm for graph coloring problem

Lejla Becirspahic, Adisa Dulovic, Novica Nosovic

2013 36th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) > 1306 - 1309

2013 36th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)

Vertex coloring is a subset of the graph coloring problem. It is of great importance in many applications. Vertex coloring implies a coloring of the vertices of the graph with a minimal number of colors (k), so that adjacent vertices have different colors. The paper presents a hybrid implementation of the Simulated Annealing algorithm for k-coloring of the vertices of the graph. The programming has been...
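The k-coloring condition stated in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the function names, the edge-list representation, and the use of the conflict count as a cost function are assumptions made only for illustration.

```python
def count_conflicts(edges, coloring):
    """Count edges whose two endpoints received the same color."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])


def is_valid_k_coloring(edges, coloring, k):
    """True if every color is in range(k) and no edge joins same-colored vertices."""
    colors_in_range = all(0 <= c < k for c in coloring.values())
    return colors_in_range and count_conflicts(edges, coloring) == 0


# Hypothetical example graph: a cycle on four vertices.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
good = {0: 0, 1: 1, 2: 0, 3: 1}  # alternating colors, no conflicts
bad = {0: 0, 1: 0, 2: 1, 3: 1}   # neighbors 0-1 and 2-3 share a color

good_is_valid = is_valid_k_coloring(edges, good, k=2)
bad_conflicts = count_conflicts(edges, bad)
```

A search heuristic such as simulated annealing could, under this assumption, use `count_conflicts` as the quantity to drive toward zero for a fixed k.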

chapter

Breaking Weak 1024-bit RSA Keys with CUDA

Kerry Scharfglass, Darrin Weng, Joseph White, Christopher Lupo

2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies > 207 - 212

2012 13th International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT)

An exploit involving the greatest common divisor (GCD) of RSA moduli was recently discovered [1]. This paper presents a tool that can efficiently and completely compare a large number of 1024-bit RSA public keys, and identify any keys that are susceptible to this weakness. NVIDIA's graphics processing units (GPU) and the CUDA massively-parallel programming model are powerful tools that can be used...
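The GCD weakness mentioned in this abstract can be sketched on the CPU with toy numbers. This is a minimal illustration, not the paper's CUDA tool: the function name and the tiny example moduli are hypothetical, and real keys would be 1024-bit integers.

```python
from itertools import combinations
from math import gcd


def find_shared_factors(moduli):
    """Return {index: shared_factor} for every modulus that shares a
    nontrivial factor with at least one other modulus in the list."""
    weak = {}
    for (i, n1), (j, n2) in combinations(enumerate(moduli), 2):
        g = gcd(n1, n2)
        # A GCD strictly between 1 and both moduli exposes a common prime.
        if 1 < g < min(n1, n2):
            weak[i] = g
            weak[j] = g
    return weak


# Toy moduli (assumed for illustration): the first two share the prime 101.
moduli = [101 * 103, 101 * 107, 109 * 113]
weak = find_shared_factors(moduli)
```

Once a shared prime is known, dividing the modulus by it yields the other prime, which is why such keys are considered broken. The pairwise comparison is quadratic in the number of keys, which is the kind of workload that lends itself to parallel hardware.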

chapter

An evaluation of CUDA-enabled virtualization solutions

M S Vinaya, Nagavijayalakshmi Vydyanathan, Mrugesh Gajjar

2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing > 621 - 626

2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC)

Virtualization, as a technology that enables easy and effective resource sharing with a low cost and energy footprint, is becoming increasingly popular not only in enterprises but also in high performance computing. Applications with stringent performance needs often make use of graphics processors for accelerating their computations. Hence virtualization solutions that support GPU acceleration are...

chapter

High performance multi-dimensional (2D/3D) FFT-Shift implementation on Graphics Processing Units (GPUs)

Marwan Abdellah, Salah Saleh, Ayman Eldeib, Amr Shaarawi

2012 Cairo International Biomedical Engineering Conference (CIBEC) > 171 - 174

2012 Cairo International Biomedical Engineering Conference (CIBEC)

Frequency domain analysis is one of the most common analysis techniques in signal and image processing. Fast Fourier Transform (FFT) is a well-known tool used to perform such analysis by obtaining the frequency spectrum for time- or spatial-domain signals and vice versa. FFT-Shift is a subsequent operation used to handle the resulting arrays from this stage as it centers the DC component of the resulting...
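The centering step described in this abstract can be sketched in plain Python. This is a CPU illustration under the usual convention that the DC component sits at index 0 of the FFT output; it is not the paper's GPU implementation, and the function names are chosen here for illustration.

```python
def fftshift_1d(x):
    """Swap the two halves of x so that element 0 (DC) moves to the center."""
    mid = (len(x) + 1) // 2  # first index of the second half
    return x[mid:] + x[:mid]


def fftshift_2d(rows):
    """Apply the 1D shift along both axes of a list of equal-length rows."""
    shifted_rows = fftshift_1d(rows)                 # shift along the first axis
    return [fftshift_1d(r) for r in shifted_rows]    # shift along the second axis


spectrum = [0, 1, 2, 3]            # index 0 stands in for the DC term
centered = fftshift_1d(spectrum)   # DC term now at index len // 2
image = fftshift_2d([[0, 1], [2, 3]])
```

For even lengths the operation is its own inverse; for odd lengths a separate inverse shift is needed, which is one reason libraries typically expose both directions.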

chapter

An effective beamforming algorithm for a GPU-based ultrasound imaging system

Jiwon Kwon, Jae Hee Song, Sua Bae, Tai-kyoung Song, more

2012 IEEE International Ultrasonics Symposium > 619 - 622

2012 International Ultrasonics Symposium

In this paper, four beamforming algorithms (i.e., interpolation and phase rotation with pre- and post-filtering, IBF-PRE, IBF-POST, PRBF-PRE and PRBF-POST, respectively) implemented on a high-performance graphics-processing unit (GPU) were presented. Each beamforming method was divided into two kernels consisting of various beamforming and mid-processing blocks and efficiently implemented on a NVIDIA's...

chapter

A Method of Accelerating LDA Program with GPU

Yanjun Jiang, Hualong Wen, Zhanchun Gao

2012 Third International Conference on Networking and Distributed Computing > 26 - 29

2012 Third International Conference on Networking and Distributed Computing (ICNDC)

LDA (Latent Dirichlet Allocation) is a text modeling algorithm based on a generative probabilistic model. It is widely used to discover latent topics among a set of documents. Mahout implements the LDA algorithm; however, the execution time of the LDA program is very long when processing a large number of documents, because the documents are processed in sequence. This paper introduces a method to...

chapter

Parallel High Dimensional Self Organizing Maps Using CUDA

Felipe C. Moraes, Silvia C. Botelho, Nelson Duarte Filho, Joel Felipe O. Gaya

2012 Brazilian Robotics Symposium and Latin American Robotics Symposium > 302 - 306

2012 Brazilian Robotics Symposium and Latin American Robotics Symposium (SBR-LARS)

A common neural network used for complex data clustering is the Self Organizing Map (SOM). This algorithm has an expensive training step, mainly in high-dimensional applications like image clustering. This makes it impossible for some of these applications to run in real time or even in a feasible time. In this paper we explore the use of GPUs with the NVIDIA CUDA language to decrease computational...

chapter

Phase-Based Profiling in GPGPU Kernels

Robert Dietrich, Felix Schmitt, Rene Widera, Michael Bussmann

2012 41st International Conference on Parallel Processing Workshops > 414 - 423

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

More and more computationally intensive scientific applications make use of hardware accelerators like general purpose graphics processing units (GPGPUs). Compared to software development for typical multi-core processors, their programming is fairly complex and needs hardware-specific optimizations to utilize the full computing power. To achieve high performance, critical parts of a program have to...


Keywords:
KERNEL
CUDA

Content availability

Available (297)
None (6)

Keywords

INSTRUCTION SETS (164)
GPU (142)
GRAPHICS PROCESSING UNIT (138)
GRAPHICS PROCESSING UNITS (130)
COPROCESSORS (72)
GPGPU (69)
COMPUTER ARCHITECTURE (63)
PARALLEL PROCESSING (58)
COMPUTATIONAL MODELING (56)
COMPUTER GRAPHIC EQUIPMENT (51)
PROGRAMMING (43)
ARRAYS (37)
OPTIMIZATION (34)
YARN (33)
MATHEMATICAL MODEL (26)
ACCELERATION (25)
PERFORMANCE EVALUATION (25)
COMPUTE UNIFIED DEVICE ARCHITECTURE (24)
HARDWARE (24)
MEMORY MANAGEMENT (24)
PARALLEL ARCHITECTURES (24)
COMPUTER GRAPHICS (23)
REGISTERS (22)
LIBRARIES (21)
PARALLEL COMPUTING (21)
ALGORITHM DESIGN AND ANALYSIS (20)
OPENMP (18)
SPARSE MATRICES (17)
SYNCHRONIZATION (17)
VECTORS (17)
CENTRAL PROCESSING UNIT (16)
GRAPHICS (16)
EQUATIONS (15)
OPENCL (15)
THROUGHPUT (15)
RUNTIME (14)
DATA MINING (13)
PARALLEL PROGRAMMING (13)
PARALLEL ALGORITHMS (12)
DATA STRUCTURES (11)
INDEXES (11)
MPI (11)
BENCHMARK TESTING (10)
BANDWIDTH (9)
BIOINFORMATICS (9)
GPU COMPUTING (9)
IMAGE EDGE DETECTION (9)
IMAGE PROCESSING (9)
MULTI-THREADING (9)
MULTICORE PROCESSING (9)
PIXEL (9)
DATA TRANSFER (8)
HISTOGRAMS (8)
MICROPROCESSOR CHIPS (8)
CONVOLUTION (7)
CPU (7)
DECODING (7)
HIGH PERFORMANCE COMPUTING (7)
ITERATIVE METHODS (7)
MATRIX MULTIPLICATION (7)
NVIDIA (7)
REAL-TIME SYSTEMS (7)
SPMV (7)
TRAINING (7)
ENCODING (6)
FEATURE EXTRACTION (6)
GENETIC ALGORITHMS (6)
GRAPHIC PROCESSING UNIT (6)
HEURISTIC ALGORITHMS (6)
IMAGE COLOR ANALYSIS (6)
IMAGE RECONSTRUCTION (6)
MAGNETIC CORES (6)
MESSAGE SYSTEMS (6)
MULTIPROCESSING SYSTEMS (6)
PROGRAM PROCESSORS (6)
RANDOM ACCESS MEMORY (6)
RENDERING (COMPUTER GRAPHICS) (6)
THREE DIMENSIONAL DISPLAYS (6)
APPROXIMATION ALGORITHMS (5)
CLUSTERING ALGORITHMS (5)
COMPUTATIONAL COMPLEXITY (5)
CRYPTOGRAPHY (5)
DATA MODELS (5)
FINITE DIFFERENCE METHODS (5)
GENOMICS (5)
MATHEMATICS COMPUTING (5)
MEDICAL IMAGE PROCESSING (5)
NUMERICAL MODELS (5)
NVIDIA GPU (5)
PARALLEL (5)
PATTERN CLUSTERING (5)
PERFORMANCE ANALYSIS (5)
POWER AWARE COMPUTING (5)
PROTEINS (5)
RADIATION DETECTORS (5)
SHAPE (5)
SHARED MEMORY (5)
TUNING (5)

INFONA - science communication portal
