Search results

Items from 121 to 140 out of 433 results

chapter

Accelerating outlier detection with intra- and inter-node parallelism

Fabrizio Angiulli, Stefano Basta, Stefano Lodi, Claudio Sartori

2014 International Conference on High Performance Computing & Simulation (HPCS) > 476 - 483

2014 International Conference on High Performance Computing & Simulation (HPCS)

Outlier detection is a data mining task consisting in the discovery of observations which deviate substantially from the rest of the data, and has many important practical applications. Outlier detection in very large data sets is however computationally very demanding and the size limit of the data that can be elaborated is considerably pushed forward by mixing three ingredients: efficient algorithms,...

chapter

Maximizing diversity in CPUs: Using GPUs as coprocessors to achieve safety integrity

Frank Reichenbach, Jan Endresen, Stein-Erik Ellevseth

2014 12th IEEE International Conference on Industrial Informatics (INDIN) > 182 - 187

2014 12th IEEE International Conference on Industrial Informatics (INDIN)

Modern System-on-Chip (SoC) architectures offer much for a relatively small price, but often industrial machine builders only use a fraction of the functionality. Their main interest is the performance boost from using multiple cores. For safety devices, the on-chip redundancy is beneficial for achieving higher reliability, but since most platforms are homogeneous, there is a need to get systematic and...

chapter

Agent-based mood spread diffusion model for GPU

Xiaotong Wang, Zhen Liu, Su Deng

2014 IEEE 5th International Conference on Software Engineering and Service Science > 1056 - 1059

2014 5th IEEE International Conference on Software Engineering and Service Science (ICSESS)

Based on an analysis of the mood spread diffusion problem and on agent theory, an agent-based mood diffusion model was established. The parts of the model suited to parallel computing were designed and implemented with the CUDA programming tool, showing that GPU computing can improve the efficiency of the model calculation.

chapter

Efficient String Sorting on Multi- and Many-Core Architectures

Aleksandr Drozd, Miquel Pericas, Satoshi Matsuoka

2014 IEEE International Congress on Big Data > 637 - 644

2014 IEEE International Congress on Big Data (BigData Congress)

This paper addresses the issue of efficient sorting of strings on multi- and many-core processors. We propose CPU and GPU implementations of the most-significant digit radix sort algorithm using different parallelization strategies on various stages of the execution to achieve good workload balance and optimal use of system resources. We evaluate the performance of our solution on both architectures...

chapter

Full-stream architecture for ray tracing with efficient data transmission

Youngsam Shin, Jaedon Lee, Won-Jong Lee, Soojung Ryu, et al.

2014 IEEE International Symposium on Circuits and Systems (ISCAS) > 2165 - 2168

2014 IEEE International Symposium on Circuits and Systems (ISCAS)

In this paper, we focus on the impact of a memory bandwidth limitation by analyzing the bandwidth consumption for a ray tracing system and present an energy efficient data transmission method using a dedicated interface between the processor and ray tracing hardware engine. To achieve real-time ray tracing, we propose a full-stream architecture through the use of this dedicated interface. For an evaluation...

chapter

A Parallel Implementation of the Durand-Kerner Algorithm for Polynomial Root-Finding on GPU

Kahina Ghidouche, Raphaël Couturier, Abderrahmane Sider

2014 International Conference on Advanced Networking Distributed Systems and Applications > 53 - 57

2014 International Conference on Advanced Networking Distributed Systems and Applications (INDS)

In this article we present a parallel implementation of the Durand-Kerner algorithm to find roots of polynomials of high degree on a GPU architecture (Graphics Processing Unit). We have implemented both a CPU version in and a GPU compatible version with CUDA. The main result of our work is a parallel implementation that is 10 times as fast as its sequential counterpart on a single CPU for high degree...

chapter

CUDAlign 3.0: Parallel Biological Sequence Comparison in Large GPU Clusters

Edans F. de O. Sandes, Guillermo Miranda, Alba C.M.A. de Melo, Xavier Martorell, et al.

2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing > 160 - 169

2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

This paper proposes and evaluates a parallel strategy to execute the exact Smith-Waterman (SW) biological sequence comparison algorithm for huge DNA sequences in multi-GPU platforms. In our strategy, the computation of a single huge SW matrix is spread over multiple GPUs, which communicate border elements to the neighbour, using a circular buffer mechanism. We also provide a method to predict the...

chapter

GPU-accelerated computation for texture features using OpenCL framework

Ahmad M. Saladin, Licheng Jiao, Xiangrong Zhang

2014 11th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) > 1 - 6

2014 11th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)

Texture features introduced by Haralick in 1973, which rely on computing the so-called Gray Level Co-occurrence Matrix (GLCM), are used extensively by many applications to understand and enhance images acquired from various scientific contexts. The main limitations of these features are their high computational costs in memory usage and processing time. In this paper a Graphics Processing...

chapter

GPU-based timing-aware test generation for small delay defects

Kuan-Yu Liao, Po-Juei Chen, Ang-Feng Lin, James Chien-Mo Li, et al.

2014 19th IEEE European Test Symposium (ETS) > 1 - 2

2014 19th IEEE European Test Symposium (ETS)

A GPU-based timing-aware ATPG is proposed to generate a compact high-quality test set. The test generation algorithm backtraces and propagates along multiple long paths so that many test patterns are generated at the same time. Generated test patterns are then fault simulated and selected. Compared with an 8-core CPU-based timing-aware commercial ATPG, the proposed GPU-based technique achieved 36%...

chapter

CoAdELL: Adaptivity and Compression for Improving Sparse Matrix-Vector Multiplication on GPUs

Marco Maggioni, Tanya Berger-Wolf

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 933 - 940

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Numerous applications in science and engineering rely on sparse linear algebra. The efficiency of a fundamental kernel such as the Sparse Matrix-Vector multiplication (SpMV) is crucial for solving increasingly complex computational problems. However, the SpMV is notorious for its extremely low arithmetic intensity and irregular memory patterns, posing a challenge for optimization. Over the last few...

chapter

Acceleration of a Python-Based Tsunami Modelling Application via CUDA and OpenHMPP

Zhe Weng, Peter E. Strazdins

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 1275 - 1284

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Modern graphics processing units (GPUs) have become powerful and cost-effective computing platforms. Parallel programming standards (e.g. CUDA) and directive-based programming standards (like OpenHMPP and OpenACC) are available to harness this tremendous computing power to tackle large-scale modelling and simulation in scientific areas. ANUGA is a tsunami modelling application which is based on unstructured...

chapter

Using GPU Shared Memory with a Directive-Based Approach

Wei Ding, Ligang Lu, Mauricio Araya-Polo, Amik St-Cyr, et al.

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 1021 - 1028

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Graphics Processing Units (GPUs) have been increasingly adopted by the High-Performance Computing community. Their unique hardware architecture supports hundreds or thousands of lightweight threads in a more power-efficient manner compared with traditional CPUs, and with higher overall performance. This motivates highly parallel applications to be ported to GPUs. Programming GPUs is not a trivial task...

chapter

Transparent GPU Execution of NumPy Applications

Troels Blum, Mads R.B. Kristensen, Brian Vinter

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 1002 - 1010

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

In this work, we present a back-end for the Python library NumPy that utilizes the GPU seamlessly. We use dynamic code generation to generate kernels, and data is moved transparently to and from the GPU. For the integration into NumPy, we use the Bohrium runtime system. Bohrium hooks into NumPy through the implicit data parallelization of array operations; this approach requires no annotations or...

chapter

KernelGen -- The Design and Implementation of a Next Generation Compiler Platform for Accelerating Numerical Models on GPUs

Dmitry Mikushin, Nikolay Likhogrud, Eddy Z. Zhang, Christopher Bergstrom

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 1011 - 1020

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

GPUs are becoming pervasive in scientific computing. Originally serving as peripheral accelerators, they are now gradually turning into central computing nodes. However, most current directive-based approaches for parallelizing sequential legacy code, such as OpenACC and HMPP, simply off-load "hot" CPU code onto GPUs, entailing many limitations such as unsupported external calls and coarse-grained...

chapter

Resource Centered Computing Delivering High Parallel Performance

Jens Gustedt, Stephane Vialle, Patrick Mercier

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 77 - 88

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Modern parallel programming requires a combination of different paradigms, expertise and tuning, that correspond to the different levels in today's hierarchical architectures. To cope with the inherent difficulty, ORWL (ordered read-write locks) presents a new paradigm and toolbox centered around local or remote resources, such as data, processors or accelerators. ORWL programmers describe their computation...

chapter

CUDA Memory Techniques for Matrix Multiplication on Quadro 4000

Tekesha Athil, Richard Christian, Yenumula B. Reddy

2014 11th International Conference on Information Technology: New Generations > 419 - 425

2014 Eleventh International Conference on Information Technology: New Generations (ITNG)

Today, the industry's old adage of sequential processing is certainly no longer sufficient. The need for high performance computation is ever growing, even though certain problem sets remain within the realm of super high performance computing, with applications such as weather forecasting, quantum physics and climate research, to name a few. Within the commercial realm of computation, NVIDIA has proposed...

chapter

A CUDA Based Implementation of Locally- and Feature-Adaptive Diffusion Based Image Denoising Algorithm

Ali Pour Yazdanpanah, Ajay K. Mandava, Emma E. Regentova, Venkatesan Muthukumar, et al.

2014 11th International Conference on Information Technology: New Generations > 388 - 393

2014 Eleventh International Conference on Information Technology: New Generations (ITNG)

In this paper we introduce a parallel implementation of the locally- and feature-adaptive diffusion based (LFAD) method for image denoising using the NVIDIA CUDA framework and graphics processing units (GPUs). LFAD is a novel method for removing additive white Gaussian (AWG) noise in images, reported to yield high quality denoised images [1]. It approaches each image region separately and uses different number...

chapter

OpenCL implementation of unsharp filtering on GPU and FPGA

Ozge Unel, Toygar Akgun

2014 22nd Signal Processing and Communications Applications Conference (SIU) > 212 - 215

2014 22nd Signal Processing and Communications Applications Conference (SIU)

The purpose of this study is to evaluate the performance of a two-dimensional multi-threaded linear filtering process on GPU and FPGA platforms. To obtain implementations on the different platforms, the OpenCL API is used, which offers the advantage of platform-independent programming. The results on three different platforms are compared with each other within this scope. These platforms are CPU, GPU, and FPGA...

chapter

A parallel clustering algorithm for placement

Amir Momeni, Perhaad Mistry, David Kaeli

Fifteenth International Symposium on Quality Electronic Design > 349 - 356

2014 15th International Symposium on Quality Electronic Design (ISQED)

In order to improve the layout quality of a VLSI design, many placement tools employ clustering algorithms to prune the optimization space and produce a design that can be enhanced while considering multiple design constraints. An intelligent clustering algorithm can guide a placement tool to reduce wire length, reduce cycle time, consider additional metrics or optimize a design based on a combination...

chapter

Parallel graph coloring algorithms on the GPU using OpenCL

Shilpi Sengupta

2014 International Conference on Computing for Sustainable Global Development (INDIACom) > 353 - 357

2014 International Conference on Computing for Sustainable Global Development (INDIACom)

GPUs (Graphics Processing Units) are designed to solve large data-parallel problems encountered in the fields of image processing, scene rendering, video playback, and gaming. GPUs are therefore designed to handle a higher degree of parallelism as compared to conventional CPUs. GPGPU (General Purpose computing on Graphics Processing Units) enables users to do parallel computing on the graphics hardware...

Keywords: KERNEL, GPU


INFONA - science communication portal
