Search results

Items from 1 to 15 out of 15 results

chapter

GPGPU vs multiprocessor SPSO implementations to solve electromagnetic optimization problems

Anton Duca, Laurentiu Duca, Gabriela Ciuprina, Daniel Ioan

2015 7th International Joint Conference on Computational Intelligence (IJCCI) > 1 > 64 - 73

2015 7th International Joint Conference on Computational Intelligence (IJCCI)

This paper studies two parallelization techniques for the implementation of a SPSO algorithm applied to optimize electromagnetic field devices, GPGPU and Pthreads for multiprocessor architectures. The GPGPU and Pthreads implementations are compared in terms of solution quality and speed up. The electromagnetic optimization problems chosen for testing the efficiency of the parallelization techniques...

chapter

GPU Solver for Systems of Linear Equations with Infinite Precision

J. Khun, I. Šimeček, R. Lorencz

2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) > 121 - 124

2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)

In this paper, we would like to introduce a GPU accelerated solver for systems of linear equations with an infinite precision. The infinite precision means that the system can provide a precise solution without any rounding error. These errors usually come from limited precision of floating point values within their natural computer representation. In a simplified description, the system is using...

chapter

SAWS: Synchronization aware GPGPU warp scheduling for multiple independent warp schedulers

Jiwei Liu, Jun Yang, Rami Melhem

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 383 - 394

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

General-purpose computing on Graphics Processing Units (GPGPUs) became increasingly popular for a wide range of applications beyond traditional graphic rendering workloads. GPGPU exploits parallelism in applications via multithreading to hide memory latencies, and handles control complexity by barrier synchronizations. Warp scheduling algorithms have been optimized to increase memory latency hiding...

chapter

Scalable Critical Path Analysis for Hybrid MPI-CUDA Applications

Felix Schmitt, Robert Dietrich, Guido Juckeland

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 908 - 915

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Utilizing accelerators in heterogeneous systems is an established approach for designing peta-scale applications. Today, CUDA offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both CPU and GPU. From this increasing program complexity emerges the need for sophisticated performance tools. This work contributes by analyzing...

chapter

Auto-parallelization of data structure operations for GPUs

Rupesh Nasre

2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES) > 1 - 10

2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)

We present an auto-parallelization technique for generating GPU implementation of data-structure operations from a sequential specification. The technique partitions the data-structure operations into barrier-separated phases such that each phase executes only homogeneous operations. Homogeneity is dictated by the method type, which is derived from the specification. Two key aspects of our technique...

chapter

Data-Driven Versus Topology-driven Irregular Computations on GPUs

Rupesh Nasre, Martin Burtscher, Keshav Pingali

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 463 - 474

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Irregular algorithms are algorithms with complex main data structures such as directed and undirected graphs, trees, etc. A useful abstraction for many irregular algorithms is their operator formulation, in which the algorithm is viewed as the iterated application of an operator to certain nodes, called active nodes, in the graph. Each operator application, called an activity, usually touches only a...

chapter

Parallel one- and two-dimensional FFTs on GPGPUs

Mehrdad Fallahpour, Chang-Hong Lin, Ming-Bo Lin, Chin-Yu Chang

Anti-counterfeiting, Security, and Identification > 1 - 5

2012 International Conference on Anti-Counterfeiting, Security and Identification (2012 ASID)

This paper presents a method to map and implement the 1-D FFT on a GPGPU and extends the method to the 2-D FFT. Two approaches are used to maximize the performance. One is to localize data inside the caches of the GPGPU and the other is to properly assign threads and blocks to reach higher performance. The results show that our implementation is 3.62 times faster to perform 32M-point 1-D FFT and 4...

chapter

Algorithmic strategies for optimizing the parallel reduction primitive in CUDA

Pedro J. Martin, Luis F. Ayuso, Roberto Torres, Antonio Gavilanes

2012 International Conference on High Performance Computing & Simulation (HPCS) > 511 - 519

2012 International Conference on High Performance Computing & Simulation (HPCS)

Many general-purpose applications exploit Graphics Processing Units (GPUs) by executing a set of well-known data-parallel primitives. Those primitives are usually invoked from the host many times, so their throughput has a great impact on the performance of the overall system. Thus, the study of novel algorithmic strategies to optimize their implementation on current devices is an interesting topic...

chapter

A novel approach for indexing Arabic documents through GPU computing

Nermine N. Sophoclis, M. Abdeen, El-Sayed M. El-Horbaty, M. Yagoub

2012 25th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE) > 1 - 4

2012 25th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE)

In contrast to English search engines, Arabic search engines did not have their fair share in modern studies despite the continuous growth of Arabic Internet users and data. Towards bridging the gap, this paper presents a novel indexing algorithm customized for Arabic documents. Our algorithm exploits the characteristics of the Arabic language to enhance indexing and lookup. Additionally, the algorithm...

chapter

An OpenMP Compiler for Hybrid CPU/GPU Computing Architecture

Hung-Fu Li, Tyng-Yeu Liang, Jhen-Lin Jiang

2011 Third International Conference on Intelligent Networking and Collaborative Systems > 209 - 216

2011 Third International Conference on Intelligent Networking and Collaborative Systems (INCoS)

Hybrid CPU/GPU computing architecture has received great attention from researchers in high performance computing. This new architecture provides higher computation performance than one that uses only CPUs for data computation. However, programming on this computing architecture is not easy for programmers since they have to learn the programming APIs of GPU and handle data communication between...

chapter

GPGPU-based data parallel region growing algorithm for cell nuclei detection

Sandor Szenasi, Zoltan Vamossy, Miklos Kozlovszky

2011 IEEE 12th International Symposium on Computational Intelligence and Informatics (CINTI) > 493 - 499

2011 IEEE 12th International Symposium on Computational Intelligence and Informatics (CINTI)

Nowadays microscopic analysis of tissue samples is done more and more by using digital imagery and special immunodiagnostic software. These are typically specific applications developed for one distinct field, but some subroutines are commonly repeated, for example several applications contain steps that can detect cell nuclei in a sample image. The aim of our research is developing a new data parallel...

chapter

Performance Characterization and Optimization of Atomic Operations on AMD GPUs

Marwa Elteir, Heshan Lin, Wu-Chun Feng

2011 IEEE International Conference on Cluster Computing > 234 - 243

2011 IEEE International Conference on Cluster Computing (CLUSTER)

Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate execution between concurrent threads, and in turn, assist in constructing complex data structures such as hash tables or implementing GPU-wide barrier synchronization. While the performance of atomic operations has improved substantially...

chapter

Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures

R Dietrich, T Ilsche, G Juckeland

2010 39th International Conference on Parallel Processing Workshops > 135 - 143

2010 39th International Conference on Parallel Processing Workshops (ICPPW)

New high performance computing (HPC) applications recently have to face scalability over an increasing number of nodes and the programming of special accelerator hardware. Hybrid composition of large computing systems leads to a new dimension in complexity of software development. This paper presents a novel approach to gain insight into accelerator interaction and utilization without any changes...

chapter

Profiling General Purpose GPU Applications

B.R. Coutinho, G.L.M. Teodoro, R.S. Oliveira, D.O.G. Neto, et al.

2009 21st International Symposium on Computer Architecture and High Performance Computing > 11 - 18

2009 21st International Symposium on Computer Architecture and High Performance Computing. SBAC-PAD 2009

We are witnessing an increasing adoption of GPUs for performing general purpose computation, which is usually known as GPGPU. The main challenge in developing such applications is that they often do not fit in the model required by the graphics processing devices, limiting the scope of applications that may benefit from the computing power provided by GPUs. Even when the application fits the GPU model,...

chapter

Parallel Dense Gauss-Seidel Algorithm on Many-Core Processors

H. Courtecuisse, J. Allard

2009 11th IEEE International Conference on High Performance Computing and Communications > 139 - 147

2009 11th IEEE International Conference on High Performance Computing and Communications (HPCC)

The Gauss-Seidel method is very efficient for solving problems such as tightly-coupled constraints with possible redundancies. However, the underlying algorithm is inherently sequential. Previous works have exploited sparsity in the system matrix to extract parallelism. In this paper, we propose to study several parallelization schemes for fully-coupled systems, unable to be parallelized by existing...

Filter options

Content availability: Available
Data set: ieee
Keywords: SYNCHRONIZATION, KERNEL, GPGPU

Keywords

INSTRUCTION SETS (9)
GRAPHICS PROCESSING UNIT (6)
GRAPHICS PROCESSING UNITS (6)
CUDA (5)
OPTIMIZATION (4)
COMPUTER ARCHITECTURE (3)
ALGORITHM DESIGN AND ANALYSIS (2)
ARRAYS (2)
GPU (2)
LIBRARIES (2)
MPI (2)
OPENMP (2)
PARALLEL PROCESSING (2)
PERFORMANCE ANALYSIS (2)
PROGRAMMING (2)
RADIATION DETECTORS (2)
RUNTIME (2)
YARN (2)
1-D FFT (1)
2-D FFT (1)
ACCELERATION (1)
ACCELERATORS (1)
ALGORITHMIC PROPERTIES (1)
AMD (1)
ARABIC INDEXER (1)
ATOMIC OPERATIONS (1)
AUTO (1)
BIOMEDICAL IMAGE PROCESSING (1)
COMPUTER GRAPHIC EQUIPMENT (1)
COMPUTERS (1)
COPROCESSORS (1)
CRITICAL PATH ANALYSIS (1)
CUDA ENVIRONMENT (1)
CUDA PROGRAMS (1)
DATA MINING (1)
DATA PARALLEL ALGORITHM (1)
DATA STRUCTURES (1)
DATA-DRIVEN (1)
DATA-PARALLEL ALGORITHMS (1)
DEFORMABLE BODY (1)
DENSE MATRIX REPRESENTATION (1)
DENSE MATRIX-VECTOR MULTIPLICATION (1)
DISTRIBUTED/PARALLEL INFORMATION RETRIEVAL (1)
ELECTROMAGNETIC FIELD (1)
ELECTROMAGNETICS (1)
EVENT LOGGING (1)
FFTW (1)
FULLY-COUPLED SYSTEM (1)
GENERAL PURPOSE COMPUTATION (1)
GENERAL PURPOSE GPU APPLICATION (1)
GPU SYNCHRONIZATION (1)
GRAPHICS PROCESSING DEVICES (1)
HARDWARE (1)
HETEROGENEOUS COMPUTING (1)
HIGH DEFINITION VIDEO (1)
HIGH PERFORMANCE COMPUTING (1)
HIGH PERFORMANCE GRAPHIC CARDS (1)
HIGH PERFORMANCE NUMERICAL LINEAR ALGEBRA (1)
HPC APPLICATIONS (1)
HYBRID ARCHITECTURES (1)
HYBRID CPU/GPU COMPUTING ARCHITECTURE (1)
HYBRID SIMULATION (1)
INDEXES (1)
INDEXING (1)
INSTRUMENTS (1)
INTERNET (1)
IRREGULAR ALGORITHMS (1)
ITERATIVE METHODS (1)
LARGE COMPUTING SYSTEMS (1)
LINEAR COMPLEMENTARY PROBLEM (1)
LINEAR PROGRAMMING (1)
MAGNETIC CORES (1)
MANY-CORE PROCESSOR (1)
MAPREDUCE (1)
MATHEMATICAL MODEL (1)
MATHEMATICS COMPUTING (1)
MEDICAL INTERVENTION PLANNING (1)
MEMORY MANAGEMENT (1)
MICROPROCESSORS (1)
MODULAR ARITHMETIC (1)
MONITORING (1)
MONITORING LIBRARIES (1)
MULTICORE CPU (1)
MULTICORE GPU (1)
MULTIPLE WARP SCHEDULERS (1)
MULTIPROCESSING SYSTEMS (1)
NONINTRUSIVE PERFORMANCE ANALYSIS (1)
NUCLEI DETECTION (1)
OPENCL (1)
OPENCL FRAMEWORK (1)
PARALLEL ALGORITHMS (1)
PARALLEL DENSE GAUSS-SEIDEL ALGORITHM (1)
PARALLEL EXECUTION (1)
PARALLEL HARDWARE ACCELERATED APPLICATIONS (1)
PARALLEL REDUCTION (1)
PARALLELIZATION (1)
PERFORMANCE EVALUATION (1)