Search results

chapter

Productivity improvement of tools for graphical evaluation of stochastic transformation results

Anton O. Prokofiev, Elena R. Skorokhodova, Ekaterina D. Smirnova

2017 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus) > 533 - 536

Various aspects of improving the efficiency of graphical tests used to explore the results of stochastic transformations are discussed in the article. A method to improve performance of stochastic data processing using hybrid computing technologies and comparative analysis of the original tests and the tests with the proposed improvements are presented. Potential areas for improvement of the graphical...

chapter

CUDA-Sankoff: Using GPU to Accelerate the Pairwise Structural RNA Alignment

Daniel Sundfeld, Jakob H. Havgaard, Jan Gorodkin, Alba C. M. A. de Melo

2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP) > 295 - 302

In this paper, we propose and evaluate CUDA-Sankoff, a solution to the RNA structural alignment problem based on the Sankoff algorithm on Graphics Processing Units (GPUs). To our knowledge, this is the first time the Sankoff algorithm has been implemented on a GPU. In our solution, we show how to linearize the Sankoff 4-dimensional dynamic programming (4D DP) matrix and we propose a two-level wavefront approach...

chapter

The distribution in space test for quality evaluation of pseudorandom numbers generators

Anton O. Prokofiev, Dmitry V. Denisov, Andrey V. Chirkin

2017 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus) > 529 - 532

The article considers one of the most demonstrative graphical methods for evaluating the quality of pseudorandom number generators. Existing approaches to improving this method are described. A method that increases the amount of useful information obtained through testing is presented. The productivity gains achieved for the test using hybrid computing technologies are considered.

chapter

Frequency-domain based Reed Solomon decoders for GPU

Arul K. Subbiah, Tokunbo Ogunfunmi

2017 IEEE International Conference on Consumer Electronics (ICCE) > 352 - 354

The recent development and popularity of the Graphical Processing Unit (GPU) have attracted researchers to use it for error correction codes such as Reed Solomon (RS). In this paper, we propose an efficient implementation of the RS decoder based on frequency-domain analysis. This decoder employs the Finite-field Fast Fourier Transform (FFFT) to convert the received code to the frequency domain; the...

chapter

GMRCube: A GPGPU accelerated MapReduce DataCube construction model

Ghanshyam Verma, Priyanka Tripathi

2016 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT) > 513 - 517

GMRCube is a MapReduce-based data cube construction model that uses GPU computation to reduce its compute time. The model is designed for optimum utilization of the combined GPU-CPU compute capabilities. The paper presents the dataflow of the model and its algorithm, along with a detailed explanation. The model was tested on multidimensional data ranging from 3 to 7 dimensions, and tuples...

chapter

CUDA-Based Parallel Implementation of IBM Word Alignment Algorithm for Statistical Machine Translation

Si-Yuan Jing, Gao-Rong Yan, Xing-Yuan Chen, Peng Jin, more

2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT) > 189 - 194

Word alignment is a basic task in natural language processing and it usually serves as the starting point when building a modern statistical machine translation system. However, the state-of-the-art parallel algorithm for word alignment is still time-consuming. In this work, we explore a parallel implementation of a word alignment algorithm on the Graphics Processing Unit (GPU), which has been widely available...

chapter

Performance and Portability Studies with OpenACC Accelerated Version of GTC-P

Yueming Wei, Yichao Wang, Linjin Cai, William Tang, more

2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT) > 13 - 18

Accelerator-based heterogeneous computing is of paramount importance to High Performance Computing. The increasing complexity of the cluster architectures requires more generic, high-level programming models. OpenACC is a directive-based parallel programming model, which provides performance on and portability across a wide variety of platforms, including GPU, multicore CPU, and many-core processors...

chapter

To use or not to use: CPUs' cache optimization techniques on GPGPUs

D.R.V.L.B. Thambawita, Roshan G. Ragel, Dhammike Elkaduwe

2016 IEEE International Conference on Information and Automation for Sustainability (ICIAfS) > 1 - 6

General-Purpose Graphics Processing Units (GPGPUs) are widely used for achieving high performance or high throughput in parallel programming. This capability of GPGPUs is well known and is mostly used for scientific computing, which requires more processing power than ordinary personal computers provide. Therefore, most programmers, researchers and industry use this new concept for their work...

chapter

GPU-Accelerated Solution of Activated Sludge Model's System of ODEs with a High Degree of Stiffness

Jamal Alikhani, Arash Massoudieh, Ujjal K. Bhowmik

2016 International Conference on Computational Science and Computational Intelligence (CSCI) > 555 - 560

Simulation of an activated sludge model (ASM) that includes a detailed biokinetic reaction network often requires the solution of a large system of ordinary differential equations (ODEs) at each time frame, which requires long computing times. In this study, an adaptive time step backward differentiation formula (BDF) is proposed to solve the ASM's system of ODEs that mainly contains a high degree of stiffness...

chapter

Pattern classification using updated fuzzy hyper-line segment neural network and it's GPU parallel implementation for large datasets using CUDA

Priyadarshan Dhabe, Prashant Vyas, Devrat Ganeriwal, Aditya Pathak

2016 International Conference on Computing, Analytics and Security Trends (CAST) > 24 - 29

The fuzzy hyper-line segment neural network (FHLSNN) is a hybrid of fuzzy logic and a neural network used for pattern classification. It learns patterns in terms of n-dimensional hyper-line segments (HLSs). The modified fuzzy hyper-line segment neural network (MFHLSNN) is a modified version of FHLSNN that improves the quality of reasoning and recall time per pattern using modified fuzzy membership...

chapter

Parallelization of unit propagation algorithm for SAT-based ATPG of digital circuits

Lamya G Ali, Aziza I Hussein, Hanafy M Ali

2016 28th International Conference on Microelectronics (ICM) > 184 - 188

The recent enhancements in Boolean Satisfiability solving have made SAT solvers a core engine for many real-world applications, especially for Automatic Test Pattern Generation (ATPG) in digital circuits. The majority of solving time is spent on iteratively propagating variable assignments that are inferred by decisions, so unit propagation (UP) is the most significant part in the Satisfiability...

chapter

An Efficient GPU Parallelization for Arbitrary Collocated Polyhedral Finite Volume Grids and Its Application to Incompressible Fluid Flows

Shashank Jaiswal, Rajesh Reddy, Raja Banerjee, Shingo Sato, more

2016 IEEE 23rd International Conference on High Performance Computing Workshops (HiPCW) > 81 - 89

This paper presents a GPU parallelization for a computational fluid dynamics solver which works on a mesh consisting of polyhedral cells, where each cell has an arbitrary number of faces and each face has an arbitrary number of vertices. The parallelization is achieved using NVIDIA's compute unified device architecture (CUDA). The developed code specifically targets performance improvement on NVIDIA...

chapter

PCAFP for Solving CNOP in Double-Gyre Variation and Its Parallelization on Clusters

Shijin Yuan, Mi Li, Bin Mu, Jingpeng Wang

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 284 - 291

Double-gyre ocean circulation is a typical phenomenon in the northern mid-latitude ocean basins. Its low-frequency variability significantly influences both ocean and climate. To enhance its predictability, finding the optimal initial perturbation that can trigger the double-gyre variation is important. The CNOP method is adopted to calculate the optimal initial perturbation and this method has...

chapter

Using Remote Accelerators to Improve the Performance of the FFTW Library

Santiago Mislata, Federico Silla

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 913 - 920

Hardware accelerators have forced a change in high performance computing. Their use has enabled an increase in the performance of data centers. For this reason, developers have decided to port many applications belonging to diverse science fields, such as biology or chemistry, to hardware accelerators like GPUs (Graphics Processing Units). Nevertheless, not all the applications have been able to...

chapter

Constructing a GPU cluster platform based on multiple NVIDIA Jetson TK1

Kuan-Yu Yeh, Hui-Jun Cheng, Jin Ye, Jyh-Da Wei, more

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) > 917 - 922

High-end graphics processing units (GPUs), such as NVIDIA Fermi/Tesla series cards, have been widely applied in high performance computing fields over the past decade. NVIDIA's Tegra K1 board, called Jetson TK1, contains 4 ARM Cortex-A15 CPUs and 192 CUDA cores (Kepler GPU); it is an embedded board with low cost, low power consumption, and high applicability for several specific applications. In...

chapter

Extending rCUDA with Support for P2P Memory Copies between Remote GPUs

Carlos Reano, Federico Silla

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 789 - 796

Although GPUs are being widely adopted in order to noticeably reduce the execution time of many applications, their use presents several side effects such as an increased acquisition cost of the cluster nodes or an increased overall energy consumption. To address these concerns, GPU virtualization frameworks could be used. These frameworks allow accelerated applications to transparently use GPUs located...

chapter

dMath: Linear Algebra for Scaleout GP-GPUs

Steven Eliuk, Cameron Upright, Anthony Skjellum

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 647 - 654

A new scalable parallel math library, dMath, is presented that demonstrates leading scaling when using intranode, internode, and hybrid-parallelism for deep learning (DL). dMath provides easy-to-use distributed primitives and a variety of domain-specific algorithms. These include matrix multiplication, convolutions, and others allowing for rapid development of scalable applications, including Deep...

chapter

Accelerating Spark RDD Operations with Local and Remote GPU Devices

Yasuhiro Ohno, Shin Morishima, Hiroki Matsutani

2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS) > 791 - 799

Apache Spark is a distributed processing framework for large-scale data sets, where intermediate data sets are represented as RDDs (Resilient Distributed Datasets) and stored in memory distributed over machines. To accelerate its various computation-intensive operations, such as reduction and sort, we focus on GPU devices. We modified the Spark framework to invoke CUDA kernels when computation-intensive...

chapter

Performance Evaluation of the NVIDIA Pascal GPU Architecture: Early Experiences

Carlos Reano, Federico Silla

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 1234 - 1235

With the introduction of the new NVIDIA Pascal GPU architecture, the need to evaluate its real performance in HPC environments arises. In this paper we briefly present some preliminary results. Compared to its predecessors, the new architecture clearly shows a great improvement.

chapter

Accelerated Processing Unit (APU) potential: N-body simulation case study

Hassan Youness, Mohamed Moness, Omar Shaaban, Aziza I. Hussein

2016 11th International Conference on Computer Engineering & Systems (ICCES) > 110 - 115

This paper studies the acceleration of irregular and regular algorithms on an integrated Graphics Processing Unit (integrated GPU), known as an Accelerated Processing Unit (APU), that is fused on the same die as the CPU, and on a discrete Graphics Processing Unit (GPU), while answering the question of how much potential the APU has for applications with irregular data structures such as trees, knowing that...
