Meshless methods for simulating fluid flows have evolved steadily over the years, since they are a strong alternative for handling large deformations, where mesh-based methods fail to perform efficiently. A well-known meshless method is the Moving Particle Semi-implicit (MPS) method, which was designed to simulate free-surface flows of truly incompressible fluids. Many variations and...
Due to the rapid increase in the dimension and acquisition rate of biological data, traditional analysis methods are unable to achieve acceptable accuracy. Recently, deep learning technologies have shown outstanding results in many domains, especially in pattern recognition in the field of bioinformatics. In this paper, we provide background on deep learning and its frameworks. In addition, we review...
Many neural architectures, including RBF, SVM, and FSVC classifiers, as well as deep-learning solutions, require the efficient implementation of neuron layers, each having a given number m of neurons and a specific set of parameters, and operating on a training or test set of N feature vectors, each of dimension n. Herein we investigate how to allocate the computation on GPU kernels and how to better...
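The workload described above, an m-neuron layer applied to N feature vectors of dimension n, can be sketched as an (N x n) by (n x m) product plus a per-neuron bias and activation. This is a generic illustration; the function name and the sigmoid activation are assumptions, not taken from the paper.

```python
# Generic sketch of an m-neuron layer over N feature vectors of dimension n.
# Names and the choice of sigmoid activation are illustrative assumptions.
import math

def layer_forward(X, W, bias):
    """X: N x n inputs, W: n x m weights, bias: length-m biases.
    Returns the N x m matrix of sigmoid activations."""
    n, m = len(W), len(W[0])
    out = []
    for x in X:
        row = []
        for j in range(m):
            # z = bias_j + sum_i x_i * W_ij, then squash with a sigmoid
            z = bias[j] + sum(x[i] * W[i][j] for i in range(n))
            row.append(1.0 / (1.0 + math.exp(-z)))
        out.append(row)
    return out
```

On a GPU, the natural question the abstract raises is how to partition this N x m grid of independent dot products across kernels and threads.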
This paper presents HPSM, a high-level C++ framework for exploring multi-CPU and multi-GPU systems. HPSM provides parallel loops and reductions implemented over three parallel backends: Serial, OpenMP (with the GCC and libKOMP runtimes), and StarPU. We evaluated HPSM's development effort with an AXPY program, and its performance with three parallel benchmarks: N-Body, Hotspot, and a CFD solver. The CPU-GPU combination...
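HPSM itself is a C++ framework and its real API is not shown in the abstract; as a rough, hypothetical illustration of the parallel-loop-plus-reduction pattern such frameworks expose, here is a sketch in Python using the standard library's thread pool, applied to the AXPY-style workload the abstract mentions.

```python
# Hypothetical sketch of the parallel-loop + reduction pattern; this is NOT
# HPSM's API, just an illustration of the programming model it provides.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def parallel_reduce(data, map_fn, reduce_fn, init, workers=4):
    """Apply map_fn to chunks of data in parallel, then fold the partials."""
    chunk = max(1, len(data) // workers)
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(
            lambda c: reduce(reduce_fn, map(map_fn, c), init), chunks))
    return reduce(reduce_fn, partials, init)

# AXPY-style loop (y = a*x + y), followed by a sum reduction over y.
a, x, y = 2.0, [1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0]
y = [a * xi + yi for xi, yi in zip(x, y)]
total = parallel_reduce(y, lambda v: v, operator.add, 0.0)
```

A framework like HPSM would dispatch the same loop body to a serial, OpenMP, or StarPU backend without changing the user's code.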
Cache tuning has been widely studied in CPUs and shown to achieve substantial energy savings with minimal performance degradation. However, cache tuning has yet to be explored in general-purpose graphics processing units (GPGPUs), which have emerged as efficient alternatives for general-purpose high-performance computing. In this paper, we explore autonomic cache tuning for GPGPUs, where the cache...
Associative memories are models capable of storing and retrieving messages given only a part of their content. These systems have been used in applications such as database engines, network routers, natural language processing, and image recognition due to their error-correction capability in pattern retrieval. Recently, Gripon and Berrou introduced a sparse associative memory based on cliques...
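A minimal sketch of the clique idea can make the retrieval mechanism concrete: a message picks one unit per cluster, the chosen units are stored as a fully connected clique, and an erased cluster is recovered by a winner-take-all vote over edges to the known units. The class below is a simplified assumption-laden illustration, not Gripon and Berrou's exact construction.

```python
# Simplified sketch of a clique-based associative memory in the spirit of
# Gripon-Berrou networks. Cluster layout and the winner-take-all retrieval
# rule here are illustrative assumptions, not the authors' exact model.

class CliqueMemory:
    def __init__(self, clusters, neurons_per_cluster):
        self.c = clusters
        self.l = neurons_per_cluster
        self.edges = set()  # undirected edges between (cluster, neuron) units

    def store(self, message):
        """message: one neuron index per cluster; stored as a full clique."""
        units = list(enumerate(message))
        for i in range(len(units)):
            for j in range(i + 1, len(units)):
                self.edges.add(frozenset([units[i], units[j]]))

    def retrieve(self, partial):
        """partial: neuron index per cluster, or None for erased clusters.
        Each erased cluster picks the neuron with most edges to known units."""
        known = [(c, n) for c, n in enumerate(partial) if n is not None]
        out = list(partial)
        for c, n in enumerate(partial):
            if n is None:
                scores = [sum(frozenset([(c, cand), ku]) in self.edges
                              for ku in known) for cand in range(self.l)]
                out[c] = max(range(self.l), key=lambda cand: scores[cand])
        return out
```

The error-correction capability mentioned in the abstract comes from this vote: as long as enough clique edges survive, the erased unit still wins.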
GPUs have been widely used in the past decade to speed up the execution of general-purpose applications with a high level of parallelism. The efficiency of running general-purpose applications on GPUs depends on how well the processing and memory demands of the application are balanced against the hardware resources available on the target GPU, and this balance can significantly affect the power and performance of...
High-performance computing (HPC) is a strategic resource that allows research communities and developers to meet the processing demand (1 exaflop/s) of future exascale computing systems, expected by the end of the current decade. To provide this level of performance, many powerful and energy-efficient devices (MICs, GPUs) and parallel programming models have been proposed...
Technological advancements have created a need for effectively teaching GPU computing, driven by the increasing adoption of parallel computing and the growing use of GPUs for computationally intensive tasks. This paper is motivated by that need. The paper describes a semester-long course on CUDA programming. The course has significant...
Compute Unified Device Architecture (CUDA) is an attractive alternative for our ever-growing need for high-performance computing. However, to extract the full potential of CUDA one should, at the least, be familiar with the programming model and have a fair understanding of the memory and cache architecture. Yet most domain experts from fields that warrant high-performance computing...
Network functions virtualization (NFV) is now commonly used in the areas of data transmission and distributed processing. A key feature of these solutions is that they can be deployed in the cloud. Modern cloud infrastructure often includes GPGPU coprocessors for acceleration purposes. There are a number of problems for which hybrid computing technology can be used to speed up NFV: encryption,...
The architecture of high-performance data storage and processing systems has changed considerably. Modern cloud computing systems are often not only hybrid but also support hardware acceleration. The paper describes the scope of information-security protocols based on PRNGs in industrial systems. The work provides a method for implementing a GOST R 34.12-2015-based pseudo-random number generator in...
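Building a PRNG on a block cipher typically follows the counter-mode pattern: encrypt an incrementing counter under a fixed key and emit the ciphertext blocks as random output. Implementing the GOST R 34.12-2015 cipher itself is out of scope here, so the sketch below shows only the generic construction, with SHA-256 deliberately standing in for the block cipher as a clearly labeled substitute.

```python
# Generic counter-mode PRNG sketch. SHA-256 stands in for the GOST R
# 34.12-2015 block cipher, which the paper actually uses; this is an
# illustration of the construction, not of the cipher.
import hashlib

class CounterPRNG:
    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0

    def _block(self) -> bytes:
        # Encrypt-the-counter pattern: block = F_key(counter)
        data = self.key + self.counter.to_bytes(16, "big")
        self.counter += 1
        return hashlib.sha256(data).digest()

    def random_bytes(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += self._block()
        return out[:n]
```

The construction is embarrassingly parallel, since block i depends only on the key and counter value i, which is what makes GPGPU acceleration attractive.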
The understanding of application characteristics such as hardware resource requirements and communication patterns is key to building highly utilized high-performance computing systems for target workloads at a reasonable cost and with available technology. This characterization drives the design decisions of both hardware and software. The memory access pattern is a key factor, as data movement is a major...
The processing techniques and the time consumed during the execution of a task on a CPU and a GPU vary depending on the architecture's technology and configuration. This paper compares the time consumed and the efficiency of CPUs and GPUs with different architectures and configurations when applying spatial smoothing filters to images of a fixed spatial resolution. The processing speed was increased...
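As a baseline for the kind of spatial smoothing being benchmarked, here is a plain 3x3 box (mean) filter in pure Python; the paper's exact kernel sizes and border handling are not stated in the abstract, so clamped borders below are an assumption.

```python
# Simple 3x3 box (mean) smoothing filter; border pixels are clamped to the
# nearest valid pixel. Kernel size and border policy are assumptions.

def box_smooth(img):
    """Return a mean-filtered copy of img, given as a list of row lists."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image edge
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            out[y][x] = acc / 9.0
    return out
```

Every output pixel is independent, which is why this filter maps so naturally onto one-GPU-thread-per-pixel execution and makes it a common CPU-vs-GPU benchmark.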
In in-memory database systems augmented by hardware accelerators, accelerating index search operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide a very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based...
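To make the index-search operation concrete, here is a minimal byte-wise trie lookup. Note this is a deliberately simplified stand-in: the real ART uses adaptive node sizes (Node4/16/48/256) and path compression, which this sketch omits.

```python
# Minimal byte-wise trie index, illustrating the key-to-value search that
# ART accelerates. ART's adaptive node types and path compression are
# intentionally omitted from this sketch.

class TrieNode:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = {}  # byte -> TrieNode (ART uses fixed-size arrays)
        self.value = None

class RadixIndex:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key: bytes, value):
        node = self.root
        for b in key:
            node = node.children.setdefault(b, TrieNode())
        node.value = value

    def lookup(self, key: bytes):
        node = self.root
        for b in key:
            node = node.children.get(b)
            if node is None:
                return None
        return node.value
```

Search cost is bounded by key length rather than dataset size, and independent lookups batch naturally, which is what a GPU implementation exploits.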
The TOP500 and GREEN500 lists are two major resources for understanding and forecasting the future architecture design of high-performance computing platforms. Generally, supercomputer system design can be divided into two parts: the single computing node and the interconnection. Setting the interconnection aside, we categorize systems into two types, homogeneous and heterogeneous, based on single-node architecture. While...
We present a parallel distributed-memory algorithm for large deformation diffeomorphic registration of volumetric images that produces large isochoric deformations (locally volume preserving). Image registration is a key technology in medical image analysis. Our algorithm uses a partial differential equation constrained optimal control formulation. Finding the optimal deformation map requires the...
Effective utilization of the increasingly heterogeneous hardware in modern supercomputers is a significant challenge. Many applications have seen performance gains by using GPUs, but many implementations leave CPUs sitting idle. In this paper, we describe a runtime-managed system for coordinating heterogeneous execution. This system manages data transfers to and from GPU devices and schedules work...
Power is a major limiting factor for the future of HPC and the realization of exascale computing under a power budget. GPUs have now become a mainstream parallel computation device in HPC, and optimizing power usage on GPUs is critical to achieving future goals. GPU memory is seldom studied, especially for power usage. Nevertheless, memory accesses draw significant power and are critical to understanding...
We report the use of the parallel resources of the graphics processing unit (GPU) to solve sparse systems by optimizing and implementing a variant of the Gaussian belief propagation algorithm for sparse matrices on a Tesla 2070M GPU with the Fermi architecture. The implementation was verified with finite element method data and achieved up to a 4× improvement in execution time compared to serial...
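As a hedged sketch of the algorithm being accelerated, the following implements scalar-message Gaussian belief propagation for A x = b with a sparse symmetric matrix, following the standard formulation (exact on tree-structured sparsity patterns, convergent on e.g. diagonally dominant matrices); the paper's GPU-specific variant and optimizations are not reproduced here.

```python
# Standard scalar-message Gaussian belief propagation for solving A x = b.
# This is a reference sketch of the base algorithm, not the paper's
# GPU-optimized variant.

def gbp_solve(A, b, iters=50):
    """A: dense list-of-lists view of a sparse symmetric matrix; b: rhs."""
    n = len(b)
    nbrs = [[j for j in range(n) if j != i and A[i][j] != 0.0]
            for i in range(n)]
    P = [[0.0] * n for _ in range(n)]   # message precisions, P[i][j]: i -> j
    mu = [[0.0] * n for _ in range(n)]  # message means
    for _ in range(iters):
        P2 = [row[:] for row in P]
        mu2 = [row[:] for row in mu]
        for i in range(n):
            for j in nbrs[i]:
                # aggregate the local potential and all messages except j's
                Pi = A[i][i] + sum(P[k][i] for k in nbrs[i] if k != j)
                mi = (b[i] + sum(P[k][i] * mu[k][i]
                                 for k in nbrs[i] if k != j)) / Pi
                P2[i][j] = -A[i][j] ** 2 / Pi
                mu2[i][j] = Pi * mi / A[i][j]
        P, mu = P2, mu2
    # final beliefs: combine local potential with all incoming messages
    x = []
    for i in range(n):
        Pi = A[i][i] + sum(P[k][i] for k in nbrs[i])
        x.append((b[i] + sum(P[k][i] * mu[k][i] for k in nbrs[i])) / Pi)
    return x
```

All per-edge message updates within an iteration are independent, which is the parallelism a GPU implementation maps onto its thread blocks.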