The arch project is a suite of mini-apps that have been developed with consistent coding practices, under a common infrastructural layer. Great emphasis has been placed on making the applications concise and easy to manipulate, while capturing the key performance characteristics of their proxied algorithmic classes. The suite is intended for traditional exploration of performance, portability and...
The development of a deep (stacked) convolutional auto-encoder in the Caffe deep learning framework is presented in this paper. We describe the simple principles we used to create this model in Caffe. The proposed model of convolutional auto-encoder does not yet have pooling/unpooling layers. The results of our experimental research show comparable accuracy of dimensionality reduction in comparison...
In the recent literature, drug design relying on molecular docking (MD) techniques is becoming a very promising field. Most of these techniques rely on the way ligands interact with a protein target through only one binding site; in addition, they ignore the fact that assorted ligands interact with unconnected parts of the target. However, by taking the latter fact into consideration, the computational...
The capability of GPUs to accelerate general-purpose applications that can be parallelized into a massive number of threads makes it promising to apply GPUs to real-time applications as well, where high throughput and intensive computation are also needed. However, due to the different architecture and programming model of GPUs, the worst-case execution time (WCET) analysis methods and techniques designed...
This paper presents GPU parallelization for a computational fluid dynamics solver which works on a mesh consisting of polyhedral cells, where each cell has an arbitrary number of faces and each face has an arbitrary number of vertices. The parallelization is achieved using NVIDIA's compute unified device architecture (CUDA). The developed code specifically targets performance improvement on NVIDIA...
Today, most high-performance computing (HPC) platforms have heterogeneous hardware resources (CPUs, GPUs, storage, etc.). A Graphics Processing Unit (GPU) is a parallel computing coprocessor specialized in accelerating vector operations. The prediction of application execution times over these devices is a great challenge and is essential for efficient job scheduling. There are different approaches...
This paper studies the implementation and optimization of a high-order weighted essentially non-oscillatory (WENO) solver for the Euler equations on multi-core and many-core architectures (Intel Ivy Bridge CPU, Intel Xeon Phi 7110P coprocessor and NVIDIA Kepler K20c GPU). The implementation of up to ninth-order accurate WENO schemes is used in the solver. For the GPU platform, both...
As a traditional application on various supercomputers, atmospheric modeling has long suffered from low performance efficiency. In this paper, we pick the 3D Euler equation solver (the most essential dynamic component for a non-hydrostatic atmospheric model) as the target application, and explore the maximum performance efficiency that can be achieved on CPU-GPU hybrid architectures. Besides...
Simulating complex physical phenomena involves manipulating a large amount of data. In order to simulate very large domains on limited computing architectures, such as industrial infrastructures, new solutions have to be proposed. In this paper, a new out-of-core method is introduced in order to perform fast physical simulations using a complex Lattice Boltzmann model (LBM) on a...
A finite-difference micromagnetic solver called Grace uses C++ Accelerated Massive Parallelism (C++ AMP). The performance of a single GPU is compared against a typical CPU-based solver. The GPU-to-CPU speedup is shown to be two orders of magnitude for larger problem sizes. This solver can run on GPUs from various hardware vendors, such as Nvidia, AMD, and Intel, regardless of whether...
The aim of this paper is to develop an integrated electronic system that allows dynamic management of congestion and provides fast evaluation of dynamic conditions. Thus, a cellular-automata-based model is proposed that estimates the movement of individuals. The presented system incorporates a process that allows efficient camera-based initialization of the model, without any special...
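To make the cellular-automata idea in this abstract concrete, here is a minimal sketch of one update step, assuming the simplest possible rule (each pedestrian tries to move one cell toward an exit per tick, blocked cells force a wait). The function name `step` and the rule itself are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical minimal cellular-automata crowd-movement step (not the
# paper's model): each pedestrian moves one cell toward the exit per
# update if the target cell is unoccupied, otherwise stays in place.

def step(peds, exit_cell, occupied):
    """Advance each pedestrian one cell toward exit_cell when possible."""
    er, ec = exit_cell
    new_peds = []
    for (r, c) in peds:
        # Move one cell along each axis toward the exit.
        nr = r + (er > r) - (er < r)
        nc = c + (ec > c) - (ec < c)
        target = (nr, nc)
        if target == (r, c) or target not in occupied:
            occupied.discard((r, c))
            occupied.add(target)
            new_peds.append(target)
        else:
            new_peds.append((r, c))  # blocked: wait this tick
    return new_peds
```

Even this toy rule shows why a camera-based initialization matters: the quality of the simulation depends entirely on the initial pedestrian positions fed into `peds` and `occupied`.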
Modeling thermal radiation in parallel is computationally challenging due to its all-to-all physical and resulting computational connectivity; it is also the dominant mode of heat transfer in practical applications such as next-generation clean coal boilers, which are modeled by the Uintah framework. However, a direct all-to-all treatment of radiation is prohibitively expensive on large computer systems...
In this paper, we accelerate a double-precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house computational fluid dynamics (CFD) software on the latest multi-core and many-core architectures (Intel Ivy Bridge CPU, Intel Xeon Phi 7110P coprocessor and NVIDIA Kepler K20c GPU). For the GPU platform, both the OpenACC-based and...
This paper presents an approach for parallel implementation of cross-correlation using the graphics processing unit (GPU). Cross-correlation is a central digital signal processing (DSP) algorithm with applications in many areas. In many real-time systems, a sequential implementation of cross-correlation creates a performance bottleneck and prevents the system from reaching the real-time...
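For reference, a sequential baseline of the cross-correlation this abstract discusses can be sketched as below (a textbook "full-mode" correlation, not the paper's code). The key property for GPU parallelization is visible in the loop structure: each output lag is computed independently of the others, so a parallel version can assign one GPU thread per lag.

```python
# Sequential cross-correlation over all lags ("full" mode) -- the kind
# of baseline a GPU implementation would parallelize. Every iteration of
# the outer loop (one lag) is independent of the others.

def xcorr(x, y):
    """Cross-correlation of two real sequences over all lags."""
    n, m = len(x), len(y)
    out = []
    for lag in range(-(m - 1), n):   # every possible alignment of y over x
        s = 0.0
        for j in range(m):           # inner dot product at this lag
            i = lag + j
            if 0 <= i < n:
                s += x[i] * y[j]
        out.append(s)
    return out
```

For inputs of length n and m this costs O(n*m) operations, which is exactly the sequential bottleneck the abstract refers to when signals are long.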
Biological sequence comparison is a very common task in Bioinformatics applications. Many parallel solutions have been proposed for this problem, using different HPC platforms, usually programmed with platform-specific languages and frameworks. With this approach, it is difficult to port solutions among different platforms such as CPUs and GPUs, for instance. To tackle this problem, this paper proposes...
The paper considers problems of developing a parallel hybrid fluid-based model and methods to solve them. The main causes of the drop in GPU performance that arose during development, and ways to address them, are described. A method for describing network structures using a route adjacency matrix is provided. Also, several methods to evaluate matrix row summation are considered and...
During the past decade Graphics Processing Units (GPU) have been increasingly employed for speeding up compute intensive scientific applications. In this field, the geometric multigrid method (GMG) is one of the most efficient algorithms for solving large sparse linear systems of equations. Herein we analyze the performance of an optimized GPU based implementation of the GMG method on different state-of-the-art...
Understanding three-dimensional seismic wave propagation in complex media is still one of the main challenges of quantitative seismology. Because of its simplicity and numerical efficiency, the finite-difference method is one of the standard techniques used to solve the elastodynamics equation. Additionally, this class of modeling heavily relies on parallel architectures in order to tackle...
Many high-performance computing applications solving partial differential equations (PDEs) can be attributed to the class of kernels using stencils on structured grids. Due to the disparity between floating-point operation throughput and main memory bandwidth, these codes typically achieve only a low fraction of peak performance. Unfortunately, stencil computation optimization techniques are often...
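A minimal example of the kernel class this abstract describes, assuming a standard 5-point Jacobi stencil on a 2-D structured grid (an illustrative choice, not taken from the paper). It shows why such codes are bandwidth-bound: each grid point does only four additions and one multiplication, but touches five memory locations for reads and one for writes.

```python
# 5-point Jacobi stencil sweep on a structured 2-D grid. Per point:
# ~5 reads, 1 write, 5 flops -- so throughput is limited by memory
# bandwidth rather than floating-point capability.

def jacobi_sweep(u):
    """One out-of-place Jacobi relaxation sweep; boundaries stay fixed."""
    rows, cols = len(u), len(u[0])
    v = [row[:] for row in u]            # output grid (out-of-place update)
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            v[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                              + u[i][j - 1] + u[i][j + 1])
    return v
```

Typical optimizations for this pattern (loop tiling, temporal blocking across sweeps) aim to reuse loaded values from cache before they are evicted, which is exactly what raises the achieved fraction of peak performance.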
Computational biology contributes important solutions for major biological challenges. Unfortunately, most applications in computational biology are highly compute-intensive and associated with extensive computing times. Biological problems of interest are often not treatable with traditional simulation models on conventional multi-core CPU systems. This interdisciplinary work introduces a new multi-timescale...