Search results

Items from 1 to 20 out of 29 results

chapter

VLAG: A very fast locality approximation model for GPU kernels with regular access patterns

Mohsen Kiani, Amir Rajabzadeh

2017 7th International Conference on Computer and Knowledge Engineering (ICCKE) > 260 - 265

2017 7th International Conference on Computer and Knowledge Engineering (ICCKE)

Performance modeling plays an important role for optimal hardware design and optimized application implementation. This paper presents a very low overhead performance model, called VLAG, to approximate the data localities exploited by GPU kernels. VLAG receives source code-level information to estimate per memory-access instruction, per data array, and per kernel localities within GPU kernels. VLAG...

chapter

Resilience for Stencil Computations with Latent Errors

Aiman Fang, Aurelien Cavelan, Yves Robert, Andrew A. Chien

2017 46th International Conference on Parallel Processing (ICPP) > 581 - 590

2017 46th International Conference on Parallel Processing (ICPP)

Projections and measurements of error rates in near-exascale and exascale systems suggest a dramatic growth, due to extreme scale (10^9 cores), concurrency, software complexity, and deep submicron transistor scaling. Such a growth makes resilience a critical concern, and may increase the incidence of errors that "escape", silently corrupting application state. Such errors can often be revealed...

chapter

A 142MOPS/mW integrated programmable array accelerator for smart visual processing

Satyajit Das, Davide Rossi, Kevin J. M. Martin, Philippe Coussy, more

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

2017 IEEE International Symposium on Circuits and Systems (ISCAS)

Due to increasing demand for low-power computing, and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy-efficient programmable accelerators. This paper proposes an Integrated Programmable-Array accelerator (IPA) architecture based on an innovative execution model, targeted to accelerate both data and control-flow parts of deeply embedded...

chapter

Directive-Based Pipelining Extension for OpenMP

Xuewen Cui, Thomas R. W. Scogland, Bronis R. de Supinski, Wu-Chun Feng

2016 IEEE International Conference on Cluster Computing (CLUSTER) > 481 - 484

2016 IEEE International Conference on Cluster Computing (CLUSTER)

Programming models like CUDA, OpenMP, OpenACC and OpenCL are designed to offload compute-intensive workloads to accelerators efficiently. However, the naive offload model, which synchronously copies and executes in sequence, requires extensive hand-tuning of techniques, such as pipelining to overlap computation and communication. Therefore, we propose an easy-to-use, directive-based pipelining extension...

chapter

Exploring pipe implementations using an OpenCL framework for FPGAs

Vincent Mirian, Paul Chow

2015 International Conference on Field Programmable Technology (FPT) > 112 - 119

2015 International Conference on Field Programmable Technology (FPT)

In the last decade, OpenCL has sparked the interest of the computing world as it is a language based on an open standard that can run on many different heterogeneous platforms. This standard is continuously evolving to adapt to various use cases of different platforms. For example, with requests from the FPGA community, the pipe construct was added to the standard to facilitate the implementation...

chapter

AnalyzeThis: an analysis workflow-aware storage system

Hyogi Sim, Youngjae Kim, Sudharshan S. Vazhkudai, Devesh Tiwari, more

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis

The need for novel data analysis is urgent in the face of a data deluge from modern applications. Traditional approaches to data analysis incur significant data movement costs, moving data back and forth between the storage system and the processor. Emerging Active Flash devices enable processing on the flash, where the data already resides. An array of such Active Flash devices allows us to revisit...

chapter

Directive-Based Parallelization of the NIM Weather Model for GPUs

Mark Govett, Jacques Middlecoff, Tom Henderson

2014 First Workshop on Accelerator Programming using Directives > 55 - 61

2014 First Workshop on Accelerator Programming using Directives (WACCPD)

The NIM is a performance-portable model that runs on CPU, GPU and MIC architectures with a single source code. The single source plus efficient code design allows application scientists to maintain the Fortran code, while computer scientists optimize performance and portability using OpenMP, OpenACC, and F2C-ACC directives. The F2C-ACC compiler was developed in 2008 at NOAA's Earth System Research...

chapter

A Model of Microkernel Based on Spatial-Temporal Isolation in Haskell

Fan Zhang, Xiaopeng Wang

2013 International Conference on Computer Sciences and Applications > 564 - 569

2013 International Conference on Computer Sciences and Applications (CSA)

The safety and security of the kernel is key to the security of an embedded system, and in safety-critical embedded applications the kernel must even be formally verified. In this paper we introduce the design and implementation of a microkernel model based on spatial-temporal isolation in Haskell, a functional language. This not only could significantly improve the...

chapter

Accelerating a novel particle-based fluid simulation on the GPU

Zhilu Chen, James Kingsley, Xinming Huang, Erkan Tuzel

2013 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 6

2013 IEEE High Performance Extreme Computing Conference (HPEC)

Stochastic Rotation Dynamics (SRD) is a novel particle-based simulation method that can be used to model complex fluids [1], [2], such as binary and ternary mixtures [3], and polymer solutions [4]-[6], in either two or three dimensions. Although SRD is efficient compared to traditional methods, it is still computationally expensive for large system sizes, e.g. when using a large array of particles...

article

Fast Sparse Level Sets on Graphics Hardware

Andrei C. Jalba, Wladimir J. van der Laan, Jos B.T.M. Roerdink

IEEE Transactions on Visualization and Computer Graphics > 2013 > 19 > 1 > 30 - 44

The level-set method is one of the most popular techniques for capturing and tracking deformable interfaces. Although level sets have demonstrated great potential in visualization and computer graphics applications, such as surface editing and physically based modeling, their use for interactive simulations has been limited due to the high computational demands involved. In this paper, we address...

chapter

Cross-Platform OpenCL Code and Performance Portability Investigated with a Climate and Weather Physics Model

Han Dong, Dibyajyoti Ghosh, Fahad Zafar, Shujia Zhou

2012 41st International Conference on Parallel Processing Workshops > 126 - 134

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

Current-generation multicore computing platforms are vastly different. Sustenance of many-core applications across heterogeneous platforms is a daunting task, more so when the dynamic nature of the application is factored in. Open Computing Language (OpenCL) was created to address this issue. Designed to run on CPUs, GPUs, FPGAs and other platforms, OpenCL is becoming a standard for cross-platform parallel...

chapter

Transparent and Efficient Shared-State Management for Optimistic Simulations on Multi-core Machines

Alessandro Pellegrini, Roberto Vitali, Sebastiano Peluso, Francesco Quaglia

2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems > 134 - 141

2012 IEEE 20th International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS)

Traditionally, Logical Processes (LPs) forming a simulation model store their execution information in disjoint simulation states, forcing them to exchange events to communicate data with each other. In this work we propose the design and implementation of an extension to the traditional Time Warp (optimistic) synchronization protocol for parallel/distributed simulation, targeted at shared-memory/multicore...

chapter

Optimizing and multithreading SNPHAP on a multi-core APU with OpenCL

Apisit Rattanatranurak, Surin Kittitornkun, Sissades Tongsima

2012 Ninth International Conference on Computer Science and Software Engineering (JCSSE) > 174 - 179

2012 International Joint Conference on Computer Science and Software Engineering (JCSSE)

In this paper, we have optimized and multithreaded SNPHAP, a bioinformatics program, with OpenCL to reduce the computation time and thus accelerate the execution. Our method, called the Radix Comparison algorithm, runs both sequentially and in parallel (multithreading). Based on the recent multi-core AMD A6-3650 APU (Accelerated Processing Unit), the achievable speedups of Sequential Radix and Parallel...

chapter

Multicore/GPGPU Portable Computational Kernels via Multidimensional Arrays

H. Carter Edwards, Daniel Sunderland, Chris Amsler, Sam Mish

2011 IEEE International Conference on Cluster Computing > 363 - 370

2011 IEEE International Conference on Cluster Computing (CLUSTER)

Large, complex scientific and engineering application codes have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern many-core accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The...

chapter

Multiphase LBM Distributed over Multiple GPUs

Carlos Rosales

2011 IEEE International Conference on Cluster Computing > 1 - 7

2011 IEEE International Conference on Cluster Computing (CLUSTER)

A parallel distributed CUDA implementation of a Lattice Boltzmann Method for multiphase flows with large density ratios is described in this paper. Validation runs studying the terminal velocity of a rising bubble under the effect of gravity show good agreement with the expected theoretical values. The code is benchmarked against the performance of a typical CPU implementation of the same algorithm...

chapter

Python for Development of OpenMP and CUDA Kernels for Multidimensional Data

Bogdan Vacaliuc, Dilip R. Patlolla, Ed. D'Azevedo, Greg G. Davidson, more

2011 Symposium on Application Accelerators in High-Performance Computing > 159 - 167

2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC)

Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet, each type of processor requires a specific organization of the application state in order to achieve...

chapter

Scalable and Parallel Implementation of a Financial Application on a GPU: With Focus on Out-of-Core Case

Myungho Lee, Jin-hong Jeon, Joonsuk Kim, Joonhyun Song

2010 10th IEEE International Conference on Computer and Information Technology > 1323 - 1327

2010 IEEE 10th International Conference on Computer and Information Technology (CIT)

The architecture of the latest Graphic Processing Unit (GPU) consists of a number of uniform programmable units integrated on the same chip, which facilitate the general-purpose computing beyond the graphic processing. With the multiple programmable units executing in parallel, the latest GPU shows superior performance for many non-graphic applications. Furthermore, programmers can have a direct control...

chapter

High performance Molecular Dynamic simulation on single and multi-GPU systems

O Villa, Long Chen, S Krishnamoorthy

Proceedings of 2010 IEEE International Symposium on Circuits and Systems > 3805 - 3808

2010 IEEE International Symposium on Circuits and Systems. ISCAS 2010

The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques supported and employed on these GPU and multi-GPU systems are not sufficient to address problems exhibiting irregular and unbalanced workloads, such as Molecular Dynamic (MD) simulations of systems with non-uniform densities. In this paper, we propose...

chapter

Simulating anomalous diffusion on graphics processing units

Karl Heinz Hoffmann, Michael Hofmann, Jens Lang, Gudula Runger, more

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) > 1 - 8

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW 2010)

The computational power of modern graphics processing units (GPUs) has become an interesting alternative in high performance computing. The specialized hardware of GPUs delivers a high degree of parallelism and performance. Various applications in scientific computing have been implemented such that computationally intensive parts are executed on GPUs. In this article, we present a GPU implementation...

chapter

A Parallel Immune Algorithm Based on Fine-Grained Model with GPU-Acceleration

Jianming Li, Lihua Zhang, Linlin Liu

2009 Fourth International Conference on Innovative Computing, Information and Control (ICICIC) > 683 - 686

2009 Fourth International Conference on Innovative Computing, Information and Control (ICICIC 2009)

Fine-grained parallel immune algorithm (FGIA), though a popular and robust strategy for solving complicated optimization problems, is sometimes inconvenient to use as its population size is restricted by heavy data communication and the parallel computers are relatively difficult to use, manage, maintain and may not be accessible to most researchers. In this paper, we propose a FGIA method based on...

Data set: ieee
Keywords: KERNEL, COMPUTATIONAL MODELING, ARRAYS


INFONA - science communication portal
