Search results

Items from 21 to 40 out of 262 results

chapter

Performance and energy efficiency in material science simulation on heterogeneous architectures

Eric Pascolo, Fabio Affinito, Carlo Cavazzoni

2014 International Conference on High Performance Computing & Simulation (HPCS) > 927 - 932

2014 International Conference on High Performance Computing & Simulation (HPCS)

In HPC applications, energy efficiency is becoming more and more important, due to architectural constraints. It is therefore of primary interest to measure and evaluate the energy efficiency of current architectures using typical HPC workloads. One of the most used and appreciated codes publicly available for computational material science simulation, and largely used in many high end HPC system...

chapter

Efficient Computation of the Phylogenetic Likelihood Function on the Intel MIC Architecture

Alexey M. Kozlov, Christian Goll, Alexandros Stamatakis

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 518 - 527

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Phylogenetic inference is the process of reconstructing the evolutionary history of species based on their traits, nowadays mostly using molecular sequence data. Current state-of-the-art inference methods, like Bayesian and Maximum Likelihood (ML) inference, rely on the Phylogenetic Likelihood Function (PLF) as their computational core. Due to the large number of floating-point operations involved,...

chapter

Programming the Adapteva Epiphany 64-Core Network-on-Chip Coprocessor

Anish Varghese, Bob Edwards, Gaurav Mitra, Alistair P. Rendell

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 984 - 992

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

With energy efficiency and power consumption being the primary impediment in the path to exascale systems, low-power high performance embedded systems are of increasing interest. The Parallella System-on-module (SoM) created by Adapteva combines the Epiphany-IV 64-core coprocessor with a host ARM processor housed in a Zynq System-on-chip. The Epiphany integrates low-power RISC cores on a 2D mesh network...

chapter

Sparse matrix-vector multiply on the Texas Instruments C6678 Digital Signal Processor

Yang Gao, Jason D. Bakos

2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors > 168 - 174

2013 IEEE 24th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

The Texas Instruments (TI) C6678 “Shannon” is TI's most recently-released Digital Signal Processor (DSP). Although its original purpose was voice and video encoding and decoding, it may have the potential to become a practical coprocessor for scientific computing. In this paper, we evaluate the C6678 in terms of its programming methodology, performance, and power efficiency. As a case study, we implemented...

chapter

A coarse-grained reconfigurable wavelet denoiser exploiting the Multi-Dataflow Composer tool

Nicola Carta, Carlo Sau, Francesca Palumbo, Danilo Pani, et al.

2013 Conference on Design and Architectures for Signal and Image Processing > 141 - 148

2013 Conference on Design and Architectures for Signal and Image Processing (DASIP)

In the last few years, efficient resource management turned out to be one of the major challenges for hardware designers. Strategies of reusability through reconfiguration have demonstrated interesting potentials to address it, providing also power and area minimization. The Multi-Dataflow Composer (MDC) tool has been presented to the scientific community to automatically build-up runtime coarse-grained...

chapter

Software-managed automatic data sharing for Coarse-Grained Reconfigurable coprocessors

Toan X. Mai, Jongeun Lee

2012 International Conference on Field-Programmable Technology > 277 - 284

2012 International Conference on Field-Programmable Technology (FPT)

Coarse-Grained Reconfigurable Architecture (CGRA) in a hybrid system can significantly accelerate the execution of compute-intensive kernels of applications. However, the data communication overhead between the main processor (MP) and the CGRA may be huge and can negate the speed-up of the CGRA. In this paper we address the problem of reducing the data communication overhead in a hybrid system by...

chapter

A scalable memory interface for multicore reconfigurable computing systems

Philip Garcia, Katherine Compton

2011 International Conference on Field-Programmable Technology > 1 - 8

2011 International Conference on Field-Programmable Technology (FPT 2011)

Embedded multicore devices require high performance with minimal power consumption; many systems use dedicated hardware units to meet these constraints. However, embedded systems have also become increasingly multi-purpose and must be able to execute a wide range of applications — some of which might not yet be known at design time. It is therefore difficult to choose an appropriate mix of dedicated...

chapter

An Architecture for Reconfigurable Multi-core Explorations

Olivier Serres, Vikram K. Narayana, Tarek El-Ghazawi

2011 International Conference on Reconfigurable Computing and FPGAs > 105 - 110

2011 International Conference on Reconfigurable Computing and FPGAs (ReConFig 2011)

Multi-core systems are now the norm, and reconfigurable systems have shown substantial benefits over general purpose ones. This paper presents a combination of the two: a fully featured reconfigurable multi-core processor based on the Leon3 processor. The platform has important features like cache coherency, a fully running modern OS (GNU/Linux) and each core has a tightly coupled reconfigurable coprocessor...

chapter

Parallel Fish Swarm Algorithm Based on GPU-Acceleration

Yifan Hu, Baozhong Yu, Jianliang Ma, Tianzhou Chen

2011 3rd International Workshop on Intelligent Systems and Applications > 1 - 4

2011 3rd International Workshop on Intelligent Systems and Applications (ISA)

With the development of the Graphics Processing Unit (GPU) and the Compute Unified Device Architecture (CUDA) platform, researchers have shifted their attention to general-purpose computing applications on GPUs. In this paper, we present a novel parallel approach to run the artificial fish swarm algorithm (AFSA) on a GPU. Experiments are conducted by running AFSA on both GPU and CPU to optimize four...

chapter

An Autonomous Vector/Scalar Floating Point Coprocessor for FPGAs

J Kathiara, M Leeser

2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines > 33 - 36

2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM 2011)

We present a Floating Point Vector Coprocessor that works with the Xilinx embedded processors. The FPVC is completely autonomous from the embedded processor, exploiting parallelism and exhibiting greater speedup than alternative vector processors. The FPVC supports scalar computation so that loops can be executed independently of the main embedded processor. Floating point addition, multiplication,...

chapter

Where is the data? Why you cannot debate CPU vs. GPU performance without the answer

C Gregg, K Hazelwood

(IEEE ISPASS) IEEE International Symposium on Performance Analysis of Systems and Software > 134 - 144

2011 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS 2011)

General purpose GPU Computing (GPGPU) has taken off in the past few years, with great promises for increased desktop processing power due to the large number of fast computing cores on high-end graphics cards. Many publications have demonstrated phenomenal performance and have reported speedups as much as 1000× over code running on multi-core CPUs. Other studies have claimed that well-tuned CPU code...

chapter

GPU Implementation of the LFT Shape Matching Algorithm

A Kooijman, J Vergeest

2011 Sixth International Symposium on Parallel Computing in Electrical Engineering > 111 - 116

2011 6th International Symposium on Parallel Computing in Electrical Engineering (PARELEC 2011)

Registration of partial scan data sets is still a challenge for today's CAD systems and CAD system users. Many of the known methods rely on user interaction or feature recognition. For non-regular users this is too time consuming and error prone. The paper describes a method to register partial scan data by fitting a large fat tetrahedron (LFT) in the target point cloud. The method is computational...

chapter

GPU Accelerated Lanczos Algorithm with Applications

K K Matam, K Kothapalli

2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications > 71 - 76

2011 25th IEEE International Conference on Advanced Information Networking and Applications Workshops (WAINA 2011)

Graphics Processing Units provide large computational power at a very low price, which positions them as a ubiquitous accelerator. GPGPU is the acceleration of general purpose computations using GPUs. GPUs have been used to accelerate many Linear Algebra routines and Numerical Methods. Lanczos is an iterative method well suited for finding the extreme eigenvalues and the corresponding eigenvectors of...

chapter

Optimizing simulated annealing on GPU: A case study with IC floorplanning

Yiding Han, Sanghamitra Roy, Koushik Chakraborty

2011 12th International Symposium on Quality Electronic Design > 1 - 7

2011 12th International Symposium on Quality Electronic Design (ISQED 2011)

In this paper, we propose a novel floorplanning algorithm based on simulated annealing on GPUs. Simulated annealing is an inherently sequential algorithm, far from the typical programs suitable for Single Instruction Multiple Data (SIMD) style concurrency in a GPU. We propose a fundamentally different approach of exploring the floorplan solution space, where we evaluate concurrent moves on a given...

chapter

GridCuda: A Grid-Enabled CUDA Programming Toolkit

Tyng-Yeu Liang, Yu-Wei Chang

2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications > 141 - 146

2011 25th IEEE International Conference on Advanced Information Networking and Applications Workshops (WAINA 2011)

A grid-enabled programming toolkit called GridCuda is proposed in this paper. This programming toolkit provides a platform for users to write programs with the CUDA API, and exploit GPGPU resources available in computational grids to execute their programs. Whenever the CUDA functions in user programs are invoked, they will be transparently redirected to remote GPGPUs for execution by means of remote...

chapter

Fast Two Dimensional Convex Hull on the GPU

S Srungarapu, D P Reddy, K Kothapalli, P J Narayanan

2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications > 7 - 12

2011 25th IEEE International Conference on Advanced Information Networking and Applications Workshops (WAINA 2011)

General purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community, as it promises to offer large computational power at a very low price. GPGPU is best suited for regular data parallel algorithms. It is not directly amenable to algorithms that have irregular data access patterns, such as convex hull, list ranking, etc. In this...

chapter

Dense Dynamic Programming on Multi GPU

Vincent Boyer, Didier El Baz, Moussa Elkihel

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 545 - 551

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

The implementation via CUDA of a hybrid dense dynamic programming method for knapsack problems on a multi-GPU architecture is considered. Tests are carried out on a Bull cluster with Tesla S1070 computing systems. A first series of computational results shows substantial speedup. The speedup factor is close to 28 with two GPUs.

chapter

Patterns of Inefficient Performance Behavior in GPU Applications

D Eschweiler, D Becker, F Wolf

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 262 - 266

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

Writing efficient software for heterogeneous architectures equipped with modern accelerator devices presents a serious challenge to programmer productivity, creating a need for powerful performance-analysis tools to adequately support the software development process. To guide the design of such tools, we describe typical patterns of inefficient runtime behavior that may adversely affect the performance...

chapter

Thread block compaction for efficient SIMT control flow

W W L Fung, T M Aamodt

2011 IEEE 17th International Symposium on High Performance Computer Architecture > 25 - 36

2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA)

Manycore accelerators such as graphics processor units (GPUs) organize processing units into single-instruction, multiple data “cores” to improve throughput per unit hardware cost. Programming models for these accelerators encourage applications to run kernels with large groups of parallel scalar threads. The hardware groups these threads into warps/wavefronts and executes them in lockstep, dubbed...

chapter

In Situ Power Analysis of General Purpose Graphical Processing Units

M Z Shaikh, M Gregoire, W Li, M Wroblewski, et al.

2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing > 40 - 44

19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011)

In this paper, an in situ power analysis profiling over time for general purpose graphics processing units (GPGPU) is presented. Based on this method the power consumption of different modes of operations like data transfer between GPU and host CPU, basic single precision floating point arithmetic operations (addition, subtraction, multiplication) on the multiprocessor units and instructions for shared...

Keywords:
KERNEL
COPROCESSORS


INFONA - science communication portal
