Search results

Items from 1 to 12 out of 12 results

chapter

Userspace RDMA Verbs on Commodity Hardware Using DPDK

Patrick MacArthur

2017 IEEE 25th Annual Symposium on High-Performance Interconnects (HOTI) > 103 - 110

RDMA (Remote Direct Memory Access) is a technology that enables user applications to perform direct data transfer between the virtual memory of processes on remote endpoints, without operating system involvement or intermediate data copies. Achieving zero intermediate data copies using RDMA requires specialized network interface hardware. Software RDMA drivers emulate RDMA semantics in software to...

chapter

A generic execution framework for shared FPGA-based accelerators

Dumitru Laurentiu Alexandru, Rares Maniu

2017 International Conference on Optimization of Electrical and Electronic Equipment (OPTIM) & 2017 Intl Aegean Conference on Electrical Machines and Power Electronics (ACEMP) > 803 - 808

FPGAs are continuously increasing in both chip size and operating frequency. Dynamic reconfiguration is easier and more stable with current generation of hardware and software tools. These characteristics have made them more accessible to generic acceleration tasks instead of specialized functions. As a consequence, FPGAs are being deployed in more computing clusters than in the past. This leads to...

chapter

Generation of the Single Precision BLAS Library for the Parallella Platform, with Epiphany Co-processor Acceleration, Using the BLIS Framework

Miguel Tasende

2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech) > 894 - 897

The Parallella is a hybrid computing platform that came into existence as the result of a Kickstarter project by Adapteva. It is composed of the high performance, energy-efficient, manycore architecture, Epiphany chip (used as co-processor) and one Zynq-7000 series chip, which normally runs a regular Linux OS version, serves as the main processor, and implements "glue logic" in its internal...

chapter

LBM-IB: A Parallel Library to Solve 3D Fluid-Structure Interaction Problems on Manycore Systems

Prateek Nagar, Fengguang Song, Luoding Zhu, Lan Lin

2015 44th International Conference on Parallel Processing > 51 - 60

2015 44th International Conference on Parallel Processing (ICPP)

Deformable structures are abundant in various domains such as biology, medicine, life sciences, and ocean engineering. Our previous work created a numerical method, named LBM-IB method [1], to solve the fluid-structure interaction (FSI) problems. Our LBM-IB method is particularly suitable for simulating flexible (or elastic) structures immersed in a moving viscous fluid. Fluid-structure interaction...

chapter

The Changing Relevance of the TLB

Jessica R. Jones, James H. Davenport, Russell Bradford

2013 12th International Symposium on Distributed Computing and Applications to Business, Engineering & Science > 110 - 114

2013 12th International Symposium on Distributed Computing and Applications to Business, Engineering & Science (DCABES)

A little over a decade ago, Goto and van de Geijn wrote about the importance of the treatment of the translation lookaside buffer (TLB) on the performance of matrix multiplication. Crucially, they did not say how important, nor did they provide results that would allow the reader to make his own judgement. In this paper, we revisit their work and look at the effect on the performance of their algorithm...

chapter

Tools for Power-Energy Modelling and Analysis of Parallel Scientific Applications

Pedro Alonso, Rosa M. Badia, Jesus Labarta, Maria Barreda, more

2012 41st International Conference on Parallel Processing > 420 - 429

2012 41st International Conference on Parallel Processing (ICPP)

Understanding power usage in parallel workloads is crucial to develop the energy-aware software that will run in future Exascale systems. In this paper, we contribute towards this goal by introducing an integrated framework to profile, monitor, model and analyze power dissipation in parallel MPI and multi-threaded scientific applications. The framework includes an own-designed device to measure internal...

chapter

How to correctly deal with pseudorandom numbers in manycore environments: Application to GPU programming with Shoverand

Jonathan Passerat-Palmbach, David R. C. Hill

2012 International Conference on High Performance Computing & Simulation (HPCS) > 25 - 31

Stochastic simulations are often sensitive to the source of randomness that characterizes the statistical quality of their results. Consequently, we need highly reliable Random Number Generators (RNGs) to feed such applications. Recent developments try to shrink the computation time by relying more and more on General Purpose Graphics Processing Units (GPGPUs) to speed up stochastic simulations. Such...

chapter

Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures

R Dietrich, T Ilsche, G Juckeland

2010 39th International Conference on Parallel Processing Workshops > 135 - 143

2010 39th International Conference on Parallel Processing Workshops (ICPPW)

New high performance computing (HPC) applications recently have to face scalability over an increasing number of nodes and the programming of special accelerator hardware. Hybrid composition of large computing systems leads to a new dimension in complexity of software development. This paper presents a novel approach to gain insight into accelerator interaction and utilization without any changes...

chapter

BLAS Comparison on FPGA, CPU and GPU

S Kestur, J D Davis, O Williams

2010 IEEE Computer Society Annual Symposium on VLSI > 288 - 293

2010 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2010)

High Performance Computing (HPC) or scientific codes are being executed across a wide variety of computing platforms from embedded processors to massively parallel GPUs. We present a comparison of the Basic Linear Algebra Subroutines (BLAS) using double-precision floating point on an FPGA, CPU and GPU. On the CPU and GPU, we utilize standard libraries on state-of-the-art devices. On the FPGA, we have...

chapter

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters

José Duato, Antonio J Peña, F Silla, R Mayo, more

2010 International Conference on High Performance Computing & Simulation > 224 - 231

2010 International Conference on High Performance Computing & Simulation (HPCS 2010)

The increasing computing requirements for GPUs (Graphics Processing Units) have favoured the design and marketing of commodity devices that nowadays can also be used to accelerate general purpose computing. Therefore, future high performance clusters intended for HPC (High Performance Computing) will likely include such devices. However, high-end GPU-based accelerators used in HPC feature a considerable...

chapter

NPB-MPJ: NAS Parallel Benchmarks Implementation for Message-Passing in Java

D.A. Mallon, G.L. Taboada, J. Touriño, R. Doallo

2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing > 181 - 190

Java is a valuable and emerging alternative for the development of parallel applications, thanks to the availability of several Java message-passing libraries and its full multithreading support. The combination of both shared and distributed memory programming is an interesting option for parallel programming multi-core systems. However, the concerns about Java performance are hindering its adoption...

chapter

Efficient one-copy MPI shared memory communication in Virtual Machines

Wei Huang, M.J. Koop, D.K. Panda

2008 IEEE International Conference on Cluster Computing > 107 - 115

2008 IEEE International Conference on Cluster Computing (CLUSTER)

Efficient intra-node shared memory communication is important for high performance computing (HPC), especially with the emergence of multi-core architectures. As clusters continue to grow in size and complexity, the use of virtual machine (VM) technologies has been suggested to ease the increasing number of management issues. As demonstrated by earlier research, shared memory communication must be...

Filter options

Data set: ieee
Keywords: KERNEL, LIBRARIES, HIGH PERFORMANCE COMPUTING

