Search results

Items from 1 to 20 out of 42 results

chapter

ORCHESTRA: An asynchronous wait-free distributed GVT algorithm

Tommaso Tocci, Alessandro Pellegrini, Francesco Quaglia, Josep Casanovas-Garcia, et al.

2017 IEEE/ACM 21st International Symposium on Distributed Simulation and Real Time Applications (DS-RT), pp. 1-8

Taking advantage of the computing capabilities offered by modern parallel and distributed architectures is fundamental to running large-scale simulation models based on the Parallel Discrete Event Simulation (PDES) paradigm. By relying on this computing organization, it is possible to effectively overcome both the power wall and the memory wall, which are core factors limiting the delivery of high-performance simulations...

chapter

Offloading Communication Control Logic in GPU Accelerated Applications

Elena Agostini, Davide Rossetti, Sreeram Potluri

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 248-257

NVIDIA GPUDirect is a family of technologies aimed at optimizing data movement among GPUs (P2P) or between GPUs and third-party devices (RDMA). GPUDirect Async, introduced in CUDA 8.0, is a new addition which allows direct synchronization between the GPU and third-party devices. For example, Async allows an NVIDIA GPU to directly trigger and poll for completion of communication operations queued to an InfiniBand...

chapter

Accelerating all-pairs shortest path using a message-passing reconfigurable architecture

Osama G. Attia, Alex Grieve, Kevin R. Townsend, Phillip Jones, et al.

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), pp. 1-6

In this paper, we study the design and implementation of a reconfigurable architecture for graph processing algorithms. The architecture uses a message-passing model targeting shared-memory multi-FPGA platforms. We take advantage of our architecture to showcase a parallel implementation of the all-pairs shortest path algorithm (APSP) for unweighted directed graphs. Our APSP implementation adopts a...

chapter

Exploiting Parallelism in Linear Algebra Kernels through Dataflow Execution

Brunno F. Goldstein, Felipe M.G. Franca, Leandro A.J. Marzulo, Tiago A.O. Alves

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW), pp. 103-108

Linear algebra kernels play an important role in many petroleum reservoir simulators, which are extensively used by the industry. The growth in problem size, especially in pre-salt exploration, has increased the execution time of those kernels, thus requiring parallel programming to improve performance and make the simulation viable. On the other hand, exploiting parallelism in systems with an ever-increasing...

chapter

Evaluation of Parallel Communication Models in Nekbone, a Nek5000 Mini-Application

Ilya Ivanov, Jing Gong, Dana Akhmetova, Ivy Bo Peng, et al.

2015 IEEE International Conference on Cluster Computing (CLUSTER), pp. 760-767

Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with...

chapter

Parallel Native-Simulation for Multi-processing Embedded Systems

Alejandro Nicolas, Pablo Sanchez

2015 Euromicro Conference on Digital System Design (DSD), pp. 543-546

The number of cores in embedded systems is continuously growing, supporting increasingly complex concurrent applications. To verify that these systems comply with specification requirements during the design process, fast simulation and performance analysis tools are required. These simulation frameworks typically use virtualization or host-compiled simulation techniques. On one hand, current host...

chapter

Pre-simulation elaboration of heterogeneous systems: The SystemC multi-disciplinary virtual prototyping approach

Cedric Ben Aoun, Liliana Andrade, Torsten Maehne, Francois Pecheux, et al.

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp. 278-285

Designers of upcoming digital-centric More-than-Moore systems lack a common design and simulation environment able to efficiently manage all the multidisciplinary aspects of their components, which are of various natures and closely interact with each other. A key to successful design and verification lies in a SystemC-based virtual prototyping environment that is able to simulate a complex heterogeneous...

chapter

Migration of CUDA Program Based on a Divide-and-Conquer Method

Nan Li, Jianmin Pang, Zheng Shan

2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), pp. 1685-1691

Porting CUDA programs to other heterogeneous and many-core platforms, especially native processors, is valuable for extending the reach of CUDA applications, taking advantage of the many cores on the target platform, and supporting national industries. Traditional binary translation techniques are not adequate for this task. From the standpoint of software reverse engineering, it is feasible to design a new migration...

chapter

Mainstream Components for Near Hard Real-Time Distributed Simulation and Testing

Fernand Quartier, Pierre Verhoyen, Nadie Rousse, Frederic Manon

2014 IEEE/ACM 18th International Symposium on Distributed Simulation and Real Time Applications (DS-RT), pp. 11-17

At CNES, each new satellite simulation and testing system significantly increases processing requirements and real-time constraints. While mainstream systems allow adding almost unlimited computing resources, stronger timing constraints lead into largely unknown territory. To prepare for the future, several R&D projects have been carried out that focused on related...

chapter

Graph processing on GPUs: Where are the bottlenecks?

Qiumin Xu, Hyeran Jeon, Murali Annavaram

2014 IEEE International Symposium on Workload Characterization (IISWC), pp. 140-149

Large-scale graph processing is now a critical component of many data analytics applications. Graph processing is used in domains ranging from social networking websites that provide context-aware services from user connectivity data to medical informatics systems that diagnose a disease from a given set of symptoms. Graph processing has several inherently parallel computation steps interspersed with synchronization needs. Graphics processing...

chapter

A comparison of parallel SystemC simulation approaches at RTL

Bastian Haetzer, Martin Radetzki

Proceedings of the 2014 Forum on Specification and Design Languages (FDL), ISBN 978-2-9530504-9-3, pp. 1-8

This paper presents a holistic comparison of different parallel SystemC simulation approaches at the register transfer level (RTL). The effect of RTL modeling styles and simulation strategies on performance is evaluated to show the potential and limitations of state-of-the-art parallel simulation techniques on shared-memory machines. Experiments show that the simulation performance strongly depends...

chapter

CASITA: A Tool for Identifying Critical Optimization Targets in Distributed Heterogeneous Applications

Felix Schmitt, Jonas Stolle, Robert Dietrich

2014 43rd International Conference on Parallel Processing Workshops (ICPPW), pp. 186-195

Programming of high performance computing systems has become more complex over time. Several layers of parallelism need to be exploited to efficiently utilize the available resources. To support application developers and performance analysts we propose a technique for identifying the most performance critical optimization targets in distributed heterogeneous applications. We have developed CASITA,...

chapter

SimParallel: A high performance parallel SystemC simulator using hierarchical multi-threading

Moo-Kyoung Chung, Jun-Kyoung Kim, Soojung Ryu

2014 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1472-1475

As system complexity increases, simulation performance becomes one of the most important issues in virtual prototyping. Parallel simulation is an attractive technique for high-speed simulation that utilizes state-of-the-art multi-core processors on a host workstation, but the efficiency of parallel simulation is low because of synchronization and communication overhead and unbalanced workloads...

chapter

HSAemu - A full system emulator for HSA platforms

Jiun-Hung Ding, WeiChung Hsu, BaiCheng Jeng, ShihHao Hung, et al.

2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pp. 1-10

Heterogeneous System Architecture (HSA) is an open industry standard designed to support a large variety of data-parallel and task-parallel programming models. Currently, most HSA hardware and software components are still in development, so it is helpful to provide heterogeneous simulation environments to HSA developers building HSA software stacks. This paper presents the design of...

chapter

A clustered manycore processor architecture for embedded and accelerated applications

Benoit Dupont de Dinechin, Renaud Ayrignac, Pierre-Edouard Beaucamps, Patrice Couvert, et al.

2013 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-6

The Kalray MPPA-256 processor integrates 256 user cores and 32 system cores on a chip with 28nm CMOS technology. Each core implements a 32-bit 5-issue VLIW architecture. These cores are distributed across 16 compute clusters of 16+1 cores, and 4 quad-core I/O subsystems. Each compute cluster and I/O subsystem owns a private address space, while communication and synchronization between them are ensured...

chapter

A SystemC modeling and simulation methodology for fast and accurate parallel MPSoC simulation

Christoph Roth, Harald Bucher, Simon Reder, Florian Buciuman, et al.

2013 26th Symposium on Integrated Circuits and Systems Design (SBCCI), pp. 1-6

Due to the growing complexity of embedded systems, simulation becomes an increasingly time-consuming task. Detailed simulation of so-called Multi-Processor Systems-on-Chip (MPSoCs), in particular, suffers from extremely long runtimes, which makes verification and debugging extraordinarily expensive. In this work, a SystemC/TLM-based methodology for accelerating simulation of NoC-based MPSoCs is presented...

chapter

Assessing load-sharing within optimistic simulation platforms

Roberto Vitali, Alessandro Pellegrini, Francesco Quaglia

Proceedings of the 2012 Winter Simulation Conference (WSC 2012), pp. 1-13

The advent of multi-core machines has led to the need to revise the architecture of modern simulation platforms. One recent proposal of ours explored the viability of load-sharing for optimistic simulators run on top of these types of machines. In this article, we provide an extensive experimental study for an assessment of the effects on run-time dynamics of a load-sharing architecture...

chapter

Transparent and Efficient Shared-State Management for Optimistic Simulations on Multi-core Machines

Alessandro Pellegrini, Roberto Vitali, Sebastiano Peluso, Francesco Quaglia

2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 134-141

Traditionally, Logical Processes (LPs) forming a simulation model store their execution information in disjoint simulation states, forcing event exchange to communicate data with each other. In this work we propose the design and implementation of an extension to the traditional Time Warp (optimistic) synchronization protocol for parallel/distributed simulation, targeted at shared-memory/multicore...

chapter

Towards Symmetric Multi-threaded Optimistic Simulation Kernels

Roberto Vitali, Alessandro Pellegrini, Francesco Quaglia

2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation (PADS), pp. 211-220

In this article we address the redesign of optimistic simulation kernels to fit multi-core/multi-processor machines. This is done by providing a reference optimistic simulation architecture based on the symmetric multi-threaded paradigm, where each simulation kernel instance is allowed to run a dynamically changing set of worker threads that share the whole load of LPs hosted...

chapter

Parallel simulation of mixed-abstraction SystemC models on GPUs and multicore CPUs

Rohit Sinha, Aayush Prakash, Hiren D. Patel

2012 17th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 455-460

This work presents a methodology that parallelizes the simulation of mixed-abstraction-level SystemC models across multicore CPUs and graphics processing units (GPUs) for improved simulation performance. Given a SystemC model, we partition it into processes suitable for GPU execution and CPU execution. We convert the processes identified for GPU execution into GPU kernels with additional SystemC...

Data set: ieee
Keywords: KERNEL, COMPUTATIONAL MODELING, SYNCHRONIZATION

Publication type

book (41)
article (1)

Keywords

COMPUTER ARCHITECTURE (9)
INSTRUCTION SETS (9)
GRAPHICS PROCESSING UNITS (6)
PROGRAMMING (5)
ANALYTICAL MODELS (4)
GRAPHICS PROCESSING UNIT (4)
LIBRARIES (4)
LOAD MODELING (4)
MESSAGE SYSTEMS (4)
PARALLEL SIMULATION (4)
PROGRAM PROCESSORS (4)
CUDA (3)
DATA MODELS (3)
DATA STRUCTURES (3)
HARDWARE (3)
MATHEMATICAL MODEL (3)
MPI (3)
MULTICORE PROCESSING (3)
PARALLEL ARCHITECTURES (3)
PORTS (COMPUTERS) (3)
SIMULATION (3)
SYNCHRONISATION (3)
SYSTEMC (3)
ADAPTATION MODEL (2)
ADAPTATION MODELS (2)
ALGORITHM DESIGN AND ANALYSIS (2)
ARRAYS (2)
COMPUTATIONAL FLUID DYNAMICS (2)
ELECTRONIC ENGINEERING COMPUTING (2)
EMBEDDED SYSTEMS (2)
ENGINES (2)
ESTIMATION (2)
FIELD PROGRAMMABLE GATE ARRAYS (2)
INTEGRATED CIRCUIT MODELING (2)
JAVA (2)
LOGIC GATES (2)
OPTIMIZATION (2)
ORGANIZATIONS (2)
PARALLEL ARCHITECTURE (2)
PERFORMANCE ANALYSIS (2)
PROCESSOR SCHEDULING (2)
PROPOSALS (2)
PROTOCOLS (2)
SCHEDULING (2)
SEMANTICS (2)
SEMICONDUCTOR PROCESS MODELING (2)
SYSTEM-ON-CHIP (2)
ABSTRACTS (1)
ACCURACY ADAPTIVE TRANSACTION LEVEL MODEL (1)
AD HOC NETWORKS (1)
AD-HOC DEVELOPMENT (1)
AD-HOC WIRELESS (1)
ADAPTIVE GRID (1)
ADAPTIVE MODELS SYSTEMATIC DEVELOPMENT (1)
ADAPTIVE TLM (1)
ADAPTIVITY MECHANISMS (1)
ALGORITHMS (1)
ALL-PAIRS SHORTEST PATH (1)
ANT COLONY OPTIMIZATION (1)
APPLICATION PROGRAM INTERFACE (1)
APPLICATION PROGRAM INTERFACES (1)
ASYMMETRIC CONCURRENCY (1)
ASYMPTOTIC ANALYSIS (1)
ASYNCHRONOUS COMMUNICATION (1)
ASYNCHRONOUS COMMUNICATIONS (1)
ATMOSPHERIC MODELING (1)
ATOMIC (1)
AUTOMATA (1)
BACKPLANES (1)
BENCHMARK TESTING (1)
BINARY TRANSLATION (1)
BLOCKING MECHANISM (1)
BOUNDED MODEL CHECKING (1)
CACHE (1)
CELL (1)
CELL PROCESSOR (1)
CELL-DEVS (1)
CENTRAL PROCESSING UNIT (1)
CHANNEL ALLOCATION (1)
CHANNEL-AWARE SCHEDULING (1)
CLOCKS (1)
COMMUNICATION KERNEL (1)
COMPILE FLOW (1)
COMPUTATION MODELS (1)
COMPUTATIONAL MODEL (1)
COMPUTATIONAL POWER (1)
COMPUTATIONALLY EXTENSIVE MODELS (1)
COMPUTE UNIFIED DEVICE ARCHITECTURE (CUDA) (1)
COMPUTER AIDED ENGINEERING (1)
COMPUTER DEBUGGING (1)
COMPUTING MODEL MAPPING (1)
CONCEALMENT MECHANISM (1)
CONCURRENCY CONTROL (1)
CONCURRENT OBSERVATIONS (1)
CONCURRENT PROCESSING UNITS (1)
CONNECTORS (1)
CONSERVATIVE WARPED (1)

INFONA - science communication portal