chapter

Memristive logic: A framework for evaluation and comparison

John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, et al.

2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS) > 1 - 8

2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS)

Memristors have extended their influence beyond memory to logic and in-memory computing. Memristive logic design, the methodology of designing logic circuits using memristors, is an emerging concept whose growth is fueled by the quest for energy efficient computing systems. As a result, many memristive logic families have evolved with different attributes, and a mature comparison among them is needed...

chapter

Directive-Based Pipelining Extension for OpenMP

Xuewen Cui, Thomas R. W. Scogland, Bronis R. de Supinski, Wu-Chun Feng

2016 IEEE International Conference on Cluster Computing (CLUSTER) > 481 - 484

2016 IEEE International Conference on Cluster Computing (CLUSTER)

Programming models like CUDA, OpenMP, OpenACC and OpenCL are designed to offload compute-intensive workloads to accelerators efficiently. However, the naive offload model, which synchronously copies and executes in sequence, requires extensive hand-tuning of techniques, such as pipelining to overlap computation and communication. Therefore, we propose an easy-to-use, directive-based pipelining extension...

chapter

Low-cost bitwise 2D ising model representation using Monte Carlo and metropolis algorithm

T Mathialakan, S Mahesan

2016 Sixteenth International Conference on Advances in ICT for Emerging Regions (ICTer) > 106 - 109

2016 Sixteenth International Conference on Advances in ICT for Emerging Regions (ICTer)

Simulating the 2D Ising model involves large amounts of data to study how magnetization and energy vary with temperature. The simulation should therefore be efficient in terms of cost, time and memory consumption. In this paper, we introduce a technique that operates on bits instead of integers in order to reduce memory usage (1/32) and turnaround time (0.53). This approach has been...
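
The bit-level storage idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes 32-bit words (consistent with the 1/32 figure) and a convention where bit 1 means spin +1; the function names are hypothetical and this is not the authors' implementation, which is not shown in the truncated abstract.

```python
WORD_BITS = 32  # assumed word size, chosen to match the 1/32 memory figure


def pack_spins(spins):
    """Pack a sequence of +1/-1 spins into 32-bit words (bit 1 = spin +1)."""
    words = [0] * ((len(spins) + WORD_BITS - 1) // WORD_BITS)
    for i, s in enumerate(spins):
        if s == 1:
            words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
    return words


def get_spin(words, i):
    """Read spin i back as +1 or -1."""
    bit = (words[i // WORD_BITS] >> (i % WORD_BITS)) & 1
    return 1 if bit else -1


def flip_spin(words, i):
    """Flip spin i in place with a single XOR."""
    words[i // WORD_BITS] ^= 1 << (i % WORD_BITS)
```

With this layout, 64 spins occupy two words instead of 64 integers, and a Metropolis-style spin flip becomes one XOR on the containing word.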

chapter

Parallel algorithm mapping to memory multidimensional signals

Florin Balasa, Ilie I. Luican, Hongwei David Zhu

2016 IEEE 7th Latin American Symposium on Circuits & Systems (LASCAS) > 295 - 298

2016 IEEE 7th Latin American Symposium on Circuits & Systems (LASCAS)

Signal processing algorithms are typically described in a high-level programming language. In data-dominated applications, particularly in the multimedia and telecommunication domains, the code of these behavioral specifications is organized in sequences of loop nests; the main data structures are multidimensional arrays. This paper proposes a memory management algorithm for mapping multidimensional...

chapter

Memory partition for SIMD in streaming dataflow architectures

Xiaowei Shen, Xiaochun Ye, Xu Tan, Da Wang, et al.

2016 Seventh International Green and Sustainable Computing Conference (IGSC) > 1 - 8

2016 Seventh International Green and Sustainable Computing Conference (IGSC)

The high parallelism of scientific applications makes SIMD very suitable for streaming dataflow architectures. However, splitting SIMD memory requests increases the number of messages in on-chip networks and decreases the efficiency of streaming dataflow architectures. To process SIMD memory requests without splitting, a memory partition mechanism is proposed for SIMD in streaming dataflow architectures...

chapter

Scalability Challenges in Current MPI One-Sided Implementations

Xin Zhao, Pavan Balaji, William Gropp

2016 15th International Symposium on Parallel and Distributed Computing (ISPDC) > 38 - 47

2016 15th International Symposium on Parallel and Distributed Computing (ISPDC)

MPI one-sided or remote memory access (RMA) communication provides a different execution model from traditional two-sided or group communication and is better suited for some classes of applications. However, current implementations of MPI RMA are notorious for their inability to scale to large systems or problem sizes. In this paper, we present a study of the RMA infrastructure in popular open-source...

chapter

Memory-efficient particle annihilation algorithm for Wigner Monte Carlo simulations

P. Ellinghaus, M. Nedjalkov, S. Selberherr

2015 International Workshop on Computational Electronics (IWCE) > 1 - 4

2015 International Workshop on Computational Electronics (IWCE)

The Wigner Monte Carlo solver, using the signed-particle method, is based on the generation and annihilation of numerical particles. The memory demands of the annihilation algorithm can become exorbitant if a high spatial resolution is used, because the entire discretized phase space is represented in memory. Two alternative algorithms, which greatly reduce the memory requirements, are presented here.

chapter

Novel source-to-source compiler approach for the automatic parallelization of codes based on the method of moments

Hipolito Gomez-Sousa, Manuel Arenaz, Oscar Rubinos-Lopez, Jose Angel Martinez-Lorenzo

2015 9th European Conference on Antennas and Propagation (EuCAP) > 1 - 6

2015 9th European Conference on Antennas and Propagation (EuCAP)

In computational electromagnetics, surface integral equation (SIE) formulations are widely used to predict the electromagnetic scattering from arbitrary structures. These SIE formulations are discretized into a matrix form by the well-known method of moments (MoM). Up to now, the lack of proper compilers made it necessary for the MoM codes to be parallelized by hand in order to obtain reasonable performance...

chapter

Efficient representation of distributions for background subtraction

Yedid Hoshen, Chetan Arora, Yair Poleg, Shmuel Peleg

2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance > 276 - 281

2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)

Multi dimensional probability distributions are used in many surveillance tasks such as modeling color distribution of background pixels for Background Subtraction. Accurate representation of such distributions, e.g. in a histogram, requires much memory that may not be available when a histogram is computed for each pixel. Parametric representations such as Gaussian Mixture Models (GMM) are very efficient...

chapter

Deterministic implementation of periodic-delayed communications and experimentation in AADL

Fabien Cadoret, Thomas Robert, Etienne Borde, Laurent Pautet, et al.

16th IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing (ISORC 2013) > 1 - 8

2013 IEEE 16th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing (ISORC)

The design of hard real-time embedded systems has to comply with strong requirements with respect to time determinism and resource consumption. However, interacting tasks may induce pessimism in schedulability analysis or introduce significant overheads in memory usage. In this paper, we restrict the execution and communication models to enforce an efficient and predictable implementation. To ensure...

chapter

Optimal 2D Data Partitioning for DMA Transfers on MPSoCs

Selma Saidi, Pranav Tendulkar, Thierry Lepley, Oded Maler

2012 15th Euromicro Conference on Digital System Design > 584 - 591

2012 15th Euromicro Conference on Digital System Design (DSD)

Reducing the effects of off-chip memory access latency is a key factor in efficiently exploiting embedded multicore platforms. We consider architectures that admit a multi-core computation fabric with its own small, fast memory, to which the data blocks to be processed are fetched from external memory using a DMA (direct memory access) engine, employing a double- or multiple-buffering scheme...

chapter

Memory-Aware Loop Paralleling for Coarse-Grained Reconfigurable Architectures

Ziyu Yang, Peng Zhao, Dawei Wang, Sikun Li

2012 International Conference on Computer Science and Service System > 2223 - 2226

2012 International Conference on Computer Science and Service System (CSSS)

The parallelization of sequential programs and the optimization of critical loops are challenging issues in the era of multi-core architectures. Coarse-Grained Reconfigurable Architecture (CGRA) was introduced to accelerate such data-intensive applications, but the access delay caused by the massive number of memory accesses in those loops has become the bottleneck of CGRA performance. In...

chapter

Diamond-Like Tiling Schemes for Efficient Explicit Euler on GPUs

Matthias Korch, Julien Kulbe, Carsten Scholtes

2012 11th International Symposium on Parallel and Distributed Computing > 259 - 266

2012 11th International Symposium on Parallel and Distributed Computing (ISPDC)

GPU computing offers a high potential of raw processing power at comparatively low costs. This paper investigates optimization techniques for solving initial value problems (IVPs) of ordinary differential equations (ODEs) on GPUs. Different techniques, especially for exploiting the GPU memory hierarchy, are discussed, and corresponding OpenCL implementations of the explicit Euler method are compared...

chapter

Memory-Efficient Implementation of a Rigid-Body Molecular Dynamics Simulation

Wolfgang Eckhardt, Tobias Neckel

2012 11th International Symposium on Parallel and Distributed Computing > 103 - 110

2012 11th International Symposium on Parallel and Distributed Computing (ISPDC)

Molecular dynamics simulations are usually optimized with regard to runtime rather than memory consumption. In this paper, we investigate two distinct implementational aspects of the frequently used Linked-Cell algorithm for rigid-body molecular dynamics simulations: the representation of particle data for the force calculation, and the layout of data structures in memory. We propose a low memory...

chapter

An experimental GPU global memory performance estimation and optimization

Zhu Junfeng, Chen Gang, Zhang Keliang, Wu Baifeng

2012 International Conference on Systems and Informatics (ICSAI2012) > 910 - 914

2012 International Conference on Systems and Informatics (ICSAI)

The enormous computational power available in modern graphics processing units (GPUs) has enabled their wide use for general-purpose applications. However, manual development of high-performance parallel codes for GPUs is still very challenging. To improve GPGPU application performance by using GPU global memory efficiently, we extend the polyhedral model to capture memory access...

chapter

A formal model of a large memory that supports efficient execution

Warren A. Hunt, Matt Kaufmann

2012 Formal Methods in Computer-Aided Design (FMCAD) > 60 - 67

2012 Formal Methods in Computer-Aided Design (FMCAD)

The validation and application of formal processor models benefit fundamentally from both efficient execution and automated reasoning about the models. We present a memory model written in the ACL2 logic, with both reasoning support and a runtime environment, that accomplishes these objectives. Our memory model provides a space-efficient implementation for an address space of 2⁴⁸ bytes, and is used...

chapter

Efficient processing of large 3D point clouds

Jan Elseberg, Dorit Borrmann, Andreas Nuchter

2011 XXIII International Symposium on Information, Communication and Automation Technologies > 1 - 7

2011 XXIII International Symposium on Information, Communication and Automation Technologies (ICAT)

Autonomous robots equipped with laser scanners acquire data at an increasingly high rate. Registration, data abstraction and visualization of this data require the processing of a massive amount of 3D data. The increasing sampling rates make it easy to acquire billions of spatial data points. This paper presents algorithms and data structures for handling this data. We propose an efficient octree...

chapter

Designing Efficient Parallel Prefix Sum Algorithms for GPUs

Gabriele Capannini

2011 IEEE 11th International Conference on Computer and Information Technology > 189 - 196

2011 IEEE 11th International Conference on Computer and Information Technology (CIT)

This paper presents a novel and efficient method to compute one of the simplest and most useful building blocks for parallel algorithms: the parallel prefix sum operation. Besides its practical relevance, the problem is also of interest in parallel-computation theory. We first describe step-by-step how prefix sum is performed in parallel on GPUs. Next we propose a more efficient technique...
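
For readers unfamiliar with the operation, the following minimal sketch shows an inclusive prefix sum using the well-known Hillis-Steele doubling scheme, simulated sequentially in Python. It is a textbook formulation chosen for illustration, not the technique proposed in the paper (which the truncated abstract does not describe); the function name is hypothetical.

```python
def inclusive_scan(values):
    """Inclusive prefix sum via Hillis-Steele doubling, simulated sequentially.

    Each pass of the while loop corresponds to one parallel step in which
    every element i >= offset adds the element `offset` positions to its left.
    """
    out = list(values)
    offset = 1
    while offset < len(out):
        prev = out[:]  # snapshot so all "threads" read the previous step's values
        for i in range(offset, len(out)):
            out[i] = prev[i] + prev[i - offset]
        offset *= 2
    return out
```

For an input of n elements the loop runs about log2(n) passes, which is why the scheme maps naturally onto wide parallel hardware even though it performs more total additions than a sequential running sum.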

chapter

An Emulation Model of IA-32 Memory Management

Hai-feng Chen, Lie-hui Jiang, Wei-yu Dong, Li-xin Wang

2011 International Conference on Intelligence Science and Information Engineering > 321 - 324

2011 International Conference on Intelligence Science and Information Engineering (ISIE)

System emulation provides a new solution for migrating software across heterogeneous platforms. As one of the important components of system emulation, memory emulation directly affects system performance. This paper presents a universal emulation model of IA-32 memory management with a software MMU, virtual TLB and virtual MMIO. An IA-32 memory management emulator prototype is also implemented successfully...

chapter

Hardware Implementation of Cellular Automata on Systolic Array

A Yarahmadi, N Moarefi, S Setayeshi

2011 UkSim 13th International Conference on Computer Modelling and Simulation > 426 - 429

2011 UkSim 13th International Conference on Computer Modelling and Simulation (UKSim 2011)

Cellular automata are a computational model that requires data to be processed at very high speed. Implementing cellular automata serially does not provide the required speed. Conventional processors cannot process this enormous amount of data in a short period of time, so a new approach is required to improve computational performance. Systolic array is a kind...
