Memristors have extended their influence beyond memory to logic and in-memory computing. Memristive logic design, the methodology of designing logic circuits using memristors, is an emerging concept whose growth is fueled by the quest for energy efficient computing systems. As a result, many memristive logic families have evolved with different attributes, and a mature comparison among them is needed...
Programming models like CUDA, OpenMP, OpenACC and OpenCL are designed to offload compute-intensive workloads to accelerators efficiently. However, the naive offload model, which synchronously copies and executes in sequence, requires extensive hand-tuning of techniques, such as pipelining to overlap computation and communication. Therefore, we propose an easy-to-use, directive-based pipelining extension...
The 2D Ising model handles large amounts of data to study how magnetization and energy behave as functions of temperature. This simulation should be carried out efficiently in terms of cost, time and memory consumption. In this paper, we introduce an innovative technique that operates on bits instead of integers in order to reduce memory usage (1/32) and turnaround time (0.53). This approach has been...
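The 1/32 memory figure quoted above is what one would expect from storing each spin in a single bit rather than in a 32-bit integer. The following is a minimal sketch of that idea only, not the authors' implementation; the helper names (`pack_spins`, `get_spin`, `flip_spin`, `magnetization`) and the convention that a set bit means spin up are assumptions made for illustration.

```python
# Hypothetical sketch: one bit per Ising spin, 32 spins per word.
# Convention assumed here: bit 1 = spin +1, bit 0 = spin -1.

WORD_BITS = 32


def pack_spins(spins):
    """Pack a sequence of +1/-1 spins into a list of 32-bit words."""
    words = [0] * ((len(spins) + WORD_BITS - 1) // WORD_BITS)
    for i, s in enumerate(spins):
        if s == 1:
            words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
    return words


def get_spin(words, i):
    """Return the spin (+1 or -1) stored at position i."""
    return 1 if (words[i // WORD_BITS] >> (i % WORD_BITS)) & 1 else -1


def flip_spin(words, i):
    """Flip the spin at position i in place by toggling its bit."""
    words[i // WORD_BITS] ^= 1 << (i % WORD_BITS)


def magnetization(words, n):
    """Total magnetization of n spins: (#up) - (#down) = 2 * (#up) - n."""
    ups = sum(bin(w).count("1") for w in words)
    return 2 * ups - n
```

With this layout, 64 spins occupy two words instead of 64 integers, and the magnetization can be read off from a population count per word.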
The signal processing algorithms are typically described in a high-level programming language. In data-dominated applications, particularly in the multimedia and telecommunication domains, the code of these behavioral specifications is organized in sequences of loop nests; the main data structures are multidimensional arrays. This paper proposes a memory management algorithm for mapping multidimensional...
The high parallelism feature of scientific applications makes SIMD very suitable for streaming dataflow architectures. However, the splitting of SIMD memory requests increases the messages in on-chip networks and decreases the efficiency of streaming dataflow architectures. To process SIMD memory requests without splitting, a memory partition mechanism is proposed for SIMD in streaming dataflow architectures...
MPI one-sided or remote memory access (RMA) communication provides a different execution model from traditional two-sided or group communication and is better suited for some classes of applications. However, current implementations of MPI RMA are notorious for their inability to scale to large systems or problem sizes. In this paper, we present a study of the RMA infrastructure in popular open-source...
The Wigner Monte Carlo solver, using the signed-particle method, is based on the generation and annihilation of numerical particles. The memory demands of the annihilation algorithm can become exorbitant if a high spatial resolution is used, because the entire discretized phase space is represented in memory. Two alternative algorithms, which greatly reduce the memory requirements, are presented here.
In computational electromagnetics, surface integral equation (SIE) formulations are widely used to predict the electromagnetic scattering from arbitrary structures. These SIE formulations are discretized into a matrix form by the well-known method of moments (MoM). Up to now, the lack of proper compilers made it necessary for the MoM codes to be parallelized by hand in order to obtain reasonable performance...
Multi-dimensional probability distributions are used in many surveillance tasks, such as modeling the color distribution of background pixels for background subtraction. Accurate representation of such distributions, e.g. in a histogram, requires much memory that may not be available when a histogram is computed for each pixel. Parametric representations such as Gaussian Mixture Models (GMM) are very efficient...
The design of hard real-time embedded systems has to comply with strong requirements with respect to time determinism and resource consumption. However, interacting tasks may induce pessimism in schedulability analysis or introduce significant overheads in memory usage. In this paper, we restrict the execution and communication models to enforce an efficient and predictable implementation. To ensure...
Reducing the effects of off-chip memory access latency is a key factor in efficiently exploiting embedded multicore platforms. We consider architectures that admit a multi-core computation fabric, having its own fast and small memory to which the data blocks to be processed are fetched from external memory using a DMA (direct memory access) engine, employing a double- or multiple-buffering scheme...
The parallelization of sequential programs and the optimization of critical loops are challenging issues in the time of multi-core architectures. Coarse-Grained Reconfigurable Architecture (CGRA) is introduced to accelerate these data-intensive applications, while the access delay introduced by the massive memory accesses contained in those loops has become the bottleneck of CGRA's performance. In...
GPU computing offers a high potential of raw processing power at comparatively low costs. This paper investigates optimization techniques for solving initial value problems (IVPs) of ordinary differential equations (ODEs) on GPUs. Different techniques, especially for exploiting the GPU memory hierarchy, are discussed, and corresponding OpenCL implementations of the explicit Euler method are compared...
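For reference, the explicit Euler method mentioned above advances the solution of y' = f(t, y) by the update y_{n+1} = y_n + h f(t_n, y_n). The following is a plain sequential sketch of that update for a system of ODEs, not the OpenCL implementations compared in the paper; the function name `explicit_euler` and its parameters are illustrative assumptions.

```python
# Hypothetical sequential sketch of the explicit Euler method for an
# initial value problem y'(t) = f(t, y), y(t0) = y0.
# The per-component update is independent for each index i once f(t, y)
# has been evaluated, so that loop is the natural candidate for
# data-parallel execution.


def explicit_euler(f, t0, y0, h, steps):
    """Integrate y' = f(t, y) from t0 with step size h for `steps` steps.

    f     -- callable taking (t, y) and returning a list of derivatives
    y0    -- list with the initial values of all components
    Returns the final time and the final state as a list.
    """
    t = t0
    y = list(y0)
    for _ in range(steps):
        dy = f(t, y)
        # y_{n+1}[i] = y_n[i] + h * f_i(t_n, y_n)
        y = [yi + h * di for yi, di in zip(y, dy)]
        t += h
    return t, y
```

For the test equation y' = -y with y(0) = 1 and h = 0.1, each step multiplies the solution by 0.9, so ten steps yield 0.9^10 up to rounding.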
Molecular dynamics simulations are usually optimized with regard to runtime rather than memory consumption. In this paper, we investigate two distinct implementational aspects of the frequently used Linked-Cell algorithm for rigid-body molecular dynamics simulations: the representation of particle data for the force calculation, and the layout of data structures in memory. We propose a low memory...
The enormous computational power available in modern graphics processing units (GPUs) has enabled their wide use for general-purpose applications. However, manual development of high-performance parallel codes for GPUs is still very challenging. To improve GPGPU application performance by efficiently using GPU global memory, we extend the polyhedral model to capture memory access...
The validation and application of formal processor models benefit fundamentally from both efficient execution and automated reasoning about the models. We present a memory model written in the ACL2 logic, with both reasoning support and a runtime environment, that accomplishes these objectives. Our memory model provides a space-efficient implementation for an address space of 2^48 bytes, and is used...
Autonomous robots equipped with laser scanners acquire data at an increasingly high rate. Registration, data abstraction and visualization of this data require the processing of a massive amount of 3D data. The increasing sampling rates make it easy to acquire billions of spatial data points. This paper presents algorithms and data structures for handling this data. We propose an efficient octree...
This paper presents a novel and efficient method to compute one of the simplest and most useful building blocks for parallel algorithms: the parallel prefix sum operation. Besides its practical relevance, the problem is also of interest in parallel-computation theory. We first describe step by step how the prefix sum is performed in parallel on GPUs. Next we propose a more efficient technique...
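As background for the prefix sum operation named above, the sketch below simulates the well-known work-efficient (Blelloch) exclusive scan sequentially. It is not the more efficient technique the paper proposes, which the truncated abstract does not describe; the function name `blelloch_exclusive_scan` is an illustrative assumption. Within each stride, the inner loop iterations touch disjoint elements, which is what allows a GPU to run them in parallel.

```python
# Hypothetical sequential simulation of the work-efficient (Blelloch)
# exclusive prefix sum. Each inner `for` loop corresponds to one
# parallel step: its iterations are independent of one another.


def blelloch_exclusive_scan(values):
    """Return the exclusive prefix sum of `values`."""
    n = len(values)
    if n == 0:
        return []

    # Pad to the next power of two so the tree is complete.
    size = 1
    while size < n:
        size *= 2
    a = list(values) + [0] * (size - n)

    # Up-sweep (reduce): build partial sums up a balanced tree.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down.
    a[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
        stride //= 2

    return a[:n]
```

For the input `[3, 1, 7, 0, 4, 1, 6, 3]` the exclusive scan is `[0, 3, 4, 11, 11, 15, 16, 22]`: each output element is the sum of all inputs strictly before it.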
System emulation provides a new solution for migrating software across heterogeneous platforms. As one of the important components of system emulation, memory emulation directly affects the performance of the system. This paper presents a universal emulation model of IA-32 memory management with a software MMU, virtual TLB and virtual MMIO. An IA-32 memory management emulator prototype is implemented successfully...
Cellular automata are a way of performing computations that requires processing data at very high speeds. Implementing cellular automata serially does not provide the required speed. Conventional processors can't process this enormous amount of data in a short period of time, so a new approach is required to improve computational complexity. Systolic array is a kind...