Programming models such as CUDA, OpenMP, OpenACC and OpenCL are designed to offload compute-intensive workloads to accelerators efficiently. However, the naive offload model, which synchronously copies and executes in sequence, requires extensive hand-tuning with techniques such as pipelining to overlap computation and communication. Therefore, we propose an easy-to-use, directive-based pipelining extension...
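To illustrate the hand-tuned pipelining that the abstract above refers to, here is a minimal sketch in plain Python. It is not the directive-based extension the authors propose: `transfer`, `compute` and `pipelined` are hypothetical names, and a worker thread stands in for an asynchronous host-to-device copy. The point is only the schedule, in which the copy of chunk i+1 is started before chunk i is processed.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer(chunk):
    """Hypothetical stand-in for a host-to-device copy of one chunk."""
    return list(chunk)


def compute(chunk):
    """Hypothetical stand-in for the kernel run on one chunk."""
    return [x * x for x in chunk]


def pipelined(data, chunk_size):
    """Process data chunk by chunk, overlapping each copy with the previous compute."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    out = []
    if not chunks:
        return out
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start copying the first chunk before any computation happens.
        pending = pool.submit(transfer, chunks[0])
        for nxt in chunks[1:]:
            ready = pending.result()              # wait for the current chunk
            pending = pool.submit(transfer, nxt)  # start the next copy
            out.extend(compute(ready))            # compute while that copy runs
        out.extend(compute(pending.result()))     # drain the last chunk
    return out
```

The naive model would instead copy everything, then compute everything; here only the first copy and the last compute are left without something to overlap with.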
The high parallelism of scientific applications makes SIMD well suited to streaming dataflow architectures. However, splitting SIMD memory requests increases the number of messages in on-chip networks and decreases the efficiency of streaming dataflow architectures. To process SIMD memory requests without splitting, a memory partition mechanism is proposed for SIMD in streaming dataflow architectures...
Graphics Processing Units (GPUs) have been successfully used to accelerate scientific applications due to their computation power and the availability of programming languages that make writing scientific applications for GPUs more approachable. However, since the programming model of GPUs requires offloading all the data to the GPU memory, the memory footprint of the application is limited to the...
Cool Mega Array (CMA) is an energy-efficient reconfigurable accelerator consisting of a large PE array with combinatorial circuits and a small microcontroller. In order to enhance the energy efficiency of the total system, a coprocessor design of CMA, called CMA-Geyser, is proposed. By replacing the programmable microcontroller with the host processor Geyser with a dedicated hardware...
This paper presents a hybrid compile-time and run-time memory management technique for a 3D-stacked reconfigurable accelerator that includes a memory layer composed of multiple memory units whose parallel access allows a very high bandwidth. The technique inserts allocation, free and data transfer operations into the code to use the memory layer and avoids memory overflows by adding a limited number of additional...
This paper presents a design that realizes a dual-channel Data Record Card which achieves a 33 MB/s data transfer rate and 4 GB capacity per channel with CF Card and SDRAM. According to the characteristics of data transfer with CF Card, an advanced data transfer method is provided, which reduces the requirements of memory resources in the FPGA without influencing the data transfer rate. And it realizes...
In embedded SoC design, memory hierarchies play an increasingly important role in system performance. There is a significant latency gap between internal and external memory accesses, and external memory accesses can degrade the performance of embedded systems. Application developers must explicitly handle data transfer between external and internal memories. That is a burden for programmers...
Many signal processing systems, particularly in the multimedia and telecom domains, are synthesized to execute data-dominated applications. In such systems, data transfer and storage have a significant impact on both the system performance and the major cost parameters - power consumption and chip area. This paper presents a software tool for system-level exploration, where several memory management...
Loop tiling is an effective loop transformation technique that tiles the iteration space of loop nests to improve data locality. Appropriate data layout and transfer strategies are also important to assist loop tiling. This paper describes an approach to enhance data reuse and reduce off-chip memory access after loop tiling. Data tiles due to loop tiling may have overlapped elements, which...
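As a generic illustration of loop tiling as described in the abstract above (not the data layout or transfer approach of that paper), the sketch below tiles the three loops of a matrix multiplication. The function names and the `tile` parameter are assumptions made for the example. Pure Python lists do not expose cache behaviour, so the code shows only the transformed iteration order and that the result is unchanged.

```python
def matmul_naive(a, b):
    """Reference triple loop: c = a @ b for lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation with the iteration space split into tile-sized blocks."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # Outer loops walk over tile origins; inner loops stay inside one tile.
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                # min() clamps partial tiles at the matrix edges.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

In a compiled language the tile size would typically be chosen so that the blocks of `a`, `b` and `c` touched by the inner loops fit in on-chip memory, which is where the locality gain comes from.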