This paper presents ReMAP, a reconfigurable architecture geared towards accelerating and parallelizing applications within a heterogeneous CMP. In ReMAP, threads share a common reconfigurable fabric that can be configured for individual thread computation or fine-grained communication with integrated computation. The architecture supports both fine-grained point-to-point communication for pipeline...
As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPUs) increases, so does the importance of on-chip interconnection network design. This paper explores throughput-effective networks-on-chip (NoCs) for future manycore accelerators that employ bulk-synchronous parallel (BSP) programming models such as CUDA and OpenCL. A hardware optimization is "throughput-effective"...
This paper proposes Flex Core, a hybrid processor architecture where an on-chip reconfigurable fabric (FPGA) is tightly coupled with the main processing core. Flex Core provides an efficient platform that can support a broad range of run-time monitoring and bookkeeping techniques. Unlike using custom hardware, which is more efficient but often extremely difficult and expensive to incorporate into...
Modern processor architectures sacrifice timing predictability to improve average performance. Branch prediction, out-of-order execution, and multi-level cache hierarchies complicate accurate execution time estimates. The timing demands of Cyber Physical Systems (CPS) have led some to propose new processor architectures, including Precision Timed (PRET) processors, which simplify analysis of execution...
Chips are moving from single-core systems to much more complex, heterogeneous many-core systems. While heterogeneous architectures promise high performance, they also challenge our ability to port our existing operating systems to abstract the heterogeneous components into a unified architecture. Baseline solutions to resolve heterogeneity issues within many-core systems use Remote Procedure Calls...
We propose a minimalistic processor architecture tailoring Wave Field Synthesis (WFS)-based audio applications to configurable hardware. Eleven high-level instructions provide the required flexibility for embedded WFS customization. We describe the implementation of the proposed instructions and apply them to a multi-core reconfigurable WFS architecture. Our approach combines software programming...
Cloud computing is one of the most actively researched topics today, with the objective of taking advantage of data center computational resources. Hardware and software virtualization make the environment scalable, redundant, and less costly. This paper aims to characterize scientific and transactional applications in IaaS Cloud infrastructures, identifying the best virtual machine configuration in terms...
In this paper, we focus on solving the problem of removing inter-core communication overhead for streaming applications on chip multiprocessors. The objective is to totally remove inter-core communication overhead while minimizing the overall memory usage. By totally removing inter-core communication overhead, a shorter period can be applied and system throughput can be improved. Our basic idea is...
We propose a programmable heterogeneous multi-processor system-on-chip (MPSoC) platform architecture for flexible radio processing that aims at striking a balance between performance (as provided by ASICs) and flexibility (as provided by SDR). Based on a novel hardware-oriented Virtual Flow Pipelining (VFP) framework, the key highlights of this solution are a simple task-level programming model for...
This paper presents our integrated system-level design tool set, named Advanced SystemBuilder. Advanced SystemBuilder supports an overall methodology for system design and design space exploration, and provides a programming model for systems, automatic synthesis capabilities for FPGA-based prototyping, cosimulation, and execution profiling. A case study of an MPEG4 decoder design shows the effectiveness of...
Real-time systems need time-predictable platforms to enable static worst-case execution time (WCET) analysis. Improving the processor performance with superscalar techniques makes static WCET analysis practically impossible. However, most real-time systems are multi-threaded applications and performance can be improved by using several processor cores on a single chip. In this paper we present a time-predictable...
In this paper we examine the idea of implementing communicating sequential processes (CSP) constructs on a Java embedded chip multiprocessor (CMP). The approach is intended to reduce the memory bandwidth pressure on the shared memory, by employing a dedicated network-on-chip (NoC). The presented solution is scalable and also specific for our limited resources and real-time predictability requirements...
Massively parallel computing performed on many-core Networks-on-Chip (NoCs) is the future of computing. One feasible approach to implementing parallel computing is to deploy multiple applications on the NoC simultaneously. In this paper, we propose a multi-application mapping method starting with the application mapping, which finds a region on the NoC for each application, and then task mapping which...
This paper proposes a novel multi-core processor with a SIMD (Single Instruction Multiple Data) ISA (Instruction Set Architecture) and an extended register file for communication applications. To acquire better parallel computing capability, we implement a SIMD ISA and increase the number of registers in the register file from 32 to 64. A 5×5 homogeneous 2-D mesh NoC (Network-on-Chip) topology is adopted to further enhance...
With the number of processor cores increasing in chip multi-processors (CMPs) and global wire delays increasing, networks on chip have been gaining wide acceptance for on-chip inter-core communication. This paper introduces a low latency Dynamic Virtual Output Queues Router (DVOQR), which can reduce the router latency to two cycles by leveraging look-ahead routing computation and virtual output address...
Modern embedded processors are often customized to accelerate native code. However, the design space exploration of hardware/software trade-offs is often time-intensive. To explore the design space of a processor's instruction set, simulations are utilized. Instruction set extension identification is usually performed by analyzing the basic blocks of an application in a linear fashion. We present...
Fused Multiply-Add (FMA) units are quite popular in floating-point execution units in state-of-the-art multicore processors. It has been shown that, for division operations, using digit-recurrence units consumes much less power and energy than using FMA units which are based on Newton-Raphson approximation algorithms. In this work, we show that digit-recurrence division units can also reduce on chip...
Current leadership-class machines suffer from a significant imbalance between their computational power and their I/O bandwidth. I/O forwarding is a paradigm that attempts to bridge the increasing performance and scalability gap between the compute and I/O components of leadership-class machines to meet the requirements of data-intensive applications by shipping I/O calls from compute nodes to dedicated...
Providing a realistic, high-resolution, and high-fidelity representation of motion is essential in the cloth simulation problem. In order to make high-resolution simulations tractable, several algorithms have been developed that manage cloth-object interactions efficiently through specialized data structures such as AABB trees. However, implementation restrictions on single CPU architectures impose...