This paper argues that “broken layers” and “application-driven” design will be new trends in microprocessor architecture. After discussing parallel technologies at several levels, and schemes to manage data locality and communication, it delineates a practical architecture-customizing flow in detail, from which designers will understand how to accomplish high-performance microprocessor hardware...
The α-β algorithm is an efficient technique for searching game trees. In this paper, we present the detailed implementation of a parallel α-β algorithm on our Engineering and Scientific Computation Accelerator (ESCA) system, a heterogeneous multi-core SIMD (Single Instruction stream, Multiple Data stream) architecture that accelerates compute-intensive parallel computing in high performance...
As process technology scales down, interconnects become the performance bottleneck in multi-core processor design. 3D ICs can be a good solution for reducing interconnection delay in multi-core processors by stacking multiple layers connected through TSVs. However, 3D technology magnifies the thermal challenges in 3D multi-core processors. For this reason, 3D multi-core architecture...
In recent years, multicore processor designs have become increasingly popular for embedded applications, but diversified inter-core communication mechanisms have led to difficulties in software development, integration and migration. A unified, portable, and efficient inter-core communication mechanism would have helped reduce these difficulties significantly, but such a solution did not exist...
In this paper the components required to implement a central processing unit (CPU) and its arithmetic logic unit (ALU) are presented using a novel medium grain reconfigurable hardware architecture. The CPU can be configured to match the application's requirements in terms of word-size, number and type of units, and instruction set. The MIPS instruction set has been used to show the potential of the...
This paper proposes an FPGA-based System-on-Chip (SoC) architecture with support for dynamic runtime reconfiguration. The SoC is divided into two parts, the static embedded CPU sub-system and the dynamically reconfigurable part. An additional bus system connects the embedded CPU sub-system with modules within the dynamic area, offering a flexible way to communicate among all SoC components. This makes...
Given the ability to implement multi-processor architectures on FPGAs, partial reconfiguration is an opportunity to improve weak soft-core performance by specializing coprocessors according to context-dependent application needs. But at the application level, there is a need for straightforward programming models that allow applications to be easily mapped on an ad hoc architecture without...
We present a reconfigurable architecture that can perform highly parallel regular expression matching. The system can be configured on programmable devices such as FPGAs as a set of instances of a predefined core called REMA. Each core addresses one of the subtasks into which the regular expression matching problem can be partitioned. These cores work in parallel on the same string analyzing different...
The evolution of Artificial Intelligence has passed through many phases over the years, going from rigorous mathematical grounding to more intuitive bio-inspired approaches. Despite the abundance of AI algorithms and machine learning techniques, the state of the art still fails to capture the rich analytical properties of biological beings or their robustness. Most parallel hardware architectures...
Matrix multiplication, the kernel routine of Level 3 BLAS, is one of the most common numerical operations in scientific computing. The STI CELL processor is a heterogeneous multiprocessor with a unique design to achieve high peak floating point performance. As matrix multiplication is essential for a wide range of numerical algorithms, performance improvements to...
Embedded high performance computing applications, such as image processing in surveillance systems, are very compute-intensive due to the complexity of the algorithms. In addition to the compute-intensive data processing, the power consumption of such systems needs to be minimized in order to keep them lightweight and suitable for mobile operation. One solution for achieving these goals is to exploit...
Unlike a traditional SoC (System-on-Chip), a multiprocessor chip contains multiple independent processors, each running different applications, so a sound multiprocessor chip initialization program is needed. This paper proposes a design for multiprocessor system initialization. The main contributions are as follows: First, a method for the implementation of multiprocessor...
Advances in DSM technologies have a negative impact on the yield and reliability of digital circuits. On-line self-testing is an interesting solution for detecting permanent and intermittent faults in non-safety-critical and real-time embedded multiprocessors. In this paper, we describe and evaluate three scheduling and allocation policies for on-line self-testing. We show that a policy that periodically...
This paper introduces a heterogeneous multi-core processor architecture that uses a B-Cache structure and a C-Core processor controller. It targets two problems: the difficulty of improving processor performance solely by raising single-core frequency, and superscalar pipeline stalls when processing branch instructions. The new architecture avoids pipeline flushes due...
This paper describes the design principles of a software-based on-line testing application used to monitor manycore architectures running multi-threaded functional applications. The key idea is to have a non-intrusive monitoring application running in parallel with the functional one. The monitoring application aims at detecting and reacting to software or hardware malfunctions, and can be seen as a...
The Chip Multicore Processor (CMP) has become the mainstream microprocessor architecture in today's industry and academic literature. As CMP hardware research and development progresses, software issues become more and more prominent. In response to these developments, many institutes and universities are changing the curricula of their computer architecture courses. But the problem is do we really...
In this paper, we present a flexible and distributed homogeneous Software Defined Radio (SDR) platform. This platform is an array of processing elements, called Smart ModEm Processors (SMEP), interconnected by a Network-on-Chip. Implemented in ST65nm, each processing element performs 3.2 GMAC/s with 77 GBits/s internal bandwidth while dissipating 110mW. Each SMEP unit contains a MIPS processor for...
Technology trends enable the integration of many processor cores in a System-on-Chip (SoC). In these complex architectures, several architectural parameters can be tuned to find the best trade-off in terms of multiple metrics such as energy and delay. The main goal of the MULTICUBE project consists of the definition of an automatic Design Space Exploration framework to support the design of next generation...
The main goals of the 2PARMA project are: the definition of a parallel programming model combining component-based and single-instruction multiple-thread approaches, instruction set virtualisation based on portable byte-code, run-time resource management policies and mechanisms as well as design space exploration methodologies for many-core computing architectures.
Evaluating the system in early design steps is critical for an efficient design of Multi-Processor SoCs (MPSoC). When the number of processors grows, the simulation time tends to increase exponentially. Native co-simulation has been proposed to obtain performance estimations with sufficient accuracy while requiring short simulation times. In MPSoC architectures buses often become the most important...