Currently, few architectural approaches propose new ways to raise the performance of conventional sequential instruction streams in the billion-transistor era. Many application programs could profit from processors that are able to speed up the execution of sequential applications beyond the performance of current superscalar processors. The Grid ALU Processor (GAP) is a runtime reconfigurable...
This paper presents a reconfigurable mobile stream processor for ray tracing. The processor is implemented in a 16 mm² area in 0.13 μm CMOS technology. The processor adopts a single instruction, multiple thread (SIMT) architecture in order to exploit instruction-level and thread-level parallelism. The SIMT architecture consists of 12 stream processors (SPs). A low hardware utilization caused by a branch...
Reliability issues such as soft errors and NBTI (negative bias temperature instability) have become a matter of concern as integrated circuits continue to shrink. It is becoming increasingly important to take reliability requirements into account even for consumer products. This paper presents a dynamic control flow checking (DCFC) technique for highly reliable computer systems. The DCFC technique...
Application Specific Instruction-set Processors (ASIPs) are needed to handle the future demand for flexible yet high-performance computation in mobile devices. However, designing an ASIP is complicated by the fact that not only the processor but also tools such as assemblers, simulators, and compilers have to be designed. Novel Generator of Accelerators And Processors (NoGap) is a design automation...
Exploiting the performance of today's processors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command-line utilities that addresses four key problems: Probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter...
We propose a performance estimation technique for a multi-core segmented bus platform, SegBus. The technique enables us to assess the performance aspects of any specific application on a particular platform configuration, modeled in Unified Modeling Language (UML). We present methods to transform Packet Synchronous Data Flow (PSDF) and Platform Specific Model (PSM) models of the application into Extensible...
The speed of the memory subsystem often constrains the performance of large-scale parallel applications. Experts tune such applications to use hierarchical memory subsystems efficiently. Hardware accelerators, such as GPUs, can potentially improve memory performance beyond the capabilities of traditional hierarchical systems. However, the addition of such specialized hardware complicates code porting...
This paper proposes a reconfigurable multi-core architecture, called hyperscalar, that enables many scalar cores to be united dynamically as a larger superscalar processor to accelerate a thread. To accomplish this, we propose virtual shared register files (VSRF), which allow the instructions of a thread executed in the united cores to logically see a uniform set of register files. We also propose...
Virtualization enables multiple guest operating systems to run on a single physical platform. These virtual machines may host any type of application, including concurrent HPC programs. Traditionally, VMM schedulers have focused on fairly sharing the processor resources among domains and have rarely considered the behavior of VCPUs. However, this can result in poor application performance in overcommitted domains...
Multi-core designs are becoming dominant, creating sophisticated and complicated cache structures. Larger shared level-2 (L2) caches are also in demand for higher cache performance. One of the easiest ways to design cache memory for increased performance is to double the cache size. However, a large cache directly increases area and power consumption. Especially in mobile processors,...
This paper describes an efficient data structure called the Bucket-Heap (BH) for accelerating Dijkstra's widely used shortest path algorithm in hardware. We adopt an architecture model consisting of a computational core and a memory unit that maintains the network topology. It has been shown that the proposed data structure leads to a notable reduction in the memory I/O accesses required to perform...
The PERCS system was designed by IBM in response to a DARPA challenge that called for a high-productivity, high-performance computing system. A major innovation in the PERCS design is the network that is built using Hub chips that are integrated into the compute nodes. Each Hub chip is about 580 mm² in size, has over 3700 signal I/Os, and is packaged in a module that also contains LGA-attached optical...
An embedded multimedia terminal was designed and developed using Samsung Corporation's S3C2410 chip as the core processor. First, an embedded Linux operating platform was built on the UP-NETARM2410-S target machine according to system requirements, including the boot loader, kernel, file system, and related device drivers. Then the host computer was equipped with Qt/Embedded as the SDK (Software Development...
This paper introduces a novel micro-scheduling system based on the Cell processor. An improved version of Lawler's algorithm is applied to solve multi-task scheduling problems in the Cell software system development process, and high efficiency is achieved. A multilayer decoupled software design model is adopted. Sample applications are explored, and tests show that the system could satisfy many high performance...
The core of the hardware of a wireless communication network node is the microprocessor, and the choice of microprocessor must take compatibility with communication protocols into account. In this design, the protocol is SimpliciTI, a new protocol that has recently come into wide use, and the compatible microprocessor selected is the MSP430F5438, which features low power consumption and low cost. The hardware composition...
To achieve high-speed, real-time data collection and to facilitate data analysis and processing, this paper designs a USB data acquisition system with acquisition, display, and storage functions. The hardware of the system consists mainly of an AT89S51 as the local processor and a PDIUSBD12 as the USB interface device, and the underlying data communication is operated by the firmware...
This paper studies the design of a motion compensation IP core based on SOPC technology, applying a software-hardware co-design method to video decoding to overcome the drawbacks of software decoding and hardware decoding. The modular hardware design, which is based on the motion compensation algorithm in the MPEG-4 video decoding standard, is completed using the Verilog HDL language...
An IC RF card lab management system is designed on the basis of the AT89C52 microprocessor produced by Atmel. The paper describes the influence of the hardware environment on the IC RF read-write system, and analyzes and designs the complete IC RF card read-write equipment from the standpoint of system security, confidentiality, and stability.
The paper analyzes the problems of current single-chip microcomputer experiment teaching and the features of the Proteus software. A teaching reform that brings EDA technology into the experimental teaching of MCUs is proposed. The paper discusses how to use the Proteus software to construct a single-chip virtual experiment platform. Through simulation in the Proteus software and project design and construction based on...
In this paper we present a rapid prototyping platform on a single Field Programmable Gate Array (FPGA) with support for software transactional memory. The system is composed only of off-the-shelf cores and is useful for porting programs to the transactional memory programming model and validating them early. We discuss the implementation of the software layer of this platform, propose an analysis of...