CPU-FPGA heterogeneous platforms offer a promising solution for high-performance and energy-efficient computing systems by providing specialized accelerators with post-silicon reconfigurability. To unleash the power of FPGAs, however, the programmability gap has to be filled so that applications specified in high-level programming languages can be efficiently mapped and scheduled onto FPGAs. The above...
Deep convolutional neural networks (CNNs) have shown great accuracy on object recognition and classification tasks. Deep CNNs are computation-intensive, so many customized RRAM crossbar-based accelerators have been proposed to meet their computing demands, but area cost and power consumption remain major challenges for such accelerators. In this work, we...
An energy-efficient hybrid neural network (NN) processor is implemented in a 65nm technology. It has two 16×16 reconfigurable heterogeneous processing element (PE) arrays. To accelerate a hybrid NN, the PE array is designed to support on-demand partitioning and reconfiguration for processing different NNs in parallel. To improve energy efficiency, each PE supports bit-width adaptive computing to meet...
Automatic and accurate human upper-body detection and orientation estimation have great practical value in several computer vision applications. Most previous work on human upper-body orientation estimation assumes that the human upper-body region is already detected and aligned. However, this is not the case in many real-world scenarios. An additional human detector is essential, which is usually much...
The coarse-grained reconfigurable architecture (CGRA) is a promising platform that provides both high performance and high power efficiency. Dataflow graph (DFG) mapping is critical to tapping the potential of CGRAs. Inspired by the great progress made in tree-search games using deep neural networks, we propose a framework for learning a convolutional neural network for mapping DFGs onto spatial programmable...
High-Level Synthesis (HLS) has been widely recognized as an efficient compilation process targeting FPGAs for algorithm evaluation and product prototyping. However, massively parallel memory access demands and the extremely high cost of multi-port single-bank memory have impeded loop pipelining performance. Thus, based on an alternative multi-bank memory architecture, a joint approach...
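To illustrate why a multi-bank memory can stand in for a multi-port one, here is a minimal sketch of cyclic partitioning, in which array address `a` is stored in bank `a mod N`. It is not the joint approach of the abstract above, which is elided; the function names `bank_of`, `conflict_free`, and `min_banks` are hypothetical. The sketch assumes a pipelined loop whose iteration `i` reads addresses `i + offset` for a fixed set of offsets, and it checks whether those reads land in distinct single-port banks.

```python
def bank_of(addr: int, n_banks: int) -> int:
    """Cyclic partitioning (assumed scheme): address a lives in bank a mod N."""
    return addr % n_banks


def conflict_free(offsets, n_banks: int, n_iters: int = 16) -> bool:
    """True if, in every checked iteration i, the reads at i + offset
    all hit different banks, so they can be served in the same cycle."""
    offsets = sorted(set(offsets))
    for i in range(n_iters):
        banks = [bank_of(i + off, n_banks) for off in offsets]
        if len(set(banks)) != len(banks):
            return False
    return True


def min_banks(offsets) -> int:
    """Smallest bank count for which the access pattern is conflict-free."""
    offsets = sorted(set(offsets))
    n = len(offsets)
    while not conflict_free(offsets, n):
        n += 1
    return n


# A 3-point stencil reading a[i], a[i+1], a[i+2] needs three banks.
stencil_banks = min_banks([0, 1, 2])
# Reading a[i] and a[i+2] conflicts with two banks, because both
# addresses have the same parity, so three banks are needed here too.
strided_banks = min_banks([0, 2])
```

Under cyclic partitioning the bank conflict depends only on the offset differences modulo the bank count, which is why checking a handful of iterations is enough in this sketch.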
Owing to its low masking complexity, the addition chain has been widely studied for evaluating S-boxes in the recent literature. This paper summarizes four main addition chains developed for the AES S-box in the existing literature and chooses the most area-efficient one. To further reduce the masking complexity, this paper proposes an improved algorithm for evaluating...
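As background on what an addition chain for the AES S-box looks like, the sketch below computes the field inversion x^254 in GF(2^8), the nonlinear core of the S-box, with one well-known chain that uses four multiplications. This is an illustrative assumption: the abstract does not say which four chains it compares, and the chain here is not claimed to be one of them or the paper's improved algorithm. Squarings are linear over GF(2) and therefore cheap to mask, so the masking cost is driven by the number of general multiplications. The names `gf_mul`, `gf_square`, and `gf_inv_chain` are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
    return result


def gf_square(a: int) -> int:
    """Squaring is GF(2)-linear; computed via gf_mul here for simplicity."""
    return gf_mul(a, a)


def gf_inv_chain(x: int) -> int:
    """x^254 via the chain 1, 2, 3, 12, 14, 15, 240, 254.

    Only the four steps marked 'mult' are nonlinear multiplications.
    """
    x2 = gf_square(x)                 # x^2
    x3 = gf_mul(x2, x)                # x^3    (mult 1)
    x12 = gf_square(gf_square(x3))    # x^12
    x15 = gf_mul(x12, x3)             # x^15   (mult 2)
    x14 = gf_mul(x12, x2)             # x^14   (mult 3)
    x240 = x15
    for _ in range(4):                # raise to the 16th power: x^240
        x240 = gf_square(x240)
    return gf_mul(x240, x14)          # x^254  (mult 4)


# Every nonzero element times its computed inverse should give 1.
all_inverses_ok = all(gf_mul(x, gf_inv_chain(x)) == 1 for x in range(1, 256))
```

Because x^255 = 1 for every nonzero field element, x^254 equals the multiplicative inverse, and the chain maps 0 to 0 as the AES S-box construction requires.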
The minimum mean square error (MMSE) method has proved its superiority for signal detection in massive multiple-input multiple-output (MIMO) systems owing to its near-optimal performance. However, detection efficiency is limited by the high computational complexity and low parallelism of matrix inversion. This paper presents a hardware-efficient signal detector based on a low-complexity Lanczos method,...
Convolutional neural networks (CNNs) have achieved great success in many applications. Recently, various FPGA-based accelerators have been proposed to improve the performance of CNNs. However, most current FPGA-based methods use a single bit-width selection for all CNN layers, which leads to very low resource utilization efficiency and difficulty in further performance improvement. In this paper, we propose...
In this work, an accelerator based on a fast shape searching face alignment (F-SSFA) algorithm is proposed to achieve real-time processing. First, a learning-based low-dimensional SURF feature is introduced to reduce the computation cost of the cascaded regression. Then the Euclidean distance and shape affine transformation are used to accelerate the shape searching procedure. F-SSFA therefore greatly...
Pipeline stalls in distributed-controlled coarse-grained reconfigurable arrays are a major obstacle to performance. This work presents a Triggered-Issue and Triggered-Execution (TITE) paradigm, motivated by the Triggered Instruction Architecture (TIA), which converts control and data dependencies into predicate dependencies as triggers for spatial parallelism. TITE separately triggers the issuing...
The superior controllability of the cerebellum has motivated extensive interest in the development of computational cerebellar models. Many models have been applied to motor control and image stabilization in robots. Because they are often computationally complex, cerebellar models have rarely been implemented in dedicated hardware. Here, we propose an efficient hardware design for cerebellar models using approximate...
Coarse-Grained Reconfigurable Architectures (CGRAs) have received increasing attention due to their inherent advantages of high performance and energy efficiency. The multi-Vdd technique is widely used to reduce energy consumption, and modulo scheduling is one of the most widely used pipelining techniques for improving performance. To achieve both high performance and energy efficiency simultaneously,...
The coarse-grained reconfigurable architecture (CGRA) is a promising platform for mobile computing. In this work, based on the nonlinear effects of batteries, we propose a method to co-optimize task partitioning and multi-cell battery scheduling with dynamic voltage scaling (DVS) for CGRA computing platforms. Experimental results show an average 33.6% improvement in battery runtime over...
Hardware Trojans have become a significant threat to computing reliability and data security in reconfigurable hardware. One of the most effective techniques for run-time detection and recovery is based on the Triple Modular Redundancy (TMR) mechanism; however, this mechanism causes a large resource overhead because the protected circuit needs to be fully duplicated twice for the detection stage and decision...
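The overhead mentioned above comes from the basic shape of TMR: three copies of the same circuit feed a majority voter. A minimal software sketch of the voter, assuming the three module outputs are available as integers of equal bit width (the names `tmr_vote` and `tmr_mismatch` are hypothetical, and this is generic TMR rather than the scheme of the abstract):

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three module outputs.

    Each output bit takes the value that at least two modules agree on,
    so a fault or Trojan payload in any single module is masked.
    """
    return (a & b) | (a & c) | (b & c)


def tmr_mismatch(a: int, b: int, c: int) -> bool:
    """Detection signal: True when the three outputs are not all equal."""
    return not (a == b == c)


golden = 0b1010      # output of an unaffected module
tampered = 0b0110    # output of a module with an active Trojan (example value)

recovered = tmr_vote(golden, tampered, golden)   # equals golden
detected = tmr_mismatch(golden, tampered, golden)  # True
```

The voter recovers the correct value only while at most one of the three copies is wrong, which is the usual single-fault assumption behind TMR.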
For large-scale multiple-input multiple-output (MIMO) systems, the linear minimum mean square error (MMSE) method is a near-optimal approach to signal detection. However, MMSE involves matrix inversion, which has high computational complexity. In this paper, a Lanczos-based method is proposed to solve the problem by transforming the matrix inversion into an iterative process...
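For reference, the matrix inversion in question appears in the standard linear MMSE estimate. The notation below is an assumption, not taken from the abstract: H is the channel matrix, y the received vector, σ² the noise variance, and the transmitted symbols are assumed to have unit average energy.

```latex
\hat{\mathbf{s}}
  = \left(\mathbf{H}^{H}\mathbf{H} + \sigma^{2}\mathbf{I}\right)^{-1}\mathbf{H}^{H}\mathbf{y}
  = \mathbf{A}^{-1}\mathbf{b},
\qquad
\mathbf{A} = \mathbf{H}^{H}\mathbf{H} + \sigma^{2}\mathbf{I},
\quad
\mathbf{b} = \mathbf{H}^{H}\mathbf{y}.
```

Since A is Hermitian positive definite, computing the estimate is equivalent to solving the linear system A ŝ = b, which is what makes iterative solvers an alternative to explicit inversion.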
Temperature evaluation is key to static power calculation and thermal management in application mapping. This paper proposes a design-time (offline) Heat Conduction Grid Model (HCGM) to evaluate the temperature of a network-on-chip, based on the dependency of temperature on the heat flux density produced by ambient tiles as well as by the tile itself. This model incorporates (1) short running time...
In this paper, we propose a very-large-scale integration (VLSI) design method for a large-scale multiple-input multiple-output detection algorithm. Our design uses a modified version of the Successive Over-Relaxation (SOR) method, which substantially reduces the high computational complexity of data detection and achieves near-optimal performance. We use a reconfigurable Processing Elements Array (PEA)...
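For readers unfamiliar with SOR, the following is a minimal sketch of the textbook iteration for solving A x = b without forming the inverse of A. It is plain SOR, not the modified version the abstract refers to, and it uses a small real-valued system for clarity, whereas MIMO detection would operate on a complex Hermitian matrix. The function name `sor_solve` and the parameter values are assumptions.

```python
def sor_solve(A, b, omega=1.1, iters=50):
    """Textbook successive over-relaxation for A x = b.

    Converges for symmetric positive definite A when 0 < omega < 2.
    omega = 1 reduces to the Gauss-Seidel method.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # Sum over the other unknowns, using already-updated values.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Relaxed update: blend the old value with the Gauss-Seidel step.
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
    return x


A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]
x = sor_solve(A, b)   # approaches the exact solution [1/11, 7/11]
```

Each update needs only one division by a diagonal entry plus multiply-accumulate operations, which is the kind of regular arithmetic that maps well onto an array of processing elements.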
Artificial neural networks (ANNs) are widely used in machine learning and artificial intelligence. However, an ANN requires a long running time and consumes considerable power when running on a GPU or CPU, which may hinder its application in embedded systems. This paper proposes a hardware accelerator design guideline for ANNs of arbitrary scale and depth. We take full consideration of the hardware...
More and more mobile phones are equipped with multiple sensors today. This creates a new opportunity to analyze users’ daily behaviors and evolve mobile phones into truly intelligent personal devices that provide accurate, context-adaptive, and individualized services. This paper proposes a MAST (Movement, Action, and Situation over Time) model to explore this direction and identifies key technologies...