The widely accepted block-matching technique, which is required to identify motion vectors, fails in cases where no texture exists. In [1], we proposed a hardware-oriented cellular-automaton algorithm that generates spatial patterns on textureless objects and backgrounds, aiming at motion-vector estimation of textureless moving objects. This demonstration presents a field-programmable gate...
The expanding use of deep learning algorithms is increasing the demand for accelerating neural network (NN) signal processing. For NN processing, in-memory computation is desirable because it eliminates expensive data transfer. In light of recently proposed binary neural networks (BNNs), which can reduce the computation resource and area requirements, we designed an in-memory BNN signal processor...
A versatile reconfigurable accelerator for binary/ternary deep neural networks (DNNs) is presented. It features a massively parallel in-memory processing architecture and stores varieties of binary/ternary DNNs with a maximum of 13 layers, 4.2 K neurons, and 0.8 M synapses on chip. The 0.6 W, 1.4 TOPS chip achieves performance and energy efficiency that are 10–10² and 10²–10⁴ times better than a CPU/GPU/FPGA.
In this paper, we present a new architecture for FPGA checkpointing along with an efficient mechanism. We then provide a static analysis of original HDL source code to reduce the cost of hardware for checkpointing functionality. Our evaluations show that with the proposals, checkpointing hardware causes small degradation in maximum clock frequency (less than 10%). The LUT overhead varies from 14.4% (Dijkstra)...
FPGAs provide reconfigurability and high performance for parallel applications. Modern FPGAs can be integrated into computing systems as accelerators so that they can work with the host CPU to execute offloaded applications. This integration puts more pressure on the fault tolerance of computing systems, and the question of how to improve dependability becomes crucial. Similar to CPU-based systems, checkpoint/restart...
Power-constrained computing is now becoming an essential paradigm in both high performance computing and embedded systems. A power budget is dynamically assigned to each computing resource to improve energy efficiency and system throughput. Modern computer systems have accelerator devices, such as GPUs and FPGAs, for higher energy efficiency and performance. Therefore, power management mechanisms of...
Modern microprocessors have a number of cores and complicated structures, such as multi-level caches. Behavior analysis of modern complicated processors is important for software performance optimization, processor architecture research, and education. Currently, a number of tools are available for checking the behavior of processors, such as processor simulators, debuggers and profilers...
Stencil computation is one of the basic but important operation patterns for various applications, such as image processing. Various GPU-based and application-specific hardware approaches have been recently proposed. However, the available absolute energy capacity and hardware size are limited in embedded systems. Therefore, an energy-efficient, small-footprint, high-performance accelerator is necessary...
This research is our first step toward developing a low-complexity Viterbi decoder for IoT applications. We evaluate how the values of the Viterbi decoder's parameters, such as traceback length (L), input data bit-width (D), and LLR truncated value (E), affect the BER and PER of a communication system. The IEEE 802.11ah simulator is used with an AWGN channel and BPSK modulation. Our simulation results...
The convolutional neural network (CNN) is an emerging approach for achieving high recognition accuracy in various machine learning applications. To accelerate CNN computations, various GPU-based or application-specific hardware approaches have been recently proposed. However, since they require large computing hardware and a large absolute amount of energy, they are not suitable for embedded applications. In this...
EMAX: Energy-aware Multimode Accelerator Extension is equipped with distributed single-port local memories and ring-formed interconnections. The accelerator is designed to achieve extremely high throughput for scientific computations, big data and image processing and also to achieve low power consumption. However, before mapping algorithms on the accelerator, application developers should have sufficient...
Rapid prototyping using FPGAs is a widely-applied approach for efficient evaluation of hardware structures. We present a rapid prototyping framework by virtually enlarging available FPGA resources. In order to mitigate the development complexity of FPGA-based hardware prototype, the framework provides two abstractions of resources on FPGA platforms: Memory systems and inter-FPGA interconnections on...
FPGA-based rapid prototyping is widely applied for fast simulations of hardware structure verifications. In this paper, we propose flipSyrup, a prototyping framework for cycle-accurate hardware simulations on abstract FPGA platforms. In order to mitigate the development complexity of FPGA-based simulators, the framework provides two abstractions of resources on FPGA platforms: Memory systems and inter-FPGA...
Soft processors have been commonly used in FPGA-based designs to perform various useful functions. Some of these functions are not performance-critical and are required to be implemented using very few FPGA resources. For such cases, it is desirable to reduce the circuit area of the soft processor as much as possible. This paper proposes Ultrasmall, a small soft processor for FPGAs. Ultrasmall supports a subset...
We have proposed an effective stencil computation method and architecture that employ multiple small FPGAs with a 2D-mesh topology. In this paper, we show that our proposed architecture works correctly on a real 2D-mesh connected FPGA array. We developed a software simulator in C++, which emulates our proposed architecture, and implemented two prototype systems in Verilog HDL. One prototype...