In this paper, a low-cost accelerator for the η_T pairing in characteristic three over supersingular elliptic curves is designed. As the critical operations of the η_T pairing, the cubing and sparse multiplications over GF(3^{6m}) in Miller's algorithm are merged, and their arithmetic is modified and scheduled to reduce the overhead related to intermediate data. With these optimizations, the Miller's...
Hash functions represent a fundamental building block of many network security protocols. The SHA-3 hashing algorithm is the most recently developed hash function, and the most secure. Implementing the SHA-3 hashing algorithm in a Hardware Description Language (HDL) is time-consuming and tedious to debug. On the other hand, High-Level Synthesis (HLS) tools offer potential solutions to the hardware...
This paper presents DeepPump, an approach that generates CNN hardware designs with multi-pumping, which have competitive performance when compared with previous designs. Future work includes integrating DeepPump with other optimisations, and providing further evaluations on various FPGA platforms.
In recent years, approximate computing has emerged as a promising approach to trade off quality of computed outputs for energy savings. In this paper, we present an approximate high-level synthesis (AHLS) approach that outputs a quality-energy optimized register-transfer-level implementation from an accurate high-level C description. Existing AHLS work only considers switching activity for energy...
In this paper, a novel Greybox design methodology is proposed to establish a design and co-optimization flow across the boundary of conventional software and hardware design. The dynamic timing of each software instruction is simulated and associated with processor hardware design, which provides the basis of ultra-dynamic clock management. The proposed scheme effectively implements the instruction-based...
The sigmoid and hyperbolic tangent are widely used as activation functions in artificial neural networks. The exponential term and division are the basic building blocks of these functions. This paper proposes precise and efficient hardware implementations for sigmoid and hyperbolic tangent functions using exponential function approximation. Performance of both functions has been verified which shows that the...
In the High Efficiency Video Coding (HEVC) and H.264/AVC video coding standards, the interpolation filtering used for sub-pixel interpolation is one of the most computationally intensive parts. Video processing systems are becoming more complex, which decreases the productivity of hardware designers and software programmers and produces a design productivity gap. To fill this productivity...
This paper presents an efficient ASIC implementation of a low-area, ultra-low-power AES encryption core with optimized S-box, Rcon and control blocks, combined with a simple clock gating technique and built in an ultra-low-power 65nm SOTB CMOS technology. The ASIC implementation results show that the proposed AES encryption core requires a small number of clock cycles with ultra-low...
High level synthesis tools are an attractive option for rapid prototyping and implementation of hardware designs. In this paper we present a case study of using such a tool for the design and implementation of an FFT core for use in a wireless modem. The optimizations used for directing the conversion of C code to hardware are discussed and the impact of the different directives is analyzed. The resulting...
Loop pipelining is widely adopted as a key optimization method in high-level synthesis (HLS). However, when complex memory dependencies appear in a loop, commercial HLS tools are still not able to maximize pipeline performance. In this paper, we leverage parametric polyhedral analysis to reason about memory dependence patterns that are uncertain (i.e., parameterised by an undetermined variable) and/or...
The adoption of HLS has been driven by the need to tackle growing verification costs in traditional RTL design flows. This paper presents an overview of design, optimization and verification using HLS. It also outlines some of the requirements for HLS design to fit into existing design and verification flows and ways in which such flows might be adapted as HLS is more widely deployed.
Field Programmable Gate Arrays (FPGAs) have been used extensively to accelerate applications in many digital domains, such as image and signal processing. These applications have been thoroughly tested in high-level languages like C, C++ and MATLAB. Many standard libraries, such as OpenCV, exist for end-to-end image processing solutions. Applications centered around...
Complex multiplications are the backbone of almost all Digital Signal Processing (DSP) algorithms and several other scientific applications. Reducing the complexity of these operations at the architectural or algorithmic level can save chip area, which can ultimately be a driving parameter in the selection of power- or speed-optimized architectures. Improvement in these performance parameters...
Design space exploration (DSE) is now an important phase of the SoC design process for realizing high-efficiency designs. In conventional DSE, design metrics such as speed, power and area are used extensively to evaluate design options. As IP reuse is widely adopted, the protection of hardware IPs has received increasing attention in advanced design processes. This paper considers IP...
This work presents a new hardware implementation of a high-speed logic analyzer inside FPGA (Field Programmable Gate Array) chips that is fully autonomous, directly driving a VGA-compatible computer monitor to display multiple signals. It can be used as a very low-cost, real-time testing instrument for both external hardware and internal FPGA designs. The implementation is optimized at...
The paper discusses possibilities for rearranging the internal structure of a test decompressor and linking its outputs with the parallel scan chain inputs in order to obtain better compression efficiency without increasing the hardware overhead. We have experimentally verified that the controllability of decompressor outputs can be used as a simple and easily computable measure of the decompressor efficiency...
In the past decade, we have observed a trend of technological advancement towards portable electronics. As electronic devices shrink in size, constraints emerge in the form of limited power supply and area for the implementation of information security mechanisms. In this work, our goal is to produce a complete AES block cipher for data encryption and perform optimization in terms of power...
Testing and verifying wireless systems in real-world environments is a challenging but important problem. This is particularly true for the Joint Tactical Radio System (JTRS), where the modulation techniques are optimized for environments that are difficult to reproduce (e.g., ship-to-plane and plane-to-satellite communications). Such cases necessitate a wireless channel emulator to facilitate...
Meeting the 20MW power envelope sought for exascale is one of the greatest challenges in designing that class of systems. Addressing this challenge requires an over-provisioned and dynamically reconfigurable system with fine-grained control over the power and speed of the individual cores. In this paper, we present EfficientSpeed (ES), a library that improves energy efficiency in scientific computing by carefully...
The vector dot product is an important computation that benefits from hardware acceleration. We present an optimized accelerator chip that has a larger capacity than our prior designs. This design can compute the product of 10000-component vectors within 1000 clock cycles, with an average of 80 cycles. Our design has superior speed compared to other accelerators.