HEVC is one of the most recent video coding standards, developed to meet the upcoming challenges posed by higher video quality and resolution. One of the HEVC components is the entropy encoder, which consists solely of the Context Adaptive Binary Arithmetic Coding (CABAC) algorithm. The CABAC algorithm makes it severely difficult to increase throughput, due to the high...
Polar codes are a family of error-correcting codes that achieve the symmetric capacity of memoryless channels when the code length N tends to infinity. However, moderate code lengths are required in most wireless digital applications to limit the decoding latency. In some other applications, such as optical communications or quantum key distribution, the latency introduced by very long codes is...
In this paper, compact memory strategies for a partially parallel quasi-cyclic LDPC (QC-LDPC) decoder architecture are proposed. Compacting the hard decisions and extrinsic messages of several adjacent rows into one memory entry not only reduces the number of memory banks for hard decisions but also facilitates multiple data accesses per clock cycle, which increases the throughput of the decoder. We...
In this paper, we propose two different hardware structures of the SHA-3 hash algorithm for different circuit interface widths. Both support the four SHA-3 functions SHA3-224/256/384/512. The padding unit of our design is also implemented in hardware instead of software. In addition, a 3-round-in-1 structure is proposed to increase the throughput of our circuit. We conduct an implementation...
In today's high-performance computing (HPC) environments, analyzing and predicting the performance of multiple-processor systems (clusters cores) on critical workloads remains a challenge. This is a result of the key metrics that influence a system's behavior. Bursty arrivals in HPC systems demand either a shared-memory parallel architecture or a pipelined dataflow architecture. At present, a processor model...
Polar codes have been selected for use within 5G networks and are being considered for the data and control channels of additional 5G scenarios, such as the next-generation ultra-reliable low-latency channel. As a result, efficient, fast polar code decoder implementations are essential. In this work, we present a new fast simplified successive cancellation (Fast-SSC) decoder architecture. Our proposed solution...
When encrypting a single file in the CBC mode of 3DES, there is a feedback path that introduces a data dependency. Even if many more resources are provided, they do not help increase the throughput of 3DES. In this paper, we propose a logic simplification method to increase the throughput in CBC mode. In the datapath, 15 levels of XORs can be moved from the critical path to the non-critical path...
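The feedback dependency mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's method and does not implement 3DES: `toy_encrypt_block` and `toy_decrypt_block` are hypothetical stand-ins (an XOR with the key followed by a byte rotation) used only so the example runs without external libraries. The sketch shows why CBC encryption is serial, since each block needs the previous ciphertext block, while CBC decryption has no such chain.

```python
BLOCK = 8  # 3DES operates on 64-bit (8-byte) blocks


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block, key):
    """Hypothetical stand-in for a block cipher (NOT 3DES, NOT secure)."""
    mixed = xor_bytes(block, key)
    return mixed[1:] + mixed[:1]  # rotate left by one byte


def toy_decrypt_block(block, key):
    """Inverse of toy_encrypt_block."""
    unrotated = block[-1:] + block[:-1]  # rotate right by one byte
    return xor_bytes(unrotated, key)


def cbc_encrypt(plaintext, key, iv):
    """CBC encryption: block i cannot start until ciphertext block i-1 exists."""
    if len(plaintext) % BLOCK != 0:
        raise ValueError("plaintext length must be a multiple of the block size")
    previous = iv
    out = []
    for i in range(0, len(plaintext), BLOCK):
        # The feedback path: the previous ciphertext block is XORed in first.
        previous = toy_encrypt_block(xor_bytes(plaintext[i:i + BLOCK], previous), key)
        out.append(previous)
    return b"".join(out)


def cbc_decrypt(ciphertext, key, iv):
    """CBC decryption: every block depends only on already-known ciphertext."""
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    previous_blocks = [iv] + blocks[:-1]
    return b"".join(
        xor_bytes(toy_decrypt_block(c, key), p)
        for c, p in zip(blocks, previous_blocks)
    )


key = bytes(range(1, 9))
iv = bytes(BLOCK)
message = b"ABCDEFGH" * 2  # two identical plaintext blocks
ciphertext = cbc_encrypt(message, key, iv)
```

Because of the chaining, the two identical plaintext blocks in `message` produce different ciphertext blocks, and the loop in `cbc_encrypt` cannot be parallelized across blocks of one file; only the work inside each block step can be shortened.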
This paper introduces an accuracy/energy-flexible configurable 2D Gabor filter based on stochastic computation, where bit streams representing information are used. The Gabor filters show a powerful feature extraction capability, but the calculation based on binary computation is complicated. As opposed to traditional memory-based methods that use fixed Gabor coefficients calculated by software in...
CPU-GPU heterogeneous systems are emerging as architectures of choice for high-performance energy-efficient computing. Designing on-chip interconnects for such systems is challenging: CPUs typically benefit greatly from optimizations that reduce latency, but rarely saturate bandwidth or queueing resources. In contrast, GPUs generate intense traffic that produces local congestion, harming...
Ultra-deep sub-micron technology is shifting the design paradigm from area optimization to power optimization. In the context of Network-on-Chip (NoC) based design, energy consumption due to data transfer among network nodes is no longer negligible. Starting from the observation that, among the two brain hemispheres around 1 out of 106 synapses are active at the same time, in this paper we propose...
The degree to which Turbo-Code decoder architectures can be parallelized is constrained by requirements for flexibility with respect to code block sizes and code rates. At the same time throughput requirements are expected to increase by a factor of up to 20x for 5G networks, which are currently undergoing standardization. The limiting factors for the throughput of a Turbo-Code decoder are maximum...
A system-level simulator is proposed to determine the ability of synchronous and asynchronous NoCs to alleviate the effect of process variation. The newly developed framework reports the variation in throughput and in the different delay components. System-level simulation shows similarities with circuit-level simulation in terms of behavior and performance variation trend when moving from one technology...
Reducing the configuration time of portions of an FPGA at run time is crucial in contemporary FPGA-based accelerators. In this work, we propose a method to increase the throughput for FPGA dynamic partial reconfiguration by using standard IP blocks. The throughput is increased by over-clocking the configuration bitstream circuitry beyond the limits stated in the specifications of these standard blocks...
Today's network traffic is dynamic and fast. Conventional network traffic classification based on flow features and data mining is not able to process traffic efficiently. A hardware-based network traffic classifier is needed that adapts to the dynamic network state and provides accurate, up-to-date classification at high speed. In this paper, a hardware architecture of online incremental semi-supervised...
Hash functions represent a fundamental building block of many network security protocols. The SHA-3 hashing algorithm is the most recently developed hash function, and the most secure. Implementing the SHA-3 hashing algorithm in a Hardware Description Language (HDL) is time-consuming and tedious to debug. On the other hand, High-Level Synthesis (HLS) tools offer potential solutions to the hardware...
A decoder based on LDPC logic is designed for an 8-bit logical ALU. Simulation has been carried out to minimize voltage leakage and maximize throughput.
In this paper, an FPGA-based implementation of Frequent Items Counting is proposed. The architecture deploys an equality comparator matrix that compares the input items with one another to count them within a single clock cycle. The proposed architecture is applied to 8-bit items, that is, 256 distinct item values in total. The system is built and verified on the...
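As a rough software analogy of the equality comparator matrix described in the abstract above, the following Python sketch compares every item in a window with every other item and sums each row to obtain the counts. The function name `count_by_comparator_matrix` and the window-based formulation are assumptions made for illustration; the paper's actual hardware organization is not given in the truncated abstract.

```python
def count_by_comparator_matrix(items):
    """Count occurrences of each item in a window via an all-pairs equality matrix.

    In hardware, the n*n comparisons would be evaluated in parallel by
    dedicated comparators; here they are simply computed one after another.
    """
    n = len(items)
    for item in items:
        if not 0 <= item <= 255:
            raise ValueError("items are assumed to be 8-bit values (0..255)")

    # matrix[i][j] is 1 when item i equals item j, otherwise 0.
    matrix = [[1 if items[i] == items[j] else 0 for j in range(n)]
              for i in range(n)]

    # The sum of row i is the number of times items[i] occurs in the window.
    row_counts = [sum(row) for row in matrix]

    # Collapse duplicate rows into one count per distinct item value.
    counts = {}
    for item, count in zip(items, row_counts):
        counts[item] = count
    return counts


window = [3, 7, 3, 255, 3, 7]
counts = count_by_comparator_matrix(window)
```

The matrix grows quadratically with the window size, which is the price of obtaining all counts in one parallel step rather than iterating over the items.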
The 1588 Precision Timing Protocol (1588-PTP) states that a timestamp event is generated at the time of transmission and reception of any event message and that the timestamp event occurs when the message's timestamp point crosses the boundary between the node and the network (event generation points). The protocol defines the message timestamp point for an event message as the beginning of the first...
In this paper, we have first characterized candidates of the Competition for Authenticated Encryption, Security, Applicability, and Robustness (CAESAR) from the point of view of their suitability for parallel processing of multiple blocks of associated data, message, and ciphertext. Then, we have chosen seven candidates from the Round 2 and Round 3 submissions, namely SCREAM, AES-COPA, Minalpher,...
A number of critical design decisions, such as network topology, buffer sizes, flow control mechanism and so on, have to be evaluated in any NoC design. Design and verification of NoCs are based on either software simulations, which are extremely slow and inaccurate for complex models, or hardware emulations using low/mid-class FPGAs, where the scalability of the NoC system is intensively...