Today's network traffic is dynamic and fast. Conventional network traffic classification based on flow features and data mining cannot process traffic efficiently. A hardware-based network traffic classifier is needed that adapts to dynamic network states and provides accurate, up-to-date classification at high speed. In this paper, a hardware architecture of online incremental semi-supervised...
Exploiting resource reusability and low precision in neural networks is a promising approach to achieving energy-efficient computational platforms. This research presents two generalizable approaches to reusing resources in feed-forward neural networks and demonstrates them on extreme learning machines. In the first approach, coalescing, a single stack of neuronal units performs both feature extraction and...
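For readers unfamiliar with the network family targeted here: an extreme learning machine trains only its output layer, while the hidden layer stays random and fixed — which is what makes reusing a single neuronal stack in hardware attractive. A minimal Python sketch (all sizes and the test function are illustrative, not the paper's architecture):

```python
import numpy as np

def elm_train(X, y, n_hidden=64, seed=0):
    """Minimal extreme learning machine: a fixed random hidden layer,
    with only the output weights solved by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random biases (never trained)
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # solve H @ beta ≈ y in one shot
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Sanity check: fit a smooth 1-D function.
X = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.sin(3 * X[:, 0])
W, b, beta = elm_train(X, y)
err = np.max(np.abs(elm_predict(X, W, b, beta) - y))
```

Because there is no iterative backpropagation, the only trained parameters are `beta`, obtained from a single linear solve.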
Hash functions are a fundamental building block of many network security protocols. The SHA-3 hashing algorithm is the most recently developed hash function, and the most secure. Implementing the SHA-3 hashing algorithm in a Hardware Description Language (HDL) is time-consuming and tedious to debug. On the other hand, High-Level Synthesis (HLS) tools offer potential solutions to the hardware...
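Since SHA-3 is standardized (FIPS 202), a quick software golden model is handy when debugging an HDL or HLS implementation against known digests — a common verification practice, not something specific to this paper:

```python
import hashlib

# Software golden model: compute a SHA3-256 digest to compare against
# the output of an HDL/HLS implementation of the core.
digest = hashlib.sha3_256(b"").hexdigest()
print(digest)  # 256-bit digest as 64 hex characters
```

Hashing a suite of fixed test vectors this way and comparing against the simulated hardware output catches most datapath bugs early.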
Network security and monitoring devices use packet classification to match packet header fields against a set of rules. Many hardware architectures have been designed to accelerate packet classification and achieve wire-speed throughput for 100 Gbps networks. These architectures are designed for high throughput even for the shortest packets. However, FPGA SoCs and Intel Xeon with FPGA platforms have limited resources...
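Functionally, packet classification reduces to matching header fields against an ordered rule set; hardware architectures parallelize exactly this lookup. A first-match linear search in Python (the rule fields and actions below are hypothetical examples):

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class Rule:
    src: str      # source prefix, e.g. "10.0.0.0/8"
    dst: str      # destination prefix
    proto: int    # IP protocol number (6 = TCP, 17 = UDP)
    action: str   # what to do on a match

def classify(pkt, rules):
    """Return the action of the first matching rule — the sequential
    search that hardware classifiers evaluate in parallel."""
    for r in rules:
        if (ip_address(pkt["src"]) in ip_network(r.src)
                and ip_address(pkt["dst"]) in ip_network(r.dst)
                and pkt["proto"] == r.proto):
            return r.action
    return "default-deny"

rules = [Rule("10.0.0.0/8", "0.0.0.0/0", 6, "permit")]
action = classify({"src": "10.1.2.3", "dst": "8.8.8.8", "proto": 6}, rules)
```

The hardware challenge is doing this for every rule at once, on every packet, within the per-packet time budget of a 100 Gbps link.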
A number of critical design decisions, such as network topology, buffer sizes, flow control mechanism, and so forth, have to be evaluated in any NoC design. Design and verification of NoCs are based on either software simulations, which are extremely slow and inaccurate for complex models, or hardware emulations using low/mid-class FPGAs, where the scalability of the NoC system is intensively...
The belief propagation (BP) polar code decoder has been well studied from many aspects. This study proposes a hardware optimization that improves the performance of the polar BP decoder by modifying both the processing element (PE) and the early stopping criterion (ESC). The PE is optimized by using a high-speed parallel-prefix Ling adder instead of a carry-ripple adder, and the WIB ESC introduced in the literature is optimized by removing unnecessary...
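The parallel-prefix idea behind adders such as the Ling adder can be illustrated in software: carries are computed with generate/propagate prefix operations in O(log n) stages instead of rippling bit by bit. A bitwise Kogge-Stone sketch (a related parallel-prefix adder, not the paper's exact circuit):

```python
def kogge_stone_add(a, b, width=8):
    """Add two `width`-bit integers using Kogge-Stone parallel-prefix
    carry computation: log2(width) stages instead of a ripple chain."""
    g = a & b            # generate: positions that create a carry
    p = a ^ b            # propagate: positions that pass a carry along
    dist = 1
    while dist < width:  # prefix-combine (g, p) over doubling distances
        g |= p & (g << dist)
        p &= p << dist
        dist <<= 1
    mask = (1 << width) - 1
    return ((a ^ b) ^ (g << 1)) & mask  # sum bit = propagate XOR carry-in
```

A carry-ripple adder needs `width` sequential steps; the loop above finishes in log2(`width`) stages, which is the source of the speedup in the PE.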
Nowadays, many emerging technologies, such as augmented and virtual reality, require extremely high-rate data transmissions. This imposes an increasing demand on network throughput, which currently surpasses the capabilities of commercially available wireless communication systems. To address this constraint, some companies are considering the implementation of high-throughput wired technologies,...
There are many available NAT64 implementations, but we cannot measure their performance per the standards due to the lack of compliant testers. The aim of our effort is to design and write the first implementation of a test program that can provide the first answer to these needs. For benchmarking network interconnect devices we can use the recommendations of RFC 2544 (IP version independent)...
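RFC 2544 determines throughput by binary-searching for the highest offered rate at which the device under test forwards every frame without loss over a trial of fixed duration. A sketch of that search loop, with a simulated device standing in for real traffic generation (the 7.3 Gbit/s loss threshold is made up):

```python
def rfc2544_throughput(trial, lo=0.0, hi=10e9, tol=1e6):
    """Binary-search the zero-loss throughput (bits/s) per the RFC 2544
    method. `trial(rate)` must run one trial at `rate` and return True
    iff no frames were lost."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if trial(mid):
            lo = mid   # no loss: try a higher rate
        else:
            hi = mid   # loss observed: back off
    return lo

# Hypothetical DUT that starts dropping frames above 7.3 Gbit/s.
measured = rfc2544_throughput(lambda rate: rate <= 7.3e9)
```

A compliant tester wraps this loop around real frame generation and counting for each standard frame size.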
Datacenters should provide bandwidth guarantees to tenants for performance predictability. Ideally, this process should attain three important characteristics: work conservation, fairness, and simplicity. The first indicates that tenants can utilize unused bandwidth effectively without harming the bandwidth guarantees. The second means that tenants share the unused bandwidth following a certain...
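The fairness property described here is commonly formalized as max-min fairness, computed by progressive filling: satisfy the smallest demands first, then split what remains equally among the still-unsatisfied tenants. A sketch (tenant names and demands are illustrative):

```python
def max_min_share(capacity, demands):
    """Max-min fair split of `capacity` (e.g. Gbit/s) among tenants.
    Tenants whose demand fits under the current fair share get exactly
    their demand; everyone left is capped at an equal share."""
    alloc = {}
    remaining = capacity
    pending = sorted(demands, key=demands.get)  # ascending by demand
    while pending:
        fair = remaining / len(pending)
        tenant = pending[0]
        if demands[tenant] <= fair:     # demand fits: fully satisfy it
            alloc[tenant] = demands[tenant]
            remaining -= demands[tenant]
            pending.pop(0)
        else:                           # everyone left is capped equally
            for t in pending:
                alloc[t] = fair
            break
    return alloc

shares = max_min_share(10.0, {"A": 2.0, "B": 5.0, "C": 8.0})
```

Here tenant A's small demand is met in full, and the remaining 8 units are split evenly between B and C — no tenant can gain without another losing its fair share.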
A new VLSI algorithm for the discrete sine transform (DST) is presented that uses a new input restructuring sequence, an appropriate reordering of the elements involved, and short-length pseudo-cycle convolution structures. It relies on a new parallel decomposition of the DST that leads to a high-throughput VLSI implementation with low hardware cost. The proposed...
Providing the optimal configuration for a software router poses many technical challenges that are not present in dedicated hardware routers. One of them is how to characterize the performance variation caused by different configurations on commodity hardware. This paper addresses the problem of configuring a software router to minimize average packet latency. Since trying all combinations...
Network Function Virtualization promises to reduce the overall operational and capital expenses experienced by the network operators. Running multiple network functions on top of a standard x86 server instead of dedicated appliances can increase the utilization of the underlying hardware and reduce the maintenance and management costs. However, total cost of ownership calculations are typically a...
Today's data center servers are equipped with high-speed, complex network adaptors featuring an array of functions, e.g. hardware TX/RX queues, packet filters, rate limiters, etc. Recent work such as IX, Arrakis, and MultiStack has rekindled innovation in user-level network stacks that utilize these commodity network adaptors. In this paper, we revisit the idea of moving the stack's design from in-kernel...
Field Programmable Gate Arrays (FPGAs) excel at the implementation of local operators in terms of throughput per energy, since off-chip communication can be reduced with an application-specific on-chip memory configuration. Furthermore, data-level parallelism can be exploited efficiently through so-called loop coarsening, which processes multiple horizontal pixels simultaneously. Moreover, existing...
Many scientific applications rely on the evaluation of elementary functions. Nowadays, high-level programming languages provide their own elementary function libraries in software, using lookup tables and/or polynomial approximation. However, one downside is speed: lookup tables can cause cache thrashing, and polynomial approximations require a number of iterations to converge. Thus, elementary functions...
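To make the polynomial-approximation trade-off concrete, here is a fixed-degree kernel for sin(x) with no lookup table and no data-dependent iteration count — the shape of computation that maps well to a pipelined hardware function unit (the degree and interval are chosen for illustration):

```python
import math

def sin_poly(x):
    """Approximate sin(x) on [-pi/2, pi/2] with the degree-7 Taylor
    polynomial, evaluated by Horner's rule: a fixed sequence of
    multiply-adds, identical for every input."""
    x2 = x * x
    # sin(x) ≈ x - x^3/3! + x^5/5! - x^7/7!
    return x * (1 + x2 * (-1/6 + x2 * (1/120 + x2 * (-1/5040))))

# Worst-case error over the interval, sampled at 0.01 steps.
err = max(abs(sin_poly(x / 100) - math.sin(x / 100))
          for x in range(-157, 158))
```

Because the operation count is constant, latency is deterministic — unlike table lookups (cache-dependent) or iterative refinement (convergence-dependent).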
RISC-V is a new open-source general-purpose instruction set architecture (ISA) developed by the University of California, Berkeley. It allows everyone to design their own hardware circuits based on application characteristics, and it can be used in embedded devices, desktop computers, and high-performance servers. In this paper, we use the RISC-V processor to design a fast network packet processing system....
The screen content coding (SCC) extension to High Efficiency Video Coding (HEVC) offers substantial compression efficiency over the existing HEVC standard for computer-generated content. However, this gain in compression efficiency is achieved at the expense of further computational complexity, with several resource-hungry coding tools. Hence, extending SCC to HEVC hardware encoders can be challenging...
Cloud storage services are associated with high latency variance and degraded throughput, which is problematic when users are fetching and storing content for interactive applications. This can be attributed to performance hotspots created by slow nodes in a storage cluster, and to performance interference caused by multi-tenancy and background tasks such as data scrubbing, backfilling, recovery, etc...
In this paper we present a complete, open-source GZIP compressor implementation for FPGAs based on a systolic array architecture. GZIP is one of the most widely used compression algorithms. Besides the usual use case of compression for data storage, distributed computing systems such as Hadoop utilize compression to reduce the amount of data transferred between computing nodes in a cluster. However,...
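As a software reference point, GZIP round-tripping is a one-liner in most languages; a hardware compressor must emit bitstreams that a stock decompressor like the one below accepts (the payload is illustrative):

```python
import gzip

# Round-trip a payload through GZIP (DEFLATE plus header and CRC),
# the same format an FPGA compressor must produce.
payload = b"hardware " * 1000           # highly repetitive, compresses well
blob = gzip.compress(payload, compresslevel=6)
restored = gzip.decompress(blob)
ratio = len(payload) / len(blob)
```

Feeding a hardware compressor's output through `gzip.decompress` (or `gunzip`) is a simple end-to-end conformance check.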
This paper presents an ultra-high-speed/area-efficient Polar encoder design with very high system throughput for emerging next-generation 5G applications. In a demonstrated design example, the proposed hardware architecture is mainly based on 16-parallel radix-2 processing engines. An 8192-point Polar encoder is designed and synthesized with TSMC 40-nm CMOS technology, operating at clock frequency...
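Polar encoding itself is a network of XOR butterflies: x = u · F^⊗n over GF(2) with F = [[1,0],[1,1]], which is exactly what parallel processing engines unroll in hardware. A bit-level Python sketch (non-systematic, without the bit-reversal permutation; not the paper's exact architecture):

```python
def polar_encode(u):
    """Encode a bit list of power-of-two length with the polar kernel
    F = [[1,0],[1,1]]: log2(n) stages of XOR butterflies."""
    x = list(u)
    n = len(x)
    step = 1
    while step < n:                       # one stage per power of two
        for i in range(0, n, 2 * step):   # each butterfly block
            for j in range(i, i + step):
                x[j] ^= x[j + step]       # upper branch XORs the lower
        step *= 2
    return x
```

Each stage is fully parallel across blocks, so an N-point encoder needs only log2(N) pipeline stages of XOR gates; the 16-parallel engines mentioned above trade area for throughput along this structure. Since F is an involution over GF(2), encoding twice recovers the input, which makes a convenient self-test.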