Real-time, low-latency image processing with high throughput is vital for many time-critical applications in fields such as medical imaging, robotics, and wearable computers. Traditionally, FPGAs have often been employed to meet these requirements. However, due to productivity challenges, using FPGAs may not be viable in some cases. Alternatively, the typical approach of processing an image on...
Today's datacenter is shared among various applications with different QoS requirements, which makes it a great challenge to deliver low-delay transport with high throughput. Most works address this challenge by reducing the in-network delay but assume a negligible local delay. However, we show that this assumption does not hold for a multi-tenant datacenter in which a physical machine is shared by...
Achieving low and predictable execution times for short jobs in Hadoop clusters has gained great attention due to their importance to system productivity and user experience. However, one major contributor that makes this challenging is disk I/O interference. We observed that disk writes unintentionally block latency-sensitive short jobs and cause unexpectedly high latency. Unfortunately, previous research...
This paper presents a polar code design for block fading channels when no channel state information is available at the transmitter, which means that the frozen bits cannot be changed dynamically with the fading realizations. An outer parallel code is concatenated with an inner polarization kernel that changes the properties of the block fading channel. The rate-splitting between the parallel outer...
With the development of wireless communication technologies, mobile devices are commonly equipped with multiple network interfaces and are ready to adopt emerging transport layer protocols such as multipath TCP (MPTCP). The protocol is especially useful for Internet of Things streaming applications with critical latency and bandwidth demands. To achieve the full potential of MPTCP, major challenges on congestion...
High-performance filtering has been in ever-increasing demand for a range of applications, especially real-time image/video processing. The guided image filter is one of the most widely used image filters. Among its variants, the gradient domain guided image filter, which performs edge-preserving smoothing and mitigates the halo-artifact problem present in existing guided image filters, was reported recently. Due to...
File system metadata is indispensable both for describing the data and for maintaining the file system. Despite the importance of metadata in the file system, the overhead of maintaining it cannot be taken lightly. This is because the metadata must also be persisted on the storage device, which consumes I/O bandwidth and creates journaling overhead. In this paper, we find that the random...
Accelerators, such as Graphics Processing Units (GPUs), are popular components of modern parallel systems. Their energy-efficient performance makes them attractive components for modern data center nodes. However, they lack control for fair resource sharing amongst multiple users. This paper presents a runtime and Just-In-Time compiler that enables resource sharing control and software-managed scheduling...
Software-based network packet processing on standard high volume servers promises better flexibility, manageability and scalability, thus gaining tremendous momentum in recent years. Numerous research efforts have focused on boosting packet processing performance by offloading to discrete Graphics Processing Units (GPUs). While integrated GPUs, residing on the same die with the CPU, offer many advanced...
While feasibility and obtaining a solution of a given network coding problem are well studied, the decoding procedure and complexity have not garnered much attention. We consider the decoding problem in a network wherein the sources generate multiple messages and the sink nodes demand some or all of the source messages. We consider both linear and non-linear network codes over a finite field and propose...
View-interpolation-based refocusing achieves realistic quality for sparse light fields but requires a lot of computation. In this paper, we aim to reduce the computation load while maintaining the superior refocusing quality. The idea is to interpolate only a few views for in-focus regions and to perform refocusing on downsampled pixels for defocused areas. This is achieved by a proposed block-based...
Classification is one of the core tasks in machine learning and data mining. One of several classification models is classification rules, which use a set of if-then rules to describe a classification model. In this paper we present a set of FPGA-based compute kernels for accelerating classification rule induction. The kernels can be combined to perform specific procedures in the rule induction process,...
Network Function Virtualization (NFV) and Software Defined Networking (SDN) currently play a key role in transforming the network architecture from hardware-based to software-based. Along with cloud computing, NFV and SDN are moving network functions from dedicated hardware to software implementations (Virtual Network Functions, VNFs), on Virtual Machines (VMs) or other virtualization technologies such as containers,...
Design productivity is a major concern preventing the mainstream adoption of FPGAs. Overlay architectures have emerged as one possible solution to this challenge, offering fast compilation and software-like programmability. However, overlays typically suffer from area and performance overheads due to limited consideration for the underlying FPGA architecture. These overlays have often been of limited...
The increasing programmability, performance, and cost-effectiveness of GPUs have led to the widespread use of such many-core architectures to accelerate general-purpose applications. Nevertheless, tuning applications to efficiently exploit the GPU's potential is a very challenging task, especially for inexperienced programmers. This is due to the difficulty of developing a SW application for the specific...
The P4 language is an emerging domain-specific language for describing the data plane processing at a network device. P4 has been mapped to a wide range of forwarding devices including NPUs, programmable NICs and FPGAs, but not to the General Purpose Graphics Processing Unit (GPGPU), which is a salient parallel architecture for processing network flows. In this work, we design a heterogeneous architecture...
Traditional hierarchical RAID causes huge GPU overhead and does not support node failure. To resolve this problem, this paper proposes a new parity generation method for hierarchical redundant array of inexpensive disks (RAID) that uses a pass-through GPU in a multi-virtual-machine (VM) environment. The proposed method reduces GPU overhead and parity generation time, and supports node failure compared to the traditional...
This work shows that behavioral IPs (BIPs) are often over-designed when used in heterogeneous Multi-Processor SoCs (MPSoCs), mainly because they are designed and optimized separately. When instantiated in an MPSoC, these IPs often have to wait for data from the master and also wait to gain access to the bus to return the results. Behavioral IPs have the advantage over traditional RTL-based IPs that...
Graphics Processing Units (GPUs) are used today as an affordable, energy-efficient method of accelerating computationally exhaustive algorithms, decreasing execution time by exploiting the power of parallel programming techniques. In the field of medical imaging, GPUs have become a crucial acceleration method for computationally exhaustive algorithms. This paper presents the effect of memory optimization on...
Computational cost presents a barrier in the application of machine learning algorithms to large-scale real-time learning problems. Kernel adaptive filters (KAFs) have low computational cost with the ability to learn online and are hence favoured for such applications. Unfortunately, dependencies of the outputs on the weight updates prohibit pipelining.