When a standard TCP implementation with a minimum retransmission timeout (RTOmin) of 200 ms is used in distributed file systems in data centers, a well-known throughput degradation called TCP Incast occurs, because 200 ms is too large an RTOmin for data centers. To avoid TCP Incast, a TCP implementation using a much smaller RTOmin, attained with a fine-grained kernel timer, is proposed...
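As a rough illustration of why a 200 ms floor matters, the sketch below follows the standard retransmission timeout computation from RFC 6298 (smoothed RTT plus four times the RTT variance, clamped to a minimum). The function name `rto_estimate`, the 100 µs round-trip time, and the 1 ms alternative floor are illustrative assumptions, not values or code from the paper.

```python
def rto_estimate(rtt_samples, rto_min, alpha=1 / 8, beta=1 / 4):
    """Return the RTO in seconds after processing rtt_samples (RFC 6298 style).

    rto_min is the floor applied to the computed timeout (the "RTOmin").
    Clock granularity is ignored here for simplicity (an assumption).
    """
    if not rtt_samples:
        raise ValueError("at least one RTT sample is required")
    srtt = rttvar = None
    for r in rtt_samples:
        if srtt is None:
            # First measurement: initialise the estimators.
            srtt = r
            rttvar = r / 2
        else:
            # Update the variance first, using the previous SRTT.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
    rto = srtt + 4 * rttvar
    return max(rto, rto_min)


# Hypothetical data-center path with a steady 100 microsecond RTT.
samples = [100e-6] * 20
coarse = rto_estimate(samples, rto_min=0.200)  # conventional 200 ms floor
fine = rto_estimate(samples, rto_min=0.001)    # assumed 1 ms floor
unclamped = rto_estimate(samples, rto_min=0.0)
print(coarse, fine, unclamped)
```

With these assumed numbers the unclamped estimate stays close to the RTT itself, so the 200 ms floor makes a sender wait roughly three orders of magnitude longer than the path requires after a loss.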
Network performance is one of the most important concerns in today’s long-distance networks. TCP congestion control mechanisms play an important role in these networks. Most current TCP congestion control mechanisms, also known as TCP variants, detect congestion and slow down packet transmission to avoid further congestion in the network. In this paper, three classes...
Interactive video streaming requires very low latency and high throughput. Traditional latency-based congestion control algorithms perform poorly in terms of fairness, which results in very poor video quality for adaptive video streaming. Software-defined networking (SDN) enables us to solve the problem by designing a network controller in the routers. This paper presents an SDN-centric TCP where sending rate of...
This paper introduces an accuracy/energy-flexible configurable 2D Gabor filter based on stochastic computation, where bit streams representing information are used. The Gabor filters show a powerful feature extraction capability, but the calculation based on binary computation is complicated. As opposed to traditional memory-based methods that use fixed Gabor coefficients calculated by software in...
Implementing complex arithmetic routines with Single Instruction Multiple Data (SIMD) instructions requires the use of instructions that are usually not found in their real arithmetic counterparts. These instructions, such as shuffles and addsub, are often bottlenecks for many complex arithmetic kernels as modern architectures usually can perform more real arithmetic operations than execute instructions...
In this study, we demonstrate that performance may be undermined in state-of-the-art intra-SM sharing schemes for concurrent kernel execution (CKE) on GPUs, due to interference among concurrent kernels. We highlight that cache partitioning techniques proposed for CPUs are not effective for GPUs. Then we propose to balance memory accesses and limit the number of inflight memory instructions...
The advent of 8K and higher video resolutions poses problems for capturing and storing data at these standards. The contemporary alternative is to compromise on quality and use various (often lossy) compression techniques to reduce the bandwidth required to move this data. This paper proposes a novel method for handling large volumes of video data without compromising its quality through space...
Network function virtualization (NFV) is a concept aiming to achieve a telecom-grade cloud ecosystem for new-generation networks, focusing on capital and operational expenditure (CAPEX and OPEX) savings. Maintaining at least the same performance is one of the main requirements for applications being virtualized. This work presents the performance impact of Open Virtual Switch (OVS) user-space forwarding...
The ever-changing nature of network technology requires a flexible platform that can change as the technology evolves. In this work, a complete networking switch designed in OpenCL is presented, identifying several high-level constructs that form the building blocks of any network application targeting FPGAs. These include the notion of an on-chip global memory and kernels constantly processing data...
Today's data center servers are equipped with high-speed, complex network adaptors featuring an array of functions, e.g. hardware TX/RX queues, packet filters, and rate limiters. Recent work such as IX, Arrakis, and MultiStack has rekindled innovation in user-level network stacks that utilize these commodity network adaptors. In this paper, we revisit the idea of moving the stack design from in-kernel...
State-of-the-art CNN models for image recognition use deep networks with small filters instead of shallow networks with large filters, because the former require fewer weights. In light of this trend, we present a fast and efficient FPGA-based convolution engine to accelerate CNN models over small filters. The convolution engine implements the Winograd minimal filtering algorithm to reduce the number...
The efficiency of datacenters is an important consideration for cloud service providers, who must keep their datacenters ready to meet the increasing demand for computing resources. Container-based virtualization is one approach to improving efficiency by reducing the overhead of virtualization. Resource overcommitment is another approach, but cloud providers tend to make conservative allocations...
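For readers unfamiliar with Winograd minimal filtering, the sketch below shows the textbook one-dimensional case F(2,3): two outputs of a 3-tap filter computed with four multiplications instead of six. It is only an illustration of the general technique; the function names are hypothetical, and the tile size and hardware mapping used in the paper are not stated in the excerpt.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter g over 4 inputs d, using 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return y0, y1


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3), using 4 multiplications.

    The filter-side terms (g0 + g1 + g2) / 2 and (g0 - g1 + g2) / 2 depend only
    on the filter, so in practice they can be precomputed once per filter.
    """
    gp = (g[0] + g[1] + g[2]) / 2
    gm = (g[0] - g[1] + g[2]) / 2
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * gp
    m3 = (d[2] - d[1]) * gm
    m4 = (d[1] - d[3]) * g[2]
    return m1 + m2 + m3, m2 - m3 - m4


d = [1.0, 2.0, 3.0, 4.0]
g = [1.0, 0.0, -1.0]
print(direct_f23(d, g), winograd_f23(d, g))
```

The saving comes at the cost of extra additions and of the filter transform, which is why the approach suits small filters whose transformed coefficients can be reused across many input tiles.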
Containers have been used in many applications for isolation purposes due to their lightweight, scalable, and highly portable properties. However, applying containers to virtual network functions (VNFs) faces a big challenge, because high-performance VNFs often generate frequent communication workloads among containers while container communications are generally not efficient. Compared with hardware...
Multipath TCP (MPTCP) enables transmission via multiple routes for an end-to-end connection to improve resource usage of regular TCP. Due to the increasing concern in green computing, there has been significant interest in designing energy-efficient multipath transport. For existing MPTCP congestion control algorithms, the research community still lacks a comprehensive understanding of which components...
Flow completion times (FCTs) are critical for many cloud applications. To minimize the average FCT, recent transport designs, such as pFabric, PASE, and PIAS, approximate the Shortest Remaining Time First (SRTF) scheduling. A common, implicit assumption of these solutions is that the remaining time is only determined by the remaining flow size. However, this assumption does not hold in many real-world...
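To make the size-based assumption concrete, the sketch below compares mean FCT for a set of flows served one at a time on a single link. It assumes all flows arrive at the same instant, in which case SRTF reduces to serving the shortest flow first; the function name `mean_fct`, the flow sizes, and the unit link rate are hypothetical and chosen only for illustration.

```python
def mean_fct(sizes_in_service_order, rate=1.0):
    """Mean flow completion time when flows are served back to back.

    sizes_in_service_order: flow sizes in the order they are served.
    rate: link rate in size units per time unit (assumed constant).
    """
    now = 0.0
    total = 0.0
    for size in sizes_in_service_order:
        now += size / rate  # this flow finishes once all its bytes are sent
        total += now
    return total / len(sizes_in_service_order)


flows = [10, 1, 2]                    # hypothetical flow sizes
fifo = mean_fct(flows)                # arrival order
srtf = mean_fct(sorted(flows))        # shortest remaining size first
print(fifo, srtf)
```

The ordering here relies entirely on remaining size as a proxy for remaining time, which is exactly the assumption the abstract says does not always hold.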
Lightweight convolutional neural networks (CNNs) on tiny embedded platforms can offer an energy-efficient solution for today's IoT devices. However, CNN implementation on embedded systems faces a processing bottleneck in convolutional layers and memory storage issues in fully connected layers. In past years, heterogeneous acceleration, where compute intensive tasks are performed on kernel specific cores,...
Network Function Virtualization (NFV) is a novel paradigm that enables flexible and scalable implementation of network services on cloud infrastructure. An important enabler for the NFV paradigm is software switching, which should satisfy rigid network requirements such as high throughput and low latency. Despite recent research activities in the field of NFV, not much attention was given to understand...
Today, machine learning based on neural networks has become mainstream in many application domains. A small subset of machine learning algorithms, called Convolutional Neural Networks (CNNs), is considered state-of-the-art for many applications (e.g. video/audio classification). The main challenge in implementing CNNs in embedded systems is their large computation, memory, and bandwidth...
This paper presents the first systematic study on co-scheduling independent jobs on integrated CPU-GPU systems with power caps considered. It reveals the performance degradations caused by the co-run contentions at the levels of both memory and power. It then examines the problem of using job co-scheduling to alleviate the degradations in this less understood scenario. It offers several algorithms...
Large datasets in astronomy and geoscience often require clustering and visualizations of phenomena at different densities and scales in order to generate scientific insight. We examine the problem of maximizing clustering throughput for concurrent dataset clustering in spatial dimensions. We introduce a novel hybrid approach that uses GPUs in conjunction with multicore CPUs for algorithmic throughput...