Search results

Items from 41 to 60 out of 273 results

chapter

Real-time, low-latency image processing with high throughput on a multi-core SoC

Barath Ramesh, Alan D. George, Herman Lam

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

2016 IEEE High Performance Extreme Computing Conference (HPEC)

Real-time, low-latency, image processing with high throughput is vital for many time-critical applications in fields such as medical imaging, robotics, and wearable computers. Traditionally, FPGAs have often been employed to meet these requirements. However, due to the productivity challenges, using FPGAs may not be viable in some cases. Alternatively, the typical approach of processing an image on...

chapter

Intra-host Rate Control with Centralized Approach

Zhuang Wang, Ke Liu, Yifan Shen, Jack Y. B. Lee, et al.

2016 IEEE International Conference on Cluster Computing (CLUSTER) > 384 - 387

2016 IEEE International Conference on Cluster Computing (CLUSTER)

Today's datacenter is shared among various applications with different QoS requirements, which poses a great challenge to delivering low-delay transport with high throughput. Most works address this challenge by reducing the in-network delay but assume a negligible local delay. However, we show that this assumption does not hold for a multi-tenant datacenter in which a physical machine is shared by...

chapter

Application-Assisted Writeback for Hadoop Clusters

Jungi Jeong, Daewoo Lee, Seungryoul Maeng

2016 IEEE International Conference on Cluster Computing (CLUSTER) > 447 - 450

2016 IEEE International Conference on Cluster Computing (CLUSTER)

Achieving low and predictable execution time for short jobs in Hadoop clusters has gained great attention due to its importance for system productivity and user experience. However, one major contributor that makes this challenging is disk I/O interference. We observed that disk writes unintentionally block latency-sensitive short jobs and cause unexpectedly high latency. Unfortunately, previous research...

chapter

Rate-splitting for polar codes on block fading channels without CSIT

Nicolas Gresset, Victor Exposito

2016 9th International Symposium on Turbo Codes and Iterative Information Processing (ISTC) > 141 - 145

2016 9th International Symposium on Turbo Codes and Iterative Information Processing (ISTC)

This paper presents a polar code design for block fading channels when no channel state information is available at the transmitter, which implies that the frozen bits cannot be changed dynamically with the fading realizations. An outer parallel code is concatenated with an inner polarization kernel that changes the properties of the block fading channel. The rate-splitting between the parallel outer...

chapter

Out-of-order transmission enabled congestion and scheduling control for multipath TCP

Shih-Hao Ou, Chih-Wei Huang, Tzu-Kuan Lee, Chih-Yang Huang

2016 International Wireless Communications and Mobile Computing Conference (IWCMC) > 1069 - 1073

2016 International Wireless Communications and Mobile Computing Conference (IWCMC)

With the development of wireless communication technologies, mobile devices are commonly equipped with multiple network interfaces and ready to adopt emerging transport layer protocols such as multipath TCP (MPTCP). The protocol is specifically useful for Internet of Things streaming applications with critical latency and bandwidth demands. To achieve the full potential of MPTCP, major challenges on congestion...

chapter

A VLSI architecture for real-time gradient guided image filtering

Lei Wu, Ching Chuen Jong

2016 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC) > 1 - 6

2016 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC)

High-performance filtering has been in ever-increasing demand for a range of applications, especially real-time image/video processing. The guided image filter is one of the most widely used image filters. Among its variants, the gradient domain guided image filter, which performs edge-preserving smoothing and mitigates the halo-artifact problem that exists in current guided image filters, was reported recently. Due to...

chapter

Effect of timer interrupt interval on file system synchronization overhead

Hankeun Son, Seongjin Lee, Youjip Won

2016 IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC) > 99 - 102

2016 IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC)

File system metadata is indispensable in both describing the data and maintaining the file system. Despite the importance of metadata in the file system, the overhead of maintaining the metadata cannot be taken lightly. This is because the metadata must also be persisted on the storage device, which consumes I/O bandwidth and creates journaling overhead. In this paper, we find that the random...

chapter

Portable and transparent software managed scheduling on accelerators for fair resource sharing

Christos Margiolas, Michael F. P. O'Boyle

2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 82 - 93

2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Accelerators, such as Graphics Processing Units (GPUs), are popular components of modern parallel systems. Their energy-efficient performance makes them attractive components for modern data center nodes. However, they lack control for fair resource sharing amongst multiple users. This paper presents a runtime and Just In Time compiler that enables resource sharing control and software managed scheduling...

chapter

Exploiting integrated GPUs for network packet processing workloads

Janet Tseng, Ren Wang, James Tsai, Saikrishna Edupuganti, et al.

2016 IEEE NetSoft Conference and Workshops (NetSoft) > 161 - 165

2016 IEEE NetSoft Conference and Workshops (NetSoft)

Software-based network packet processing on standard high volume servers promises better flexibility, manageability and scalability, thus gaining tremendous momentum in recent years. Numerous research efforts have focused on boosting packet processing performance by offloading to discrete Graphics Processing Units (GPUs). While integrated GPUs, residing on the same die with the CPU, offer many advanced...

chapter

Decoding network codes using the sum-product algorithm

Anindya Gupta, B. Sundar Rajan

2016 IEEE International Conference on Communications (ICC) > 1 - 7

ICC 2016 - 2016 IEEE International Conference on Communications

While feasibility and obtaining a solution of a given network coding problem are well studied, the decoding procedure and complexity have not garnered much attention. We consider the decoding problem in a network wherein the sources generate multiple messages and the sink nodes demand some or all of the source messages. We consider both linear and non-linear network codes over a finite field and propose...

chapter

Fast realistic block-based refocusing for sparse light fields

Li-Ren Huang, Yu-Wen Wang, Chao-Tsung Huang

2016 IEEE International Symposium on Circuits and Systems (ISCAS) > 998 - 1001

2016 IEEE International Symposium on Circuits and Systems (ISCAS)

View-interpolation-based refocusing achieves realistic quality for sparse light fields but requires a large amount of computation. In this paper, we aim to reduce the computation load while maintaining the superior refocusing quality. The idea is to interpolate only a few views for in-focus regions and to perform refocusing on downsampled pixels for defocused areas. This is achieved by a proposed block-based...

chapter

FPGA kernels for classification rule induction

P. Skoda, B. Medved Rogina

2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) > 337 - 342

2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)

Classification is one of the core tasks in machine learning and data mining. One of several classification models is classification rules, which use a set of if-then rules to describe a classification model. In this paper we present a set of FPGA-based compute kernels for accelerating classification rule induction. The kernels can be combined to perform specific procedures in the rule induction process,...

chapter

DPDK Open vSwitch performance validation with mirroring feature

Sivasothy Shanmugalingam, Adlen Ksentini, Philippe Bertin

2016 23rd International Conference on Telecommunications (ICT) > 1 - 6

2016 23rd International Conference on Telecommunications (ICT)

Network Function Virtualization (NFV) and Software Defined Networking (SDN) currently play a key role in transforming the network architecture from hardware-based to software-based. Along with cloud computing, NFV and SDN are moving network functions from dedicated hardware to software implementations (Virtual Network Functions, VNFs) on Virtual Machines (VMs) or other virtualization technologies such as containers,...

chapter

Throughput oriented FPGA overlays using DSP blocks

Abhishek Kumar Jain, Douglas L. Maskell, Suhaib A. Fahmy

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 1628 - 1633

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Design productivity is a major concern preventing the mainstream adoption of FPGAs. Overlay architectures have emerged as one possible solution to this challenge, offering fast compilation and software-like programmability. However, overlays typically suffer from area and performance overheads due to limited consideration for the underlying FPGA architecture. These overlays have often been of limited...

chapter

A fine-grained performance model for GPU architectures

Nicola Bombieri, Federico Busato, Franco Fummi

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 1267 - 1272

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)

The increasing programmability, performance, and cost-effectiveness of GPUs have led to widespread use of such many-core architectures to accelerate general-purpose applications. Nevertheless, tuning applications to efficiently exploit the potential of the GPU is a very challenging task, especially for inexperienced programmers. This is due to the difficulty of developing a SW application for the specific...

chapter

P4GPU: Accelerate packet processing of a P4 program with a CPU-GPU heterogeneous architecture

Peilong Li, Yan Luo

2016 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS) > 125 - 126

2016 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)

The P4 language is an emerging domain-specific language for describing the data plane processing at a network device. P4 has been mapped to a wide range of forwarding devices, including NPUs, programmable NICs, and FPGAs, but not to the General-Purpose Graphics Processing Unit (GPGPU), which is a salient parallel architecture for processing network flows. In this work, we design a heterogeneous architecture...

chapter

Hierarchical RAID's parity generation using pass-through GPU in multi virtual-machine environment

Tae-Gun Song, Mehdi Pirahandeh, Deok-Hwan Kim

2016 International Conference on Big Data and Smart Computing (BigComp) > 386 - 389

2016 International Conference on Big Data and Smart Computing (BigComp)

Traditional hierarchical RAID causes huge GPU overhead and does not support node failure. To resolve this problem, this paper proposes a new parity generation method for hierarchical redundant array of inexpensive disks (RAID) using a pass-through GPU in a multi virtual-machine (VM) environment. The proposed method reduces GPU overhead and parity generation time, and supports node failure compared to the traditional...

chapter

Optimization of behavioral IPs in multi-processor system-on-chips

Yidi Liu, Benjamin Carrion Schafer

2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC) > 336 - 341

2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC)

This work shows that behavioral IPs (BIPs) are often over-designed when used in heterogeneous Multi-Processor SoCs (MPSoCs), mainly because they are designed and optimized separately. When instantiated in an MPSoC, these IPs often have to wait for data from the master and also wait to gain access to the bus to return the results. Behavioral IPs have the advantage over traditional RTL-based IPs that...

chapter

Implementation of edge-enhancement nonlinear anisotropic diffusion filtering using different CUDA memory models

M. H. Attia, S. A. Elshehaby, A. S. Elmaghraby

2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) > 501 - 504

2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)

Graphics Processing Units (GPUs) are used today as an affordable, energy-efficient method of accelerating computationally exhaustive algorithms, decreasing execution time by exploiting the power of parallel programming techniques. In the field of medical imaging, GPUs have become a crucial acceleration method for computationally exhaustive algorithms. This paper presents the effect of memory optimization on...

chapter

Braiding: A scheme for resolving hazards in kernel adaptive filters

Stephen Tridgell, Duncan J.M. Moss, Nicholas J. Fraser, Philip H.W. Leong

2015 International Conference on Field Programmable Technology (FPT) > 136 - 143

2015 International Conference on Field Programmable Technology (FPT)

Computational cost presents a barrier in the application of machine learning algorithms to large-scale real-time learning problems. Kernel adaptive filters (KAFs) have low computational cost with the ability to learn online and are hence favoured for such applications. Unfortunately, dependencies of the outputs on the weight updates prohibit pipelining.

Keywords:
KERNEL
THROUGHPUT

Content availability

Available (269)
None (4)

Keywords

LINUX (71)
HARDWARE (45)
PROTOCOLS (41)
SERVERS (38)
GRAPHICS PROCESSING UNITS (37)
COMPUTER ARCHITECTURE (35)
INSTRUCTION SETS (34)
BANDWIDTH (31)
PERFORMANCE EVALUATION (29)
FIELD PROGRAMMABLE GATE ARRAYS (28)
IP NETWORKS (26)
TRANSPORT PROTOCOLS (24)
GPU (21)
OPTIMIZATION (20)
BENCHMARK TESTING (19)
RECEIVERS (19)
PARALLEL PROCESSING (18)
DELAY (17)
GRAPHICS PROCESSING UNIT (17)
SWITCHES (17)
RANDOM ACCESS MEMORY (16)
CUDA (15)
DATA MINING (14)
MEMORY MANAGEMENT (14)
PIPELINES (14)
SOCKETS (13)
ENCODING (12)
GPGPU (12)
PROGRAM PROCESSORS (12)
SCHEDULING (12)
TCP (12)
VIRTUAL MACHINING (12)
ENGINES (11)
LOCAL AREA NETWORKS (11)
PERFORMANCE (11)
ALGORITHM DESIGN AND ANALYSIS (10)
ARRAYS (10)
CONTEXT (10)
CRYPTOGRAPHY (10)
DECODING (10)
FPGA (10)
INTERNET (10)
MONITORING (10)
SYNCHRONIZATION (10)
VIRTUAL MACHINES (10)
CLOUD COMPUTING (9)
COPROCESSORS (9)
DELAYS (9)
MULTIPROCESSING SYSTEMS (9)
RESOURCE MANAGEMENT (9)
SCALABILITY (9)
SCHEDULES (9)
YARN (9)
DRIVER CIRCUITS (8)
TELECOMMUNICATION CONGESTION CONTROL (8)
OPERATING SYSTEM KERNELS (7)
PIPELINE PROCESSING (7)
REAL TIME SYSTEMS (7)
REGISTERS (7)
CLOCKS (6)
COMPUTATIONAL MODELING (6)
CONGESTION CONTROL (6)
CONVOLUTION (6)
DIGITAL SIGNAL PROCESSING (6)
LINUX KERNEL (6)
MEASUREMENT (6)
OPTIMISATION (6)
PROGRAMMING (6)
RESOURCE ALLOCATION (6)
STREAMING MEDIA (6)
TELECOMMUNICATION TRAFFIC (6)
WIRELESS LAN (6)
CACHE STORAGE (5)
COMPUTER GRAPHIC EQUIPMENT (5)
CONTAINERS (5)
DEGRADATION (5)
DETECTORS (5)
EMBEDDED SYSTEMS (5)
ETHERNET NETWORKS (5)
MESSAGE SYSTEMS (5)
MICROPROCESSOR CHIPS (5)
MULTI-THREADING (5)
NETWORK INTERFACES (5)
OPENCL (5)
PREFETCHING (5)
PROCESSOR SCHEDULING (5)
QUALITY OF SERVICE (5)
ROUTING (5)
SHARED MEMORY (5)
SYSTEM-ON-CHIP (5)
TIME FACTORS (5)
VIRTUALIZATION (5)
WRITING (5)
ACCELERATION (4)
BUFFER STORAGE (4)
COMPLEXITY THEORY (4)
DATABASES (4)
EMULATION (4)

INFONA - science communication portal
