Search results

Items from 21 to 40 out of 273 results

chapter

High Throughput FPGA Implementation for regular Non-Surjective Finite Alphabet Iterative Decoders

Thien Truong Nguyen-Ly, Valentin Savin, Xavier Popon, David Declercq

2017 IEEE International Conference on Communications Workshops (ICC Workshops) > 961 - 966

This paper deals with the recently introduced class of Non-Surjective Finite Alphabet Iterative Decoders (NS-FAIDs). First, optimization results for an extended class of regular NS-FAIDs are presented. They reveal different possible trade-offs between decoding performance and hardware implementation efficiency. To validate the promises of optimized NS-FAIDs in terms of hardware implementation benefits,...

chapter

Cache-aware affinitization on commodity multicores for high-speed network flows

Vishal Ahuja, Matthew Farrens, Dipak Ghosal

2012 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS) > 39 - 48

For a given TCP or UDP flow, protocol processing of incoming packets is performed on the core that receives the interrupt, while the user-space application which consumes the data may run on the same or a different core. If the cores are not the same, additional costs due to context switches, cache misses, and the movement of data between the caches of the cores may occur. The magnitude of this cost...

chapter

14.1 A 2.9TOPS/W deep convolutional neural network SoC in FD-SOI 28nm for intelligent embedded systems

Giuseppe Desoli, Nitin Chawla, Thomas Boesch, Surinder-pal Singh, et al.

2017 IEEE International Solid-State Circuits Conference (ISSCC) > 238 - 239

A rapidly growing number of computer vision, speech recognition, and signal processing applications are increasingly benefiting from the use of deep convolutional neural networks (DCNN) stemming from the seminal work of Y. LeCun et al. [1] and others that led to winning the 2012 ImageNet Large Scale Visual Recognition Challenge with AlexNet [2], a DCNN significantly outperforming classical approaches for...

chapter

Dynamic GPGPU Power Management Using Adaptive Model Predictive Control

Abhinandan Majumdar, Leonardo Piga, Indrani Paul, Joseph L. Greathouse, et al.

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) > 613 - 624

Modern processors can greatly increase energy efficiency through techniques such as dynamic voltage and frequency scaling. Traditional predictive schemes are limited in their effectiveness by their inability to plan for the performance and energy characteristics of upcoming phases. To date, there has been little research exploring more proactive techniques that account for expected future behavior...

chapter

Latency-aware packet processing on CPU-GPU heterogeneous systems

Arian Maghazeh, Unmesh D. Bordoloi, Usman Dastgeer, Alexandru Andrei, et al.

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

In response to the tremendous growth of the Internet, towards what we call the Internet of Things (IoT), there is a need to move from costly, high-time-to-market specific-purpose hardware to flexible, low-time-to-market general-purpose devices for packet processing. Among several such devices, GPUs have attracted attention in the past, mainly because the high computing demand of packet processing...

chapter

Quality of service support for fine-grained sharing on GPUs

Zhenning Wang, Jun Yang, Rami Melhem, Bruce Childers, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 269 - 281

GPUs have been widely adopted in data centers to provide acceleration services to many applications. Sharing a GPU is increasingly important for better processing throughput and energy efficiency. However, quality of service (QoS) among concurrent applications is minimally supported. Previous efforts are too coarse-grained and not scalable with increasing QoS requirements. We propose QoS mechanisms...

chapter

Tiered attestation for Internet-of-Things (IoT) devices

Giridhar D. Mandyam

2017 9th International Conference on Communication Systems and Networks (COMSNETS) > 480 - 483

Remote attestation is the procedure in which a relying party verifies the environment in which a device is carrying out cryptographic operations. Relying parties can leverage attestation data as part of their authentication and authorization procedures. However, many Internet-of-Things (IoT) devices either do not have direct connectivity to relying parties or may simply not be able to provide reliable...

chapter

Enabling fast preemption via Dual-Kernel support on GPUs

Li-Wei Shieh, Kun-Chih Chen, Hsueh-Chun Fu, Po-Han Wang, et al.

2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC) > 121 - 126

To address QoS for resource-limited mobile systems, we introduce a fast preemption mechanism on GPUs. First, we adopt a dual-kernel execution model to support fine-grained preemption, and a resource allocation policy to avoid the resource fragmentation problem. Second, we propose a preemption victim selection scheme to reduce the throughput overhead while satisfying a required preemption latency. Evaluations...

chapter

Taming memory related performance pitfalls in linux Cgroups

Zhenyun Zhuang, Cuong Tran, Jerry Weng, Haricharan Ramachandra, et al.

2017 International Conference on Computing, Networking and Communications (ICNC) > 531 - 535

The Cgroups (Control Groups) feature of the Linux kernel is increasingly being adopted for running applications in multi-tenant environments. Many projects (e.g., Docker) rely on Cgroups to isolate resources such as CPU and memory, so ensuring high performance for such deployments is critical. At LinkedIn, we have been using Cgroups and have investigated their performance. This work presents our findings about...

chapter

Hardware TCP Offload Engine based on 10-Gbps Ethernet for low-latency network communication

Li Ding, Ping Kang, Wenbo Yin, Linli Wang

2016 International Conference on Field-Programmable Technology (FPT) > 269 - 272

This paper introduces a hardware TCP Offload Engine (TOE) aimed at low-latency communication systems. Throughput can reach 9.99 Gbps with jumbo frames. The input-to-output receive latency for a packet consisting of a 100-byte payload and a 64-byte header with timestamp is close to 90 nanoseconds. The application-to-application latency between the proposed acceleration system and the native Windows...

chapter

To use or not to use: CPUs' cache optimization techniques on GPGPUs

D.R.V.L.B. Thambawita, Roshan G. Ragel, Dhammike Elkaduwe

2016 IEEE International Conference on Information and Automation for Sustainability (ICIAfS) > 1 - 6

General Purpose Graphics Processing Units (GPGPUs) are widely used to achieve high performance or high throughput in parallel programming. This capability has become well known and is mostly exploited for scientific computing, which requires more processing power than ordinary personal computers provide. Therefore, most programmers, researchers, and industry practitioners use this new concept for their work...

chapter

A Memory Accessing Method for the Parallel Aho-Corasick Algorithm on GPU

JinMyung Yoon, Kang-Il Choi, HyunJin Kim

2016 International Conference on Information Science and Security (ICISS) > 1 - 3

In this paper, we propose a memory access method for the Parallel Failureless Aho-Corasick (PFAC) algorithm that takes the Graphics Processing Unit (GPU) memory architecture into account to improve throughput. Compared with the Aho-Corasick (AC) algorithm on a Central Processing Unit (CPU) and Data-Parallel Aho-Corasick (DPAC) using Open Multi-Processing (OpenMP), PFAC on the GPU achieves high performance advancement...

chapter

ZNNi: Maximizing the Inference Throughput of 3D Convolutional Networks on CPUs and GPUs

Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 854 - 865

Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation and object detection and localization. Here we consider the parallelization of inference, i.e., the application of a previously trained ConvNet, with emphasis on 3D images. Our goal is to maximize throughput, defined as the number of output voxels computed per unit...

chapter

Hardware thread reordering to boost OpenCL throughput on FPGAs

Amir Momeni, Hamed Tabkhi, Gunar Schirner, David Kaeli

2016 IEEE 34th International Conference on Computer Design (ICCD) > 257 - 264

The availability of OpenCL for FPGAs has raised new questions about the efficiency of massive thread-level parallelism on FPGAs. The general trend is toward creating deep pipelining and in-order execution of many OpenCL threads across a shared data-path. While this can be a very effective approach for regular kernels, its efficiency significantly diminishes for irregular kernels with runtime-dependent...

chapter

Container-Based Service Chaining: A Performance Perspective

Sergio Livi, Quentin Jacquemart, Dino Lopez Pacheco, Guillaume Urvoy-Keller

2016 5th IEEE International Conference on Cloud Networking (Cloudnet) > 176 - 181

Middleboxes, which implement specific network service functions – e.g. firewalls, load balancers, NATs – have traditionally been deployed as hardware appliances, thereby imposing significant constraints on network operators, who must ensure that the traffic is effectively routed to the appropriate set of middleboxes, following the right order. Being hardware-based, these boxes offer limited upgrade...

chapter

CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks

Xushen Han, Dajiang Zhou, Shihao Wang, Shinji Kimura

2016 IEEE 34th International Conference on Computer Design (ICCD) > 320 - 327

Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. While CNNs involve huge complexity, VLSI (ASIC and FPGA) chips that deliver high-density integration of computational resources are regarded as a promising platform for CNN implementation. At massive parallelism of computational units, however, the external memory bandwidth, which is constrained...

chapter

Performance evaluation of transmission-control middleware in WLAN and LTE networks

Ayumi Shimada, Masato Oguchi, Saneyasu Yamaguchi, Heidi Kaartinen, et al.

2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom) > 115 - 120

Since mobile terminals such as smartphones are basic information tools for users, their communication performance is always important. Modern loss-based Transmission Control Protocol (TCP) variants adopt aggressive congestion window (CWND) control strategies to gain better throughput, but such strategies may cause a large number of packets to be backlogged and eventually dropped at the entry point...

chapter

Performance evaluation of Stratix V DE5-Net FPGA board for high performance computing

Iman Firmansyah, Yoshiki Yamaguchi, Taisuke Boku

2016 International Conference on Computer, Control, Informatics and its Applications (IC3INA) > 23 - 27

FPGAs (Field Programmable Gate Arrays) have been widely used for applications such as digital signal and image processing, video processing, software-defined radio, radar processing, and medical imaging. Currently, with the significant growth of parallel computing and cloud computing applications, FPGAs provide an alternative to CPUs or GPGPUs for high performance computing due...

chapter

Synthesis and evaluation of SHA-1 algorithm using altera SDK for OpenCL

Ian Janik, Mohammed A. S. Khalid

2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS) > 1 - 4

This paper uses the Altera SDK for OpenCL (AOCL) High-Level Synthesis (HLS) tool to accelerate the computation of the SHA-1 hash function. Using FPGAs to increase the throughput of this algorithm has been a popular research topic. The work done thus far focuses on HDL-based design methodologies. The goal of this paper is to determine whether the HLS implementation can compare in terms of speed to the HDL...

chapter

CUDA implementation of an optimal online Gaussian-Signal-in-Gaussian-Noise detector

Nir Nossenson, Ariel J. Jaffe

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

We address the computationally demanding task of real time optimal detection of a Gaussian Signal in Gaussian Noise. The mathematical principles of such a detector were formulated in 1965, but a full real-time implementation of these principles was not possible for decades mainly due to technological barriers. We present a CUDA based implementation of such an optimal detector and study its decision...

Keywords:
KERNEL
THROUGHPUT

Content availability

Available (269)
None (4)

Keywords

LINUX (71)
HARDWARE (45)
PROTOCOLS (41)
SERVERS (38)
GRAPHICS PROCESSING UNITS (37)
COMPUTER ARCHITECTURE (35)
INSTRUCTION SETS (34)
BANDWIDTH (31)
PERFORMANCE EVALUATION (29)
FIELD PROGRAMMABLE GATE ARRAYS (28)
IP NETWORKS (26)
TRANSPORT PROTOCOLS (24)
GPU (21)
OPTIMIZATION (20)
BENCHMARK TESTING (19)
RECEIVERS (19)
PARALLEL PROCESSING (18)
DELAY (17)
GRAPHICS PROCESSING UNIT (17)
SWITCHES (17)
RANDOM ACCESS MEMORY (16)
CUDA (15)
DATA MINING (14)
MEMORY MANAGEMENT (14)
PIPELINES (14)
SOCKETS (13)
ENCODING (12)
GPGPU (12)
PROGRAM PROCESSORS (12)
SCHEDULING (12)
TCP (12)
VIRTUAL MACHINING (12)
ENGINES (11)
LOCAL AREA NETWORKS (11)
PERFORMANCE (11)
ALGORITHM DESIGN AND ANALYSIS (10)
ARRAYS (10)
CONTEXT (10)
CRYPTOGRAPHY (10)
DECODING (10)
FPGA (10)
INTERNET (10)
MONITORING (10)
SYNCHRONIZATION (10)
VIRTUAL MACHINES (10)
CLOUD COMPUTING (9)
COPROCESSORS (9)
DELAYS (9)
MULTIPROCESSING SYSTEMS (9)
RESOURCE MANAGEMENT (9)
SCALABILITY (9)
SCHEDULES (9)
YARN (9)
DRIVER CIRCUITS (8)
TELECOMMUNICATION CONGESTION CONTROL (8)
OPERATING SYSTEM KERNELS (7)
PIPELINE PROCESSING (7)
REAL TIME SYSTEMS (7)
REGISTERS (7)
CLOCKS (6)
COMPUTATIONAL MODELING (6)
CONGESTION CONTROL (6)
CONVOLUTION (6)
DIGITAL SIGNAL PROCESSING (6)
LINUX KERNEL (6)
MEASUREMENT (6)
OPTIMISATION (6)
PROGRAMMING (6)
RESOURCE ALLOCATION (6)
STREAMING MEDIA (6)
TELECOMMUNICATION TRAFFIC (6)
WIRELESS LAN (6)
CACHE STORAGE (5)
COMPUTER GRAPHIC EQUIPMENT (5)
CONTAINERS (5)
DEGRADATION (5)
DETECTORS (5)
EMBEDDED SYSTEMS (5)
ETHERNET NETWORKS (5)
MESSAGE SYSTEMS (5)
MICROPROCESSOR CHIPS (5)
MULTI-THREADING (5)
NETWORK INTERFACES (5)
OPENCL (5)
PREFETCHING (5)
PROCESSOR SCHEDULING (5)
QUALITY OF SERVICE (5)
ROUTING (5)
SHARED MEMORY (5)
SYSTEM-ON-CHIP (5)
TIME FACTORS (5)
VIRTUALIZATION (5)
WRITING (5)
ACCELERATION (4)
BUFFER STORAGE (4)
COMPLEXITY THEORY (4)
DATABASES (4)
EMULATION (4)

INFONA - science communication portal
