Search results

article

Scalable Approximate DCT Architectures for Efficient HEVC-Compliant Video Coding

Maher Jridi, Pramod Kumar Meher

IEEE Transactions on Circuits and Systems for Video Technology > 2017 > 27 > 8 > 1815 - 1825

An approximate kernel for the discrete cosine transform (DCT) of length 4 is derived from the 4-point DCT defined by the High Efficiency Video Coding (HEVC) standard and used for the computation of DCT and inverse DCT (IDCT) of power-of-two lengths. There are two reasons for considering the DCT of length 4 as the basic module. First, it allows computation of DCTs of lengths 4, 8, 16, and 32 prescribed...

chapter

Userspace RDMA Verbs on Commodity Hardware Using DPDK

Patrick MacArthur

2017 IEEE 25th Annual Symposium on High-Performance Interconnects (HOTI) > 103 - 110

RDMA (Remote Direct Memory Access) is a technology that enables user applications to perform direct data transfer between the virtual memory of processes on remote endpoints, without operating system involvement or intermediate data copies. Achieving zero intermediate data copies using RDMA requires specialized network interface hardware. Software RDMA drivers emulate RDMA semantics in software to...

chapter

Loop Overhead Reduction Techniques for Coarse Grained Reconfigurable Architectures

Kanishkan Vadivel, Mark Wijtvliet, Roel Jordans, Henk Corporaal

2017 Euromicro Conference on Digital System Design (DSD) > 14 - 21

Due to their flexibility and high performance, Coarse Grained Reconfigurable Arrays (CGRAs) are a topic of increasing research interest. However, CGRAs also have the potential to achieve very high energy efficiency in comparison to other reconfigurable architectures when hardware optimizations are applied. Some of these optimizations are common for more traditional processors but can also lead to large...

chapter

Automatic Control Flow Generation for OpenVX Graphs

Merten Popp, Stef van Son, Orlando Moreira

2017 Euromicro Conference on Digital System Design (DSD) > 198 - 204

Heterogeneous platforms with large numbers of processing elements (PEs) have been proposed to satisfy the computational requirements of computer vision applications. Limiting the incurred communication cost here is key to meet the power constraints of embedded devices. We present a new heuristic to reduce communication among PEs and to external memory by aggregating inter-process communication and...

chapter

TransCrypt: Transparent Main Memory Encryption Using a Minimal ARM Hypervisor

Julian Horsch, Manuel Huber, Sascha Wessel

2017 IEEE Trustcom/BigDataSE/ICESS > 152 - 161

Attacks on memory, revealing secrets, for example, via DMA or cold boot, are a long known problem. In this paper, we present TransCrypt, a concept for transparent and guest-agnostic, dynamic kernel and user main memory encryption using a custom minimal hypervisor. The concept utilizes the address translation features provided by hardware-based virtualization support of modern CPUs to restrict the...

chapter

Cyclops: PRU programming framework for precise timing applications

Amr Alanwar, Fatima M. Anwar, Yi-Fan Zhang, Justin Pearson, et al.

2017 IEEE International Symposium on Precision Clock Synchronization for Measurement, Control, and Communication (ISPCS) > 1 - 6

The Beaglebone Black single-board computer is well-suited for real-time embedded applications because its system-on-a-chip contains two "Programmable Real-time Units" (PRUs): 200-MHz microcontrollers that run concurrently with the main 1-GHz CPU that runs Linux. This paper introduces "Cyclops": a web-browser-based IDE that facilitates the development of embedded applications on...

chapter

Developing CPU-GPU Embedded Systems Using Platform-Agnostic Components

Gabriel Campeanu, Jan Carlson, Severine Sentilles

2017 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) > 176 - 180

Nowadays, there are many embedded systems with different architectures that have incorporated GPUs. However, it is difficult to develop CPU-GPU embedded systems using component-based development (CBD), since existing CBD approaches have no support for GPU development. In this context, when targeting a particular CPU-GPU platform, the component developer is forced to construct hardware-specific components,...

chapter

Optimizations of Two Compute-Bound Scientific Kernels on the SW26010 Many-Core Processor

James Lin, Zhigeng Xu, Akira Nukada, Naoya Maruyama, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 432 - 441

The home-grown SW26010 many-core processor enabled the production of China’s first independently developed number-one ranked supercomputer – the Sunway TaihuLight. The design of the limited off-chip memory bandwidth, however, renders the SW26010 a highly memory-bound processor. To compensate for this limitation, the processor was designed with a unique hardware feature, "Register Level Communication"...

chapter

In-Place Irregular Computation for Message-Passing Chip-Multiprocessors

Zhang Youhui, Zhang Youyang, Li Yanhua, Fei Xiang, et al.

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 69 - 76

With the increase of CMP (Chip-Multiprocessor) scale, moving data to computation on chip becomes more expensive. Accordingly, moving computation to data has potential to improve efficiency. We propose an in-place computation co-design of many-simple-core CMP for irregular applications. The computing paradigm is that an application's critical irregular data (or part of them) is partitioned into on-chip...

chapter

Hierarchical Read/Write Analysis for Pointer-Based OpenCL Programs on RRAM

Lin-Ya Yu, Shao-Chung Wang, Jenq-Kuen Lee

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 45 - 52

Heterogeneous computing platforms containing a wide range of computing resources, from CPUs to specialized hardware accelerators, are the trend today, resulting from the physical limitations on processor speed and the increasing demand for computing performance. Hence many optimization strategies are studied to get better throughput and lower energy consumption in heterogeneous systems. Various memory...

chapter

Autotuning GPU Kernels via Static and Predictive Analysis

Robert Lim, Boyana Norris, Allen Malony

2017 46th International Conference on Parallel Processing (ICPP) > 523 - 532

Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting GPU models or performing code transformations. Although empirical...

chapter

An efficient FPGA-Based architecture for convolutional neural networks

Wen-Jyi Hwang, Yun-Jie Jhang, Tsung-Ming Tai

2017 40th International Conference on Telecommunications and Signal Processing (TSP) > 582 - 588

The goal of this paper is to implement an efficient FPGA-based hardware architecture for the design of fast artificial vision systems. The proposed architecture is capable of performing classification operations of a Convolutional Neural Network (CNN) in real time. To show the effectiveness of the architecture, some design examples such as hand posture recognition, character recognition, and face...

chapter

Jamming resistant encoding for non-uniformly distributed information

Batya Karp, Yerucham Berkowitz, Osnat Keren

2017 IEEE 23rd International Symposium on On-Line Testing and Robust System Design (IOLTS) > 169 - 173

Codes that aim to detect any error regardless of its multiplicity are referred to as security oriented codes. Most of these codes are designed to protect uniformly distributed codewords; there are few solutions which are used in protecting systems with non-uniformly distributed words. The paper introduces a new encoding method, termed “Level-Out encoding”, for cases in which some words are more likely...

chapter

Understanding the Impact of Fine-Grained Data Sharing and Thread Communication on Heterogeneous Workload Development

Tuan Ta, David Troendle, Xiaoqi Hu, Byunghyun Jang

2017 16th International Symposium on Parallel and Distributed Computing (ISPDC) > 132 - 139

The conventional OpenCL 1.x style CPU-GPU heterogeneous computing paradigm treats the CPU and GPU processors as loosely connected separate entities. At best each executes independent tasks, but, more commonly, the CPU idles while waiting for results from the GPU. No data-sharing and communications are allowed during kernel execution. This model limits the number of applications that can harness the...

chapter

Multiverse: Easy Conversion of Runtime Systems into OS Kernels via Automatic Hybridization

Kyle C. Hale, Conor Hetland, Peter Dinda

2017 IEEE International Conference on Autonomic Computing (ICAC) > 177 - 186

The hybrid runtime (HRT) model offers a path towards high performance and efficiency. By integrating the OS kernel, runtime, and application, an HRT allows the runtime developer to leverage the full feature set of the hardware and specialize OS services to the runtime's needs. However, conforming to the HRT model currently requires a port of the runtime to the kernel level, for example to the Nautilus...

chapter

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

Jeng-Hau Lin, Tianwei Xing, Ritchie Zhao, Zhiru Zhang, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) > 344 - 352

State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution. Such networks strain the computational capabilities and energy available to embedded and mobile processing platforms, restricting their use in many important applications. In this paper, we propose BCNN with Separable Filters (BCNNw/SF), which applies Singular...

chapter

A Model Driven Approach for Device Driver Development

Yunwei Dong, Yuanyuan He, Yin Lu, Hong Ye

2017 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C) > 122 - 129

In order to facilitate the development and maintenance of device drivers integrated into the operating system, a model driven approach is proposed in this paper for driver design and verification before coding. Architecture model and behavior model are created to illustrate both static and dynamic characteristics of device drivers, in company with device model and device-driver-O.S. interaction model...

chapter

OpenMP device offloading to FPGA accelerators

Lukas Sommer, Jens Korinth, Andreas Koch

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 201 - 205

Future high-performance computing systems will need to include multiple specialized accelerators in a single heterogeneous system to overcome power-density limitations of CPU performance.

chapter

Automated monitoring and detection of resource-limited NFV-based services

Steven Van Rossem, Wouter Tavernier, Didier Colle, Mario Pickavet, et al.

2017 IEEE Conference on Network Softwarization (NetSoft) > 1 - 5

The growing demand for flexibility and cost reduction in the telecommunication landscape directs the focus of service development heavily to programmability and softwarization. In the domain of Network Function Virtualization (NFV), one of the goals is to replace dedicated hardware devices (such as switches, routers, firewalls) with software-based network functionalities, showing comparable performance...

chapter

A survey on decoding schedules of LDPC convolutional codes and associated hardware architectures

Hayfa Ben Thameur, Bertrand Le Gal, Nadia Khouja, Fethi Tlili, et al.

2017 IEEE Symposium on Computers and Communications (ISCC) > 898 - 905

Low-density parity-check convolutional codes (LDPC-CC) have interesting error correction features. They have great potential to become key error-correcting codes for enhancing reliability of modern digital communication systems, optical systems and storage devices. On the implementation side, however, the design of low-cost, low-power, and high-throughput LDPC-CC decoders remains challenging. This...

INFONA - science communication portal
