Search results

chapter

A static-placement, dynamic-issue framework for CGRA loop accelerator

Zhongyuan Zhao, Weiguang Sheng, Weifeng He, ZhiGang Mao, more

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 1348 - 1353

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

This paper presents a static-placement, dynamic-issue (SPDI) framework for the coarse-grained reconfigurable architecture (CGRA) in order to tackle the inefficiencies of the static-issue, static-placement (SISP) CGRA. The framework includes a compiler that statically places the operations and a hardware design, an SPDI CGRA, that automatically schedules the operations. We stress on introducing the...

chapter

Accurate private/shared classification of memory accesses: A run-time analysis system for the LEON3 multi-core processor

Nam Ho, Ishraq Ibne Ashraf, Paul Kaufmann, Marco Platzner

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 788 - 793

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Related work has presented simulation-based experiments to classify data accesses in a shared memory multi-core into private and shared. This information can be used to selectively turn on/off cache coherency mechanisms for data blocks, which can save memory bus bandwidth, minimize energy consumption, and reduce application runtimes. In this paper we present an implementation of a private/shared classification...

chapter

Stochastic-based multi-stage streaming realization of deep convolutional neural network

Mohammed Alawad, Mingjie Lin

2017 18th International Symposium on Quality Electronic Design (ISQED) > 13 - 18

2017 18th International Symposium on Quality Electronic Design (ISQED)

Large-scale convolutional neural networks (CNNs), conceptually mimicking the operational principle of visual perception in the human brain, have been widely applied to tackle many challenging computer vision and artificial intelligence applications. Unfortunately, despite its simple architecture, a typically-sized CNN is well known to be computationally intensive. This work presents a novel stochastic-based...

chapter

Hardware-based on-line intrusion detection via system call routine fingerprinting

Liwei Zhou, Yiorgos Makris

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 1546 - 1551

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

We introduce a hardware-based methodology for performing on-line intrusion detection in microprocessors. The proposed method extracts fingerprints from the basic blocks of the routine executed in response to a system call and examines their validity using a Bloom filter. Implementation in hardware renders spoofing attacks, to which operating system or hypervisor-level intrusion detection methods are...

chapter

From exaflop to exaflow

Tobias Becker, Pavel Burovskiy, Anna Maria Nestorov, Hristina Palikareva, more

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 404 - 409

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Exascale computing is facing a gap between the ever-increasing demand for application performance and the underlying chip technology, which no longer delivers the expected exponential increases in CPU performance. The industry is now progressively moving towards dedicated accelerators to deliver high performance and better energy efficiency. However, the question of programmability still remains...

chapter

SenseBox: A low-cost smart home system

Joseph Taylor, H M Sajjad Hossain, Mohammad Arif Ul Alam, Md Abdullah Al Hafiz Khan, more

2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops) > 60 - 62

2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)

Smart home technologies are gaining acclaim for providing a wide variety of functionalities: security, appliance control, HVAC control and remote monitoring. Installation cost and interoperability issues have restricted the adaptability of these technologies. In this demo paper, we demonstrate the design of an interoperable prototype of our smart home system, SenseBox, using low-cost embedded device...

chapter

A novel zero weight/activation-aware hardware architecture of convolutional neural network

Dongyoung Kim, Junwhan Ahn, Sungjoo Yoo

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 1462 - 1467

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

It is imperative to accelerate convolutional neural networks (CNNs) due to their ever-widening application areas, from server and mobile to IoT devices. Based on the fact that CNNs can be characterized by a significant amount of zero values in both kernel weights and activations, we propose a novel hardware accelerator for CNNs exploiting zero weights and activations. We also report a zero-induced load...

chapter

NetSlices: Scalable multi-core packet processing in user-space

Tudor Marian, Ki Suh Lee, Hakim Weatherspoon

2012 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS) > 27 - 38

2012 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)

Modern commodity operating systems do not provide developers with user-space abstractions for building high-speed packet processing applications. The conventional raw socket is inefficient and unable to take advantage of the emerging hardware, like multi-core processors and multi-queue network adapters. In this paper we present the NetSlice operating system abstraction. Unlike the conventional raw...

chapter

TwinKernels: An execution model to improve GPU hardware scheduling at compile time

Xiang Gong, Zhongliang Chen, Amir Kavyan Ziabari, Rafael Ubal, more

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 39 - 49

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

As throughput-oriented accelerators, GPUs provide tremendous processing power by running a massive number of threads in parallel. However, exploiting high degrees of thread-level parallelism (TLP) does not always translate to the peak performance that GPUs can offer, leaving the GPU's resources often under-utilized. Compared to compute resources, memory resources can tolerate considerably lower levels...

chapter

Taming warp divergence

Jayvant Anantpur, R. Govindarajan

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 50 - 60

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Graphics Processing Units (GPUs) are designed to exploit a large amount of parallelism. However, warp-level divergence occurring due to different amounts of work, memory access latency experienced, etc., results in warps of a thread block (TB) finishing kernel execution at different points in time. This, in effect, reduces utilization of resources of SMs and hence performance of the GPU. We propose...

chapter

Controlled Kernel Launch for Dynamic Parallelism in GPUs

Xulong Tang, Ashutosh Pattnaik, Huaipan Jiang, Onur Kayiran, more

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) > 649 - 660

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)

Dynamic parallelism (DP) is a promising feature for GPUs, which allows on-demand spawning of kernels on the GPU without any CPU intervention. However, this feature has two major drawbacks. First, the launching of GPU kernels can incur significant performance penalties. Second, dynamically generated kernels are not always able to efficiently utilize the GPU cores due to hardware limits. To address...

chapter

Dynamic GPGPU Power Management Using Adaptive Model Predictive Control

Abhinandan Majumdar, Leonardo Piga, Indrani Paul, Joseph L. Greathouse, more

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) > 613 - 624

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)

Modern processors can greatly increase energy efficiency through techniques such as dynamic voltage and frequency scaling. Traditional predictive schemes are limited in their effectiveness by their inability to plan for the performance and energy characteristics of upcoming phases. To date, there has been little research exploring more proactive techniques that account for expected future behavior...

chapter

A space- and energy-efficient code compression/decompression technique for coarse-grained reconfigurable architectures

Bernhard Egger, Hochan Lee, Duseok Kang, Mansureh S. Moghaddam, more

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 197 - 209

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

We present an effective code compression technique to reduce the area and energy overhead of the configuration memory for coarse-grained reconfigurable architectures (CGRA). Based on a statistical analysis of existing code, the proposed method reorders the storage locations of the reconfigurable entities and splits the wide configuration memory into a number of partitions. Code compression is achieved...

chapter

Preserving Energy Resources Using an Android Kernel Extension: A Case Study

Luis Corral, Ilenia Fronza, Nabil El Ioini, Andrea Janes, more

2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems (MOBILESoft) > 23 - 24

2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems (MOBILESoft)

In this paper, we present our experience designing and testing an energy saving strategy for mobile phones, implemented at operating system level, using Android OS. Our approach was to deploy kernel extensions that assess the status of the device, and enable economic profiles without user intervention. Our experiments showed that the power management kernel extension was able to extend the battery runtime...

chapter

Developing dynamic profiling and debugging support in OpenCL for FPGAs

Anshuman Verma, Huiyang Zhou, Skip Booth, Robbie King, more

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)

With FPGAs emerging as a promising accelerator for general-purpose computing, there is a strong demand to make them accessible to software developers. Recent advances in OpenCL compilers for FPGAs pave the way for synthesizing FPGA hardware from OpenCL kernel code. To enable broader adoption of this paradigm, significant challenges remain. This paper presents our efforts in developing dynamic profiling...

chapter

Hardware-software codesign of accurate, multiplier-free Deep Neural Networks

Hokchhay Tann, Soheil Hashemi, R. Iris Bahar, Sherief Reda

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)

While Deep Neural Networks (DNNs) push the state-of-the-art in many machine learning applications, they often require millions of expensive floating-point operations for each input classification. This computation overhead limits the applicability of DNNs to low-power, embedded platforms and incurs high cost in data centers. This motivates recent interest in designing low-power, low-latency DNNs...

chapter

Latency-aware packet processing on CPU-GPU heterogeneous systems

Arian Maghazeh, Unmesh D. Bordoloi, Usman Dastgeer, Alexandru Andrei, more

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)

In response to the tremendous growth of the Internet, towards what we call the Internet of Things (IoT), there is a need to move from costly, high-time-to-market specific-purpose hardware to flexible, low-time-to-market general-purpose devices for packet processing. Among several such devices, GPUs have attracted attention in the past, mainly because the high computing demand of packet processing...

chapter

Instruction-level data isolation for the kernel on ARM

Yeongpil Cho, Donghyun Kwon, Yunheung Paek

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)

As more sophisticated services are increasingly offered by the OS kernel on mobile devices, the security and sensitivity of kernel data that they depend on are becoming a critical issue. Data isolation has emerged as a key technique that can address the issue by providing strong protection for sensitive kernel data. However, existing data isolation mechanisms for mobile devices all incur non-negligible...

chapter

A kernel decomposition architecture for binary-weight Convolutional Neural Networks

Hyeonuk Kim, Jaehyeong Sim, Yeongjae Choi, Lee-Sup Kim

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)

The binary-weight CNN is one of the most efficient solutions for mobile CNNs. However, a large number of operations are required to process each image. To reduce such a huge operation count, we propose an energy-efficient kernel decomposition architecture, based on the observation that a large number of operations are redundant. In this scheme, all kernels are decomposed into sub-kernels to expose...

chapter

Analyzing hardware based malware detectors

Nisarg Patel, Avesta Sasan, Houman Homayoun

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)

Detection of malicious software at the hardware level is emerging as an effective solution to increasing security threats. Hardware-based detectors rely on Machine Learning (ML) classifiers to detect malware-like execution patterns based on Hardware Performance Counter (HPC) information at run-time. The effectiveness of these learning methods mainly relies on the information provided by expensive-to-implement...

INFONA - science communication portal
