Search results

chapter

Developing dynamic profiling and debugging support in OpenCL for FPGAs

Anshuman Verma, Huiyang Zhou, Skip Booth, Robbie King, more

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

With FPGAs emerging as a promising accelerator for general-purpose computing, there is a strong demand to make them accessible to software developers. Recent advances in OpenCL compilers for FPGAs pave the way for synthesizing FPGA hardware from OpenCL kernel code. To enable broader adoption of this paradigm, significant challenges remain. This paper presents our efforts in developing dynamic profiling...

chapter

Hardware-software codesign of accurate, multiplier-free Deep Neural Networks

Hokchhay Tann, Soheil Hashemi, R. Iris Bahar, Sherief Reda

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

While Deep Neural Networks (DNNs) push the state-of-the-art in many machine learning applications, they often require millions of expensive floating-point operations for each input classification. This computation overhead limits the applicability of DNNs to low-power, embedded platforms and incurs high cost in data centers. This motivates recent interests in designing low-power, low-latency DNNs...

chapter

Latency-aware packet processing on CPU-GPU heterogeneous systems

Arian Maghazeh, Unmesh D. Bordoloi, Usman Dastgeer, Alexandru Andrei, more

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

In response to the tremendous growth of the Internet, towards what we call the Internet of Things (IoT), there is a need to move from costly, high-time-to-market specific-purpose hardware to flexible, low-time-to-market general-purpose devices for packet processing. Among several such devices, GPUs have attracted attention in the past, mainly because the high computing demand of packet processing...

chapter

Instruction-level data isolation for the kernel on ARM

Yeongpil Cho, Donghyun Kwon, Yunheung Paek

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

As more sophisticated services are increasingly offered by the OS kernel on mobile devices, the security and sensitivity of kernel data that they depend on are becoming a critical issue. Data isolation has emerged as a key technique that can address the issue by providing strong protection for sensitive kernel data. However, existing data isolation mechanisms for mobile devices all incur non-negligible...

chapter

A kernel decomposition architecture for binary-weight Convolutional Neural Networks

Hyeonuk Kim, Jaehyeong Sim, Yeongjae Choi, Lee-Sup Kim

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

The binary-weight CNN is one of the most efficient solutions for mobile CNNs. However, a large number of operations are required to process each image. To reduce such a huge operation count, we propose an energy-efficient kernel decomposition architecture, based on the observation that a large number of operations are redundant. In this scheme, all kernels are decomposed into sub-kernels to expose...

chapter

Analyzing hardware based malware detectors

Nisarg Patel, Avesta Sasan, Houman Homayoun

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

Detection of malicious software at the hardware level is emerging as an effective solution to increasing security threats. Hardware-based detectors rely on Machine Learning (ML) classifiers to detect malware-like execution patterns based on Hardware Performance Counter (HPC) information at run-time. The effectiveness of these learning methods mainly relies on the information provided by expensive-to-implement...

chapter

Special session paper: exploiting quality-energy tradeoffs with arbitrary quantization

Thierry Moreau, Felipe Augusto, Patrick Howe, Armin Alaghi, more

2017 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) > 1 - 2

Approximate computing aims to expose and exploit quality vs. efficiency tradeoffs to enable ever-more demanding applications on energy-constrained devices such as smartphones, or IoT devices. This paper makes the case for arbitrary quantization as a compelling approximation technique that exposes quality vs. energy tradeoffs and provides practical error guarantees. We present QAPPA (Quality Autotuner...

chapter

Work-in-progress: REDEFINE – a case for WCET-friendly hardware accelerators for real time applications

Kavitha Madhu, Tarun Singla, S K Nandy, Ranjani Narayan, more

2017 International Conference on Compilers, Architectures and Synthesis For Embedded Systems (CASES) > 1 - 2

REDEFINE is a distributed dynamic dataflow architecture, designed for exploiting parallelism at various granularities as an embedded system-on-chip (SoC). This paper dwells on the flexibility of REDEFINE architecture and its execution model in accelerating real-time applications coupled with a WCET analyzer that computes execution time bounds of real time applications.

chapter

Quality of service support for fine-grained sharing on GPUs

Zhenning Wang, Jun Yang, Rami Melhem, Bruce Childers, more

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 269 - 281

GPUs have been widely adopted in data centers to provide acceleration services to many applications. Sharing a GPU is increasingly important for better processing throughput and energy efficiency. However, quality of service (QoS) among concurrent applications is minimally supported. Previous efforts are too coarse-grained and not scalable with increasing QoS requirements. We propose QoS mechanisms...

chapter

Access pattern-aware cache management for improving data utilization in GPU

Gunjae Koo, Yunho Oh, Won Woo Ro, Murali Annavaram

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 307 - 319

Long latency of memory operation is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache that must be shared across dozens of warps (a collection of threads) creates significant cache contention and premature data eviction. Prior works have recognized this problem and proposed warp throttling which reduces the number of active warps contending for cache space...

chapter

LogCA: A high-level performance model for hardware accelerators

Muhammad Shoaib Bin Altaf, David A. Wood

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 375 - 388

With the end of Dennard scaling, architects have increasingly turned to special-purpose hardware accelerators to improve the performance and energy efficiency for some applications. Unfortunately, accelerators don't always live up to their expectations and may under-perform in some situations. Understanding the factors which affect the performance of an accelerator is crucial for both architects and...

chapter

Security sandbox model for modern web environment

Amod Narendra Narvekar, Kiran K. Joshi

2017 International Conference on Nascent Technologies in Engineering (ICNTE) > 1 - 6

Very good technical knowledge is required to create automated tests that exploit browser vulnerabilities. It usually takes a combination of technical abilities and a set of specific tools. Security is of prime importance when it comes to web browsers. Attacks during surfing, while executing a downloaded file, and during transmission are very frequent these days and hence all browsers need to be hardened...

chapter

A command-level study of Linux kernel bugs

Yiliang Shi, Danny V. Murillo, Simeng Wang, Jinrui Cao, more

2017 International Conference on Computing, Networking and Communications (ICNC) > 798 - 802

As computer systems increase in size and complexity, bugs become ever subtler and more difficult to detect and diagnose. A bug could exist at different layers of computer systems (e.g., applications, shared libraries, file systems, device firmware), or could be caused by the incompatibility among layers. In many cases, bugs would require a very specific combination of events to be triggered and are...

chapter

DFGenTool: A Dataflow Graph Generation Tool for Coarse Grain Reconfigurable Architectures

Manideepa Mukherjee, Alexander Fell, Apala Guha

2017 30th International Conference on VLSI Design and 2017 16th International Conference on Embedded Systems (VLSID) > 67 - 72

In this paper, DFGenTool, a dataflow graph (DFG) generation tool, is presented, which converts loops in a sequential program given in a high-level language such as C, into a DFG. DFGenTool adapts DFGs for mapping to Coarse Grain Reconfigurable Architectures (CGRA) to enable a variety of CGRA implementations and compilers to be benchmarked against a standard set of DFGs. Several kernels have been converted...

chapter

Soft Errors Susceptibility of Virtualization Servers

Frederico Cerveira, Raul Barbosa, Henrique Madeira

2017 IEEE 22nd Pacific Rim International Symposium on Dependable Computing (PRDC) > 125 - 134

Virtualization is essential in supporting today's information infrastructure, and in particular the Cloud Computing area. However, the move to a virtualized architecture implies the addition of a new single point of failure: the hypervisor. Attempts to characterize and compare the susceptibility of systems (including virtualized systems) are often limited to the study of failure modes and their probabilities...

chapter

A systematic security analysis of real-time cyber-physical systems

Arvind Easwaran, Anupam Chattopadhyay, Shivam Bhasin

2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC) > 206 - 213

Security in Cyber-Physical Systems (CPS) has become a serious concern owing to the rapid adoption of technologies such as plug-and-play connectivity, robotics and remote coordination and control. It is well understood that the performance overhead incurred due to security considerations is rather high, which needs to be captured holistically for a real-time CPS with strict timing budget and hard deadlines...

chapter

Non-intrusive dynamic profiler for multicore embedded systems

Sudarshan Sargur, Roman Lysecky

2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC) > 500 - 505

Application profiling is an important step in the design and optimization of embedded systems. Accurately identifying and analyzing the execution of frequently executed computational kernels is needed to effectively optimize the system implementation, at both design time and runtime. Most previous profiling approaches are software based, which can incur significant overhead and may be prohibitive...

chapter

Open vSwitch Vxlan performance acceleration in cloud computing data center

Yaohua Yan, Hongbo Wang

2016 5th International Conference on Computer Science and Network Technology (ICCSNT) > 567 - 571

Cloud computing is one of the most popular Internet concepts, and many large companies provide cloud services to users. These large companies have built their own data centers to support upper layers of cloud services. To save cost and increase flexibility, SDN and virtualization technologies are widely used in data centers. Open vSwitch is an open source virtual switch that supports the OpenFlow...

chapter

Hardware TCP Offload Engine based on 10-Gbps Ethernet for low-latency network communication

Li Ding, Ping Kang, Wenbo Yin, Linli Wang

2016 International Conference on Field-Programmable Technology (FPT) > 269 - 272

This paper introduces a hardware TCP Offload Engine (TOE) aimed at low-latency communication systems. The throughput can reach 9.99 Gbps with jumbo frames. The input-to-output receiving latency of a packet consisting of a 100-byte payload and a 64-byte header with timestamp is close to 90 nanoseconds. The application-to-application latency between the proposed acceleration system and the native Windows...

chapter

Random projections for scaling machine learning on FPGAs

Sean Fox, Stephen Tridgell, Craig Jin, Philip H.W. Leong

2016 International Conference on Field-Programmable Technology (FPT) > 85 - 92

Random projections have recently emerged as a powerful technique for large scale dimensionality reduction in machine learning applications. Crucially, the projection can be obtained from sparse probability distributions, enabling hardware implementations with little overhead. In this paper, we describe a Field-Programmable Gate Array (FPGA) implementation alongside a kernel adaptive filter (KAF) that...
