Search results

chapter

Image-Domain Gridding on Graphics Processors

Bram Veenboer, Matthias Petschow, John W. Romein

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 545 - 554

Realizing the next generation of radio telescopes such as the Square Kilometre Array (SKA) requires both more efficient hardware and algorithms than today's technology provides. The recently introduced image-domain gridding (IDG) algorithm is a novel approach towards solving the most compute-intensive parts of creating sky images: gridding and degridding. It avoids the performance bottlenecks of traditional...

chapter

Alternative Processor Within Threshold: Flexible Scheduling on Heterogeneous Systems

Sonia Lopez, Stavan Satish Karia

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 42 - 53

Computing systems have become increasingly heterogeneous contributing to higher performance and power efficiency. However, this is at the cost of increasing the overall complexity of designing such systems. One key challenge in the design of heterogeneous systems is the efficient scheduling of computational load. To address this challenge, this paper thoroughly analyzes state of the art scheduling...

chapter

Portable Implementation of Advanced Driver-Assistance Algorithms on Heterogeneous Architectures

Oliver Jakob Arndt, Fabian David Trager, Tobias Moß, Holger Blume

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 6 - 17

The increased use of application-specific computational devices turns even low-power chips into high-performance computers. Not only additional accelerators (e.g., GPU, DSP, or even FPGA), but also heterogeneous CPU clusters form modern computer systems. Programming these chips is however challenging, due to management overhead, data transfer delays, and a missing unification of the programming flow...

chapter

Design of image acquisition system based on embedded Linux

Hong He, Yang Li, Zhihong Zhang

2017 29th Chinese Control And Decision Conference (CCDC) > 2261 - 2264

Based on the requirements of miniaturization, stability and definition of the image acquisition device, an embedded Linux image acquisition and display system based on embedded system is designed. The system hardware using ARM core S3C2440 microprocessor, USB camera and LCD display to build image acquisition and display system; the software system placed Linux system as the core is built. Build hardware...

chapter

Modeling Distributed Platforms from Application Traces for Realistic File Transfer Simulation

Anchen Chai, Mohammad-Mahdi Bazm, Sorina Camarasu-Pop, Tristan Glatard, more

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) > 54 - 63

Simulation is a fast, controlled, and reproducible way to evaluate new algorithms for distributed computing platforms in a variety of conditions. However, the realism of simulations is rarely assessed, which critically questions the applicability of a whole range of findings. In this paper, we present our efforts to build platform models from application traces, to allow for the accurate simulation...

chapter

Convolving over time via recurrent connections for sequential weight sharing in neural networks

Jason M. Allred, Kaushik Roy

2017 International Joint Conference on Neural Networks (IJCNN) > 4444 - 4450

Convolutional Neural Networks (CNNs) have proven effective for machine learning tasks such as computer vision. Analog, asynchronous hardware implementations of such neural networks appear to be promising avenues for fast, online, real-time, energy efficient machine learning. However, the weight-sharing requirements of CNNs present challenges for such neuromorphic designs. We propose a biologically...

chapter

Exploiting Decoupled OpenCL Work-Items with Data Dependencies on FPGAs: A Case Study

Javier Alejandro Varela, Norbert Wehn, Qian Liang, Songyin Tang

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 124 - 131

In the field of high performance heterogeneous computing systems, field programmable gate arrays (FPGAs) have shown great advantages in terms of acceleration and energy efficiency. And with the inclusion of the OpenCL framework for parallel programming, the design complexity has been greatly reduced. However, the parallel implementation of applications containing data-dependent branches usually experiences...

chapter

High Throughput FPGA Implementation for regular Non-Surjective Finite Alphabet Iterative Decoders

Thien Truong Nguyen-Ly, Valentin Savin, Xavier Popon, David Declercq

2017 IEEE International Conference on Communications Workshops (ICC Workshops) > 961 - 966

This paper deals with the recently introduced class of Non-Surjective Finite Alphabet Iterative Decoders (NS-FAIDs). First, optimization results for an extended class of regular NS-FAIDs are presented. They reveal different possible trade-offs between decoding performance and hardware implementation efficiency. To validate the promises of optimized NS-FAIDs in terms of hardware implementation benefits,...

chapter

A Small-Scale Testbed for Large-Scale Reliable Computing

Jason St. John, Thomas J. Hacker

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1251 - 1258

High performance computing (HPC) systems frequently suffer errors and failures from hardware components that negatively impact the performance of jobs run on these systems. We analyzed system logs from two HPC systems at Purdue University and created statistical models for memory and hard disk errors. We created a small-scale error injection testbed—using a customized QEMU build, libvirt, and Python—that...

chapter

Reviving instruction set randomization

Kanad Sinha, Vasileios P. Kemerlis, Simha Sethumadhavan

2017 IEEE International Symposium on Hardware Oriented Security and Trust (HOST) > 21 - 28

Instruction set randomization (ISR) was proposed early in the last decade as a countermeasure against code injection attacks. However, it is considered to have lost its relevance; with the pervasiveness of code-reuse techniques in modern attacks, code injection no longer remains a foundational component in contemporary exploits. This paper revisits the relevance of ISR in the current security landscape...

chapter

Power Analysis of HLS-Designed Customized Instruction Set Architectures

Tejaswini Ananthanarayana, Sonia Lopez, Marcin Lukowiak

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 207 - 212

Performance and power consumption are key features for evaluating any processor design. In this paper, we present close attention to the impact on power and energy consumption of customized Instruction Set Architecture (ISA) designed by means of High Level Synthesis (HLS) tools. We compare these results against a full ISA soft processor, Microblaze. Our customized ISA processors greatly reduce the...

chapter

Multi2Sim Kepler: A detailed architectural GPU simulator

Xun Gong, Rafael Ubal, David Kaeli

2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 269 - 278

Presilicon simulation is one of the key toolsets for computer architects to evaluate and optimize their future designs. As Graphics Processing Units (GPUs) have become the platform of choice in many computing communities due to their impressive processing capabilities, computer architecture researchers need a simulation framework that allows them to quantitatively consider design tradeoffs. In this...

chapter

SimBench: A portable benchmarking methodology for full-system simulators

Harry Wagstaff, Bruno Bodin, Tom Spink, Bjorn Franke

2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 217 - 226

Full-system simulators are increasingly finding their way into the consumer space for the purposes of backwards compatibility and hardware emulation (e.g. for games consoles). For such compute-intensive applications simulation performance is paramount. In this paper we argue that existing benchmark suites such as SPEC CPU2006, originally designed for architecture and compiler performance evaluation,...

chapter

PTAT: An efficient and precise tool for collecting detailed TLB miss traces

Jiutian Zhang, Yuhang Liu, Xiaojing Zhu, Yuan Ruan, more

2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 137 - 138

It is well known that the TLB performance impacts the memory system performance, which is critical for overall system performance. Similar to multi-level caches, multilevel TLBs have become an important leverage for boosting data access performance. Applications have increasingly large working sets. Servers targeting such applications have thus been built with ever larger main memory capacities, but...

chapter

FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates

Yijin Guan, Hao Liang, Ningyi Xu, Wenqiang Wang, more

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 152 - 159

DNNs (Deep Neural Networks) have demonstrated great success in numerous applications such as image classification, speech recognition, video analysis, etc. However, DNNs are much more computation-intensive and memory-intensive than previous shallow models. Thus, it is challenging to deploy DNNs in both large-scale data centers and real-time embedded systems. Considering performance, flexibility, and...

chapter

An FPGA Design Framework for CNN Sparsification and Acceleration

Sicheng Li, Wei Wen, Yu Wang, Song Han, more

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 28

Convolutional neural networks (CNNs) have recently broken many performance records in image recognition and object detection problems. The success of CNNs, to a great extent, is enabled by the fast scaling-up of the networks that learn from a huge volume of data. The deployment of big CNN models can be both computation-intensive and memory-intensive, leaving severe challenges to hardware implementations...

chapter

From Smashed Screens to Smashed Stacks: Attacking Mobile Phones Using Malicious Aftermarket Parts

Omer Shwartz, Guy Shitrit, Asaf Shabtai, Yossi Oren

2017 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) > 94 - 98

In this preliminary study we present the first practical attack on a modern smartphone which is mounted through a malicious aftermarket replacement part (specifically, a replacement touchscreen). Our attack exploits the lax security checks on the packets traveling between the touchscreen's embedded controller and the phone's main CPU, and is able to achieve kernel-level code execution privileges on modern...

chapter

On the Effectiveness of Virtualization Based Memory Isolation on Multicore Platforms

Siqi Zhao, Xuhua Ding

2017 IEEE European Symposium on Security and Privacy (EuroS&P) > 546 - 560

Virtualization based memory isolation has been widely used as a security primitive in many security systems. This paper firstly provides an in-depth analysis of its effectiveness in the multicore setting, a first in the literature. Our study reveals that memory isolation by itself is inadequate for security. Due to the fundamental design choices in hardware, it faces several challenging issues including...

chapter

An Evaluation of the NVIDIA TX1 for Supporting Real-Time Computer-Vision Workloads

Nathan Otterness, Ming Yang, Sarah Rust, Eunbyung Park, more

2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS) > 353 - 364

Autonomous vehicles are an exemplar for forward-looking safety-critical real-time systems where significant computing capacity must be provided within strict size, weight, and power (SWaP) limits. A promising way forward in meeting these needs is to leverage multicore platforms augmented with graphics processing units (GPUs) as accelerators. Such an approach is being strongly advocated by NVIDIA,...

chapter

TimerShield: Protecting High-Priority Tasks from Low-Priority Timer Interference (Outstanding Paper)

Pratyush Patel, Manohar Vanga, Bjorn B. Brandenburg

2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS) > 3 - 12

Timer interference arises when a high-priority realtime task is delayed by a timer interrupt that is intended for a lower-priority task. We demonstrate that high-resolution timers, as exposed for instance by Linux's hrtimer API, can cause substantial timer interference, which manifests as significantly increased response times and lowered throughput. To eliminate this source of unpredictability, we...

INFONA - science communication portal
