Search results

chapter

A static-placement, dynamic-issue framework for CGRA loop accelerator

Zhongyuan Zhao, Weiguang Sheng, Weifeng He, ZhiGang Mao, more

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 1348 - 1353

This paper presents a static-placement, dynamic-issue (SPDI) framework for the coarse-grained reconfigurable architecture (CGRA) in order to tackle the inefficiencies of the static-issue, static-placement (SISP) CGRA. This framework includes a compiler that statically places the operations and a hardware design, the SPDI CGRA, that automatically schedules the operations. We focus on introducing the...

chapter

Taming warp divergence

Jayvant Anantpur, R. Govindarajan

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 50 - 60

Graphics Processing Units (GPUs) are designed to exploit a large amount of parallelism. However, warp-level divergence, caused by differing amounts of work, differing memory access latencies, etc., results in the warps of a thread block (TB) finishing kernel execution at different points in time. This, in effect, reduces the utilization of SM resources and hence the performance of the GPU. We propose...
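The utilization argument can be made concrete with a minimal sketch. It rests on three assumptions: a thread block's resources stay allocated until its slowest warp finishes, every warp starts at time 0, and a warp's slot is idle once that warp is done. The function name `tb_slot_utilization` and the finish times are hypothetical and not taken from the paper.

```python
def tb_slot_utilization(warp_finish_times):
    """Fraction of held warp-slot time spent doing useful work.

    Assumes (hypothetically) that all warps of the thread block start at t=0,
    each is busy until its own finish time, and the block's resources are
    released only when the slowest warp finishes.
    """
    if not warp_finish_times:
        raise ValueError("need at least one warp")
    busy = sum(warp_finish_times)                                 # useful warp-cycles
    held = len(warp_finish_times) * max(warp_finish_times)        # allocated warp-cycles
    return busy / held


print(tb_slot_utilization([100, 100, 100, 100]))  # balanced warps
print(tb_slot_utilization([100, 40, 40, 40]))     # one straggler warp
```

Suppose one warp runs for 100 cycles and three finish at 40. The block then holds 400 warp-cycles of resources but uses only 220 of them, for a utilization of 0.55.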

chapter

Pilot Register File: Energy Efficient Partitioned Register File for GPUs

Mohammad Abdel-Majeed, Alireza Shafaei, Hyeran Jeon, Massoud Pedram, more

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) > 589 - 600

GPU adoption for general-purpose computing has been accelerating. To support a large number of concurrently active threads, GPUs are provisioned with a very large register file (RF). The RF power consumption is a critical concern. One option to reduce the power consumption dramatically is to use near-threshold voltage (NTV) to operate the RF. However, operating MOSFET devices at NTV is fraught with stability and...

chapter

A space- and energy-efficient code compression/decompression technique for coarse-grained reconfigurable architectures

Bernhard Egger, Hochan Lee, Duseok Kang, Mansureh S. Moghaddam, more

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 197 - 209

We present an effective code compression technique to reduce the area and energy overhead of the configuration memory for coarse-grained reconfigurable architectures (CGRA). Based on a statistical analysis of existing code, the proposed method reorders the storage locations of the reconfigurable entities and splits the wide configuration memory into a number of partitions. Code compression is achieved...

chapter

HyCUBE: A CGRA with reconfigurable single-cycle multi-hop interconnect

Manupa Karunaratne, Aditi Kulkarni Mohite, Tulika Mitra, Li-Shiuan Peh

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

CGRAs are promising as accelerators due to their improved energy-efficiency compared to FPGAs. Existing CGRAs support reconfigurability for operations, but not communications because of the static neighbor-to-neighbor interconnect, leading to both performance loss and increased complexity of the compiler. In this paper, we introduce HyCUBE, a novel CGRA architecture with a reconfigurable interconnect...

chapter

Instruction-level data isolation for the kernel on ARM

Yeongpil Cho, Donghyun Kwon, Yunheung Paek

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

As more sophisticated services are increasingly offered by the OS kernel on mobile devices, the security and sensitivity of kernel data that they depend on are becoming a critical issue. Data isolation has emerged as a key technique that can address the issue by providing strong protection for sensitive kernel data. However, existing data isolation mechanisms for mobile devices all incur non-negligible...

chapter

Enabling fast preemption via Dual-Kernel support on GPUs

Li-Wei Shieh, Kun-Chih Chen, Hsueh-Chun Fu, Po-Han Wang, more

2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC) > 121 - 126

To address QoS in resource-limited mobile systems, we introduce a fast preemption mechanism on GPUs. First, we adopt a dual-kernel execution model to support fine-grained preemption, and a resource allocation policy to avoid the resource fragmentation problem. Second, we propose a preemption victim selection scheme to reduce the throughput overhead while satisfying a required preemption latency. Evaluations...

chapter

Soft Errors Susceptibility of Virtualization Servers

Frederico Cerveira, Raul Barbosa, Henrique Madeira

2017 IEEE 22nd Pacific Rim International Symposium on Dependable Computing (PRDC) > 125 - 134

Virtualization is essential in supporting today's information infrastructure, and in particular cloud computing. However, the move to a virtualized architecture implies the addition of a new single point of failure: the hypervisor. Attempts to characterize and compare the susceptibility of systems (including virtualized systems) are often limited to the study of failure modes and their probabilities...

chapter

KRGuard: Kernel Rootkits Detection Method by Monitoring Branches Using Hardware Features

Yohei Akao, Toshihiro Yamauchi

2016 International Conference on Information Science and Security (ICISS) > 1 - 5

Attacks on an operating system kernel using kernel rootkits pose a particularly serious threat. Detecting an attack is difficult once the operating system kernel is infected with a kernel rootkit. As a result, the response to an attack is delayed, increasing the damage done to the computer system. In this paper, we discuss KRGuard (Kernel Rootkits Guard), which is a new method...

chapter

Risk Mining: Company-Risk Identification from Unstructured Sources

Timothy Nugent, Jochen L. Leidner

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 1308 - 1311

Risk permeates all aspects of doing business. However, support tools capable of systematically identifying the complete spectrum of risks that a company might face are currently lacking. Such a tool would need to reliably identify company-risk relationships from unstructured sources, thereby providing a qualitative assessment of risk exposure. We propose a supervised learning approach that combines...

chapter

TZ-KPM: Kernel Protection Mechanism on Embedded Devices on Hardware-Assisted Isolated Environment

Xianyi Zheng, Yanhong He, Jiangang Ma, Gang Shi, more

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 663 - 670

With the rapid development of network technology and the increasing complexity of system functions, embedded systems face more and more serious threats. Previous research on kernel monitoring and protection widely relies on higher-privileged system components, such as hardware virtualization extensions, to isolate security tools from potential kernel attacks. These approaches increase both...

chapter

Enabling Efficient Preemption for SIMT Architectures with Lightweight Context Switching

Zhen Lin, Lars Nyland, Huiyang Zhou

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 898 - 908

Context switching is a key technique enabling preemption and time-multiplexing for CPUs. However, for single-instruction multiple-thread (SIMT) processors such as high-end graphics processing units (GPUs), it is challenging to support context switching due to the massive number of threads, which leads to a huge amount of architectural state to be swapped during context switching. The architectural...
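To give a sense of the scale involved, the sketch below estimates the register-file and shared-memory state of a fully occupied GPU. The capacities are assumptions, roughly those of a Kepler GK110-class SM: 64K 32-bit registers, up to 48 KB of shared memory, and 15 SMs. The estimate ignores smaller per-warp state such as program counters. None of these figures come from the paper, and the function names are hypothetical.

```python
# Assumed, illustrative per-SM capacities (roughly Kepler GK110-class);
# these are not figures from the paper.
REGISTERS_PER_SM = 65536          # 32-bit registers
BYTES_PER_REGISTER = 4
SHARED_MEMORY_PER_SM = 48 * 1024  # bytes


def sm_context_bytes(registers=REGISTERS_PER_SM,
                     bytes_per_register=BYTES_PER_REGISTER,
                     shared_bytes=SHARED_MEMORY_PER_SM):
    """Register-file plus shared-memory bytes to save for one fully occupied SM.

    Ignores smaller per-warp state such as program counters and barriers.
    """
    return registers * bytes_per_register + shared_bytes


def gpu_context_bytes(num_sms, **kwargs):
    """Total state to save when every SM is fully occupied."""
    return num_sms * sm_context_bytes(**kwargs)


print(sm_context_bytes())                   # bytes per SM
print(gpu_context_bytes(15) / 2**20, "MiB")  # whole GPU
```

Under these assumptions one SM holds 311,296 bytes (304 KiB) of such state, and 15 SMs hold about 4.45 MiB. All of it would have to be swapped on a full context switch.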

chapter

Modular architecture for multiple transforms in modern video standards

Roberto R. Osorio

2016 Conference on Design of Circuits and Integrated Systems (DCIS) > 1 - 6

Modern video standards such as H.264 and HEVC introduce new simplified transform functions that allow for simple hardware implementation, different block sizes and enhanced coding efficiency. However, the number of different transforms to implement has increased, leading to the need for shared architectures able to process several transforms with minimum hardware overhead. This trend started with H...
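The 4x4 forward core transform of H.264 is a well-known example of such a simplified transform. Its coefficients are only ±1 and ±2, so each 1-D pass reduces to a butterfly of additions, subtractions and a doubling (a shift in hardware). The sketch covers only that H.264 core transform, with the post-scaling stage omitted. It does not model HEVC's larger transforms or the paper's shared architecture, and the function names are illustrative.

```python
# Forward 4x4 core transform matrix of H.264 (post-scaling folded into quantization).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def butterfly4(x):
    """1-D forward core transform using only adds, subtracts and doubling."""
    x0, x1, x2, x3 = x
    s0, s1 = x0 + x3, x1 + x2   # even part
    d0, d1 = x0 - x3, x1 - x2   # odd part
    return [s0 + s1, 2 * d0 + d1, s0 - s1, d0 - 2 * d1]


def forward_4x4(block):
    """2-D transform Y = CF * X * CF^T, computed separably with butterflies."""
    # Transform each column of X: cols[c] is column c of CF * X.
    cols = [butterfly4([block[r][c] for r in range(4)]) for c in range(4)]
    z = [[cols[c][r] for c in range(4)] for r in range(4)]
    # Transforming each row of CF * X right-multiplies it by CF^T.
    return [butterfly4(row) for row in z]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def forward_4x4_matrix(block):
    """Reference: direct matrix form, used to check the butterfly version."""
    cf_t = [list(col) for col in zip(*CF)]
    return matmul(matmul(CF, block), cf_t)
```

The butterfly version needs no general multipliers. This property is what makes such transforms cheap in hardware, and it is also why several transforms can plausibly share adder trees.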

chapter

Performance Analysis and Optimization of Clang's OpenMP 4.5 GPU Support

Matt Martineau, Simon McIntosh-Smith, Carlo Bertolli, Arpith C. Jacob, more

2016 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) > 54 - 64

The Clang implementation of OpenMP® 4.5 now provides full support for the specification, offering the only open source option for targeting NVIDIA® GPUs. While using OpenMP allows portability across different architectures, matching native CUDA® performance without major code restructuring is an open research issue. In order to analyze the current performance, we port a suite of representative benchmarks,...

chapter

Code generation for a SIMD architecture with custom memory organisation

Mehmet Ali Arslan, Flavius Gruian, Krzysztof Kuchcinski, Andreas Karlsson

2016 Conference on Design and Architectures for Signal and Image Processing (DASIP) > 90 - 97

Today's multimedia and DSP applications impose requirements on performance and power consumption that only custom processor architectures with SIMD capabilities can satisfy. However, the specific features of such architectures, including vector operations and high-bandwidth complex memory organization, make them notoriously complicated and time consuming to program. In this paper we present an automated...

chapter

A Performance Model and Efficiency-Based Assignment of Buffering Strategies for Automatic GPU Stencil Code Generation

Yue Hu, David M. Koppelman, Steven R. Brandt

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 361 - 368

Stencil computations form the basis for computer simulations across almost every field of science, such as computational fluid dynamics, data mining, and image processing. Their mostly regular data access patterns potentially enable them to take advantage of the high computation and data bandwidth of GPUs, but only if data buffering and other issues are handled properly. Finding a good code generation...
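A 5-point Jacobi sweep is a typical example of the regular access pattern described above: every interior cell reads the same fixed set of four neighbors. The sketch below is a plain-Python illustration with fixed boundary values. It does not model GPU data buffering or the paper's code generator, and `jacobi_step` is a hypothetical name.

```python
def jacobi_step(grid):
    """One sweep of a 5-point Jacobi stencil; boundary cells are held fixed.

    Reads only from the input grid and writes to a copy, so every interior
    update sees old values (Jacobi rather than Gauss-Seidel ordering).
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Same neighbor offsets for every cell: (-1,0), (+1,0), (0,-1), (0,+1).
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


g = [[1.0, 1.0, 1.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
print(jacobi_step(g))
```

Because the neighbor offsets are identical for every cell, neighboring cells share most of their inputs. That reuse is what on-chip buffering strategies on a GPU try to exploit.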

chapter

Dual-Engine Cross-ISA DBTO Technique Utilising MultiThreaded Support for Multicore Processor System

Joo On Ooi, Fawnizu Azmadi B. Hussin, Nordin Zakaria

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 257 - 264

The emergence of the new era of the Internet of Things (IoT) has encouraged intensive, if not extensive, usage of modern mobile apps; thus, multicore processors equipped with multiple ISAs have great potential for more efficient processing of instruction binaries in the near future. In order to support this ISA diversity of computing platforms, mixed modes of static and dynamic Binary Translation and Optimization...

chapter

Exploiting Large Memory Using 32-Bit Energy-Efficient Manycore Architectures

Mohamed L. Karaoui, Pierre-Yves Peneau, Quentin Meunier, Franck Wajsburt, more

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 61 - 68

Recent advances in processor manufacturing have led to the integration of tens of cores in a single chip and promise to integrate many more in so-called manycore architectures. Manycore architectures usually integrate many small, power-efficient cores, which can be 32-bit cores in order to maximize the performance-per-watt ratio. Providing large physical memory (e.g. 1 TB) to such architectures thus...
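The address-width gap behind this problem is easy to quantify. A 32-bit core can form only 2^32 distinct byte addresses (4 GiB), while 1 TB needs about 40 address bits. The helper below computes the minimum address width for a given capacity. It is a hypothetical illustration, not a technique from the paper.

```python
GIB = 2**30
TIB = 2**40


def address_bits(nbytes: int) -> int:
    """Minimum number of byte-address bits needed to name nbytes distinct bytes."""
    if nbytes < 1:
        raise ValueError("nbytes must be positive")
    # Addresses run from 0 to nbytes - 1; count the bits of the largest one.
    return (nbytes - 1).bit_length()


print(address_bits(4 * GIB))  # reach of a 32-bit address
print(address_bits(TIB))      # 1 TiB
print(address_bits(10**12))   # 1 TB (decimal)
print(TIB // (4 * GIB))       # how many 4 GiB windows fit in 1 TiB
```

Whether 1 TB is read as 10^12 or 2^40 bytes, the result is 40 bits, 8 more than a 32-bit address provides. That leaves room for 256 separate 4 GiB windows.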

chapter

VarySched: A Framework for Variable Scheduling in Heterogeneous Environments

Tim Süß, Nils Döring, Ramy Gad, Lars Nagel, more

2016 IEEE International Conference on Cluster Computing (CLUSTER) > 489 - 492

Despite many efforts to better utilize the potential of GPUs and CPUs, this potential is still far from being fully exploited. Although many tasks can be easily sped up by using accelerators, most existing schedulers are not flexible enough to really optimize the resource usage of the complete system. The main reasons are (i) that each processing unit requires a specific program code and that this code is often...

chapter

Enhancing Data Secrecy with Segmentation Based Isolation

Chi Zhang, Hui He, Xiaoguang Wang, Yichen Li, more

2016 13th Web Information Systems and Applications Conference (WISA) > 203 - 208

Software memory disclosure attacks, such as buffer over-reads, often work quietly and may cause leakage of secrets. The well-known OpenSSL Heartbleed vulnerability leaked millions of servers' private keys and left most Internet services insecure during that time. Existing solutions are either hard to apply to large code bases, or too heavyweight (e.g. by involving a hypervisor software or...

INFONA - science communication portal
