Search results

chapter

Power Efficient Sharing-Aware GPU Data Management

Abdulaziz Tabbakh, Murali Annavaram, Xuehai Qian

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 698 - 707

The power consumed by the memory system in GPUs is a significant fraction of the total chip power. As thread-level parallelism increases, GPUs are likely to stress cache and memory bandwidth even more, thereby exacerbating power consumption. We observe that neighboring concurrent thread arrays (CTAs) within GPU applications share a considerable amount of data. However, the default GPU scheduling policy...

chapter

Exploring pipe implementations using an OpenCL framework for FPGAs

Vincent Mirian, Paul Chow

2015 International Conference on Field Programmable Technology (FPT) > 112 - 119

In the last decade, OpenCL has sparked the interest of the computing world as it is a language based on an open standard that can run on many different heterogeneous platforms. This standard is continuously evolving to adapt to various use cases of different platforms. For example, with requests from the FPGA community, the pipe construct was added to the standard to facilitate the implementation...

chapter

Supporting x86-64 address translation for 100s of GPU lanes

Jason Power, Mark D. Hill, David A. Wood

2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) > 568 - 578

Efficient memory sharing between CPU and GPU threads can greatly expand the effective set of GPGPU workloads. For increased programmability, this memory should be uniformly virtualized, necessitating compatible address translation support for GPU memory references. However, even a modest GPU might need 100s of translations per cycle (6 CUs × 64 lanes/CU = 384 lanes) with memory access patterns designed for throughput...

chapter

Preemption of a CUDA Kernel Function

Jon Calhoun, Hai Jiang

2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) > 247 - 252

As graphics processing units (GPUs) gain adoption as general-purpose parallel compute devices, several key problems need to be addressed for their use to become more practical and more user-friendly. One such problem is that kernel functions, the special functions designed to execute on GPUs, are non-preemptible. Once a kernel is issued to the GPU, it remains there until either execution...

chapter

Proactive Detection of Kernel-Mode Rootkits

Pablo Bravo, Daniel F. Garcia

2011 Sixth International Conference on Availability, Reliability and Security > 515 - 520

The sophistication of malicious software (malware) used to break computer security has increased exponentially in recent years. Frequently, malware is hidden in a computer by software components called rootkits. Therefore, early detection of rootkits is of primary importance to avoid the uncontrolled operation of malware. Most current techniques for rootkit detection only allow a late...

chapter

A reuse-aware prefetching scheme for scratchpad memory

Jason Cong, Hui Huang, Chunyue Liu, Yi Zou

2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC) > 960 - 965

Scratchpad memory (SPM) has been utilized as a prefetch buffer in embedded systems and parallel architectures to hide memory access latency. However, the impact of reuse patterns on SPM prefetching has not been fully investigated. In this paper we quantify the impact of reuse on SPM prefetching efficiency and propose a reuse-aware SPM prefetching (RASP) scheme. The average performance and energy improvements...

INFONA - science communication portal
