Snehasish Kumar

Needle: Leveraging Program Analysis to Analyze and Extract Accelerators from Whole Programs

Snehasish Kumar, Nick Sumner, Vijayalakshmi Srinivasan, Steve Margerm, et al.

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 565-576

Technology constraints have increasingly led to the adoption of specialized coprocessors, i.e., hardware accelerators. The first challenge that computer architects encounter is identifying "what to specialize in the program". We demonstrate that this requires precise enumeration of program paths based on dynamic program behavior. We hypothesize that path-based [4] accelerator offloading leads...

Chainsaw: Von-Neumann accelerators to leverage fused instruction chains

Amirali Sharifian, Snehasish Kumar, Apala Guha, Arrvindh Shriraman

2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1-14

A central tenet behind accelerators is to partition a program execution into regions with different behavior (e.g., SIMD, Irregular, Compute-Intensive) and then use behavior-specialized architectures [1] for each region. It is unclear whether the gains in efficiency arise from recognizing that a simpler microarchitecture is sufficient for the acceleratable code region or the actual microarchitecture,...

SPEC-AX and PARSEC-AX: extracting accelerator benchmarks from microprocessor benchmarks

Snehasish Kumar, William N. Sumner, Arrvindh Shriraman

2016 IEEE International Symposium on Workload Characterization (IISWC), pp. 1-11

The end of Dennard scaling has necessitated research into the adoption of specialized architectures for offloading specific code regions in applications. Recent works in accelerator architectures have chosen diverse workloads and even diverse code regions (within the same workload) to highlight the efficacy of specific accelerator architectures. However, this makes it challenging to evaluate the power/performance...

Fusion: Design tradeoffs in coherent cache hierarchies for accelerators

Snehasish Kumar, Arrvindh Shriraman, Naveen Vedula

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 733-745

Chip designers have shown increasing interest in integrating specialized fixed-function coprocessors into multicore designs to improve energy efficiency. Recent work in academia [11, 37] and industry [16] has sought to enable more fine-grain offloading at the granularity of functions and loops. The sequential program now needs to migrate across the chip utilizing the appropriate accelerator for each...

SQRL: Hardware accelerator for collecting software data structures

Snehasish Kumar, Arrvindh Shriraman, Vijayalakshmi Srinivasan, Dan Lin, et al.

2014 23rd International Conference on Parallel Architecture and Compilation (PACT), pp. 475-476

Software data structures are a critical aspect of emerging data-centric applications, which makes it imperative to improve the energy efficiency of data delivery. We propose SQRL, a hardware accelerator that integrates with the last-level cache (LLC) and enables energy-efficient iterative computation on data structures. SQRL integrates a data structure-specific LLC refill engine (Collector) with a...

Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy

Snehasish Kumar, Hongzhou Zhao, Arrvindh Shriraman, Eric Matthews, et al.

2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 376-388

The fixed geometries of current cache designs do not adapt to the working set requirements of modern applications, causing significant inefficiency. The short block lifetimes and moderate spatial locality exhibited by many applications result in only a few words in the block being touched prior to eviction. Unused words occupy between 17% and 80% of a 64K L1 cache and between 1% and 79% of a 1MB private...

