Although prefetching concepts have been proposed for decades, sophisticated system architectures and emerging applications introduce new challenges. Large instruction windows coupled with out-of-order execution make the program's data access sequence appear distorted from the cache's perspective. Big data applications stress memory subsystems heavily with their large working set sizes and complex data access...
Early design-space evaluation of computer systems is usually performed using performance models such as detailed simulators, RTL-based models, etc. Unfortunately, it is very challenging (often impossible) to run many emerging applications on detailed performance models owing to their complex application software stacks, long run times, system dependencies, and the limited speed/potential...
With increasing memory footprints and working set sizes of emerging workloads, system designers need to evaluate new memory hierarchies with large last level caches (LLCs), DRAM caches, large DRAMs, etc. to optimize performance gains. This requires a deep understanding of the memory access behavior of the target workloads. It is important to have accurate mechanisms to generate address streams to...
Cloud computing is gaining popularity due to its ability to provide infrastructure, platform and software services to clients on a global scale. Using cloud services, clients reduce the cost and complexity of buying and managing the underlying hardware and software layers. Popular services like web search, data analytics and data mining typically work with big data sets that do not fit into top level...
Early design-space evaluation of computer systems is usually performed using performance models (e.g., detailed simulators, RTL-based models, etc.). However, it is very challenging (often impossible) to run many emerging applications on detailed performance models owing to their complex software stacks and long run times. To overcome such challenges in benchmarking these complex applications, we propose...
Recent research studies have shown that modern GPU performance is often limited by the memory system performance. Optimizing memory hierarchy performance requires GPU designers to draw design insights based on the cache and memory behavior of end-user applications. Unfortunately, it is often difficult to get access to end-user workloads due to the confidential or proprietary nature of the software/data...
Big data decision-making techniques extract important insights from large-scale data. One of the most important classes of such techniques falls in the domain of graph applications, where data segments and their inherent relationships are represented as vertices and edges. Efficiently processing large-scale graphs involves many subtle tradeoffs and is still regarded as an...
Fast and efficient design-space exploration is a critical requirement for designing computer systems; however, the growing complexity of hardware/software systems and the long run times of detailed simulators often make it challenging. Machine learning (ML) models have been proposed as popular alternatives that enable fast exploratory studies. The accuracy of any ML model depends heavily...
In this paper, we present a flat address space organization called SILC-FM that allows subblocks from two pages to coexist in an interleaved fashion in die-stacked DRAM. Data movement at subblock granularity consumes less bandwidth than migrating the entire large block and avoids fetching useless subblocks that may never be accessed. SILC-FM can get more spatial locality hits than CAMEO...
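The core idea of the abstract above, fetching pages into fast die-stacked DRAM one subblock at a time rather than as whole large blocks, can be sketched as follows. This is an illustrative model only, not the SILC-FM design; the page and subblock sizes, and the `SubblockCache` structure, are assumptions for illustration.

```python
# Illustrative sketch (not the SILC-FM implementation): a page is split into
# fixed-size subblocks, and only the subblocks actually touched are brought
# into fast (die-stacked) DRAM. Sizes below are assumed values.

PAGE_SIZE = 2048      # bytes per large block (assumed)
SUBBLOCK_SIZE = 64    # bytes per subblock (assumed, one cache line)

def subblock_id(addr: int) -> tuple[int, int]:
    """Map a byte address to (page number, subblock index within the page)."""
    page = addr // PAGE_SIZE
    sub = (addr % PAGE_SIZE) // SUBBLOCK_SIZE
    return page, sub

class SubblockCache:
    """Tracks which subblocks of each page reside in fast DRAM."""
    def __init__(self):
        self.present = {}  # page -> bitmask of resident subblocks

    def access(self, addr: int) -> bool:
        """Return True on a fast-DRAM hit; on a miss, fetch only this subblock."""
        page, sub = subblock_id(addr)
        mask = self.present.get(page, 0)
        if mask & (1 << sub):
            return True
        # Miss: fetch just the touched 64 B subblock, not the whole 2 KB page.
        self.present[page] = mask | (1 << sub)
        return False
```

The bandwidth saving in the abstract comes from the miss path: a miss moves one subblock (64 B here) instead of the full page, so subblocks that are never accessed are never transferred.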
Extensive research has focused on estimating power to guide power management schemes and to mitigate thermal hot spots and voltage noise. However, simulated power models are slow and struggle with deep software stacks, while direct measurements are typically coarse-grained. This paper introduces Watt Watcher, a multicore power measurement framework that offers fine-grained functional unit breakdowns...
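The general approach behind fine-grained power attribution of the kind described above can be sketched as a linear, counter-weighted energy model. The event names and per-event energy weights below are invented for illustration; this is not Watt Watcher's actual model.

```python
# Hedged sketch of counter-based power attribution: each hardware event is
# assigned an energy cost, and per-unit energy is the weighted sum of event
# counts. All event names and weights here are hypothetical.

# Assumed per-event energy weights in nanojoules (illustrative values).
WEIGHTS_NJ = {"alu_ops": 0.1, "l2_misses": 2.0, "dram_accesses": 15.0}

def unit_energy_nj(counters: dict) -> dict:
    """Attribute energy (nJ) to each event class from its observed count."""
    return {event: WEIGHTS_NJ[event] * count for event, count in counters.items()}

def total_power_w(counters: dict, interval_s: float) -> float:
    """Average power over the sampling interval, in watts."""
    total_nj = sum(unit_energy_nj(counters).values())
    return total_nj * 1e-9 / interval_s
```

The per-event breakdown (`unit_energy_nj`) is what makes the attribution "fine-grained": it reports where the energy went, not just the total.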
The big data revolution has created an unprecedented demand for intelligent data management solutions at large scale. While data management has traditionally been a synonym for relational data processing, in recent years a new class of systems, popularly known as NoSQL databases, has emerged as a competitive alternative. There is a pressing need to gain a greater understanding of the characteristics of modern...
Recently, GPGPUs have positioned themselves in the mainstream processor arena with their potential to perform a massive number of jobs in parallel. At the same time, many GPGPU benchmark suites have been proposed to evaluate the performance of GPGPUs. Both academia and industry have been introducing new sets of benchmarks each year while some already published benchmarks have been updated periodically...
Large-scale graph analytics is an important class of problem in the modern data center. However, while data centers are trending toward a large number of heterogeneous processing nodes, graph analytics frameworks still operate under the assumption of uniform compute resources. In this paper, we develop heterogeneity-aware data ingress strategies for graph analytics workloads using the popular PowerGraph...
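The intuition behind heterogeneity-aware ingress, as opposed to the uniform partitioning the abstract criticizes, can be sketched by sizing each node's partition in proportion to its compute capability. This is a minimal illustration of the idea, not the paper's algorithm; the throughput scores are assumed inputs.

```python
# Illustrative sketch: instead of splitting a graph's edges uniformly across
# processing nodes, give each node a share proportional to an assumed
# per-node throughput score (hypothetical; not PowerGraph's ingress code).

def partition_sizes(num_edges: int, throughputs: list) -> list:
    """Split num_edges into per-node shares proportional to throughput."""
    total = sum(throughputs)
    shares = [int(num_edges * t / total) for t in throughputs]
    # Assign any rounding remainder to the fastest node.
    fastest = max(range(len(shares)), key=lambda i: throughputs[i])
    shares[fastest] += num_edges - sum(shares)
    return shares
```

Under uniform partitioning a slow node becomes the straggler; proportional shares aim to make all nodes finish their ingress and compute phases at roughly the same time.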
For decades, the primary tools for alleviating the "Memory Wall" have been large cache hierarchies and data prefetchers. Both approaches become more challenging in modern chip-multiprocessor (CMP) design. Increasing the last-level cache (LLC) size yields diminishing returns in performance per watt; given VLSI power scaling trends, this approach becomes hard to justify. These trends...
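For readers unfamiliar with the second tool the abstract names, a classic data prefetcher can be sketched as a PC-indexed stride predictor: when a load instruction repeats the same address stride twice, the next address is prefetched. The table structure and confirmation policy below are common textbook choices, assumed here for illustration.

```python
# Minimal sketch of a classic PC-indexed stride prefetcher, the kind of
# data prefetcher referred to above. Table size limits, replacement, and
# confidence counters are omitted; the policy shown is an assumption.

class StridePrefetcher:
    def __init__(self):
        self.table = {}  # load PC -> (last address, last observed stride)

    def access(self, pc: int, addr: int):
        """Record a load; return a prefetch address once the stride repeats."""
        prefetch = None
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:
                # Same nonzero stride seen twice in a row: predict the next line.
                prefetch = addr + stride
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, 0)
        return prefetch
```

A streaming load that walks an array with a fixed stride trains the table after two accesses, after which every subsequent access triggers a prefetch one stride ahead.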
The performance of modern-day computer systems depends greatly on the wide range of workloads that run on them. Thus, a representative set of workloads, covering the different classes of real-world applications, needs to be used by computer designers and researchers for processor design-space evaluation studies. While a number of different benchmark suites are available, a few common benchmark...
Computer architecture is beset by two opposing trends. Technology scaling and deep pipelining have led to high memory access latencies; meanwhile, power and energy considerations have revived interest in traditional in-order processors. In-order processors, unlike their superscalar counterparts, do not allow execution to continue around data cache misses, and they therefore suffer a greater...