Moinuddin K. Qureshi

chapter

DICE: Compressing DRAM caches for bandwidth and capacity

Vinson Young, Prashant J. Nair, Moinuddin K. Qureshi

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 627 - 638

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)

This paper investigates compression for DRAM caches. As the capacity of DRAM cache is typically large, prior techniques on cache compression, which solely focus on improving cache capacity, provide only a marginal benefit. We show that more performance benefit can be obtained if the compression of the DRAM cache is tailored to provide higher bandwidth. If a DRAM cache can provide two compressed lines...

chapter

Pay-As-You-Go: Low-overhead hard-error correction for phase change memories

Moinuddin K. Qureshi

2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 318 - 328

2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Phase Change Memory (PCM) suffers from the problem of limited write endurance. This problem is exacerbated because of the high variability in lifetime across PCM cells, resulting in weaker cells failing much earlier than nominal cells. Ensuring long lifetimes under high variability requires that the design can correct a large number of errors for any given memory line. Unfortunately, supporting high...

chapter

Feedback-directed pipeline parallelism

M. Aater Suleman, Moinuddin K. Qureshi, Khubaib, Yale N. Patt

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 147 - 156

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)

Extracting high performance from Chip Multiprocessors requires that the application be parallelized. A common software technique to parallelize loops is pipeline parallelism in which the programmer/compiler splits each loop iteration into stages and each stage runs on a certain number of cores. It is important to choose the number of cores for each stage carefully because the core-to-stage allocation...

chapter

CANDY: Enabling coherent DRAM caches for multi-node systems

Chiachen Chou, Aamer Jaleel, Moinuddin K. Qureshi

2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 1 - 13

2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

This paper investigates the use of DRAM caches for multi-node systems. Current systems architect the DRAM cache as Memory-Side Cache (MSC), restricting the DRAM cache to cache only the local data, and relying on only the small on-die caches for the remote data. As MSC keeps only the local data, it is implicitly coherent and obviates the need of any coherence support. Unfortunately, as accessing the...

chapter

XED: Exposing On-Die Error Detection Information for Strong Memory Reliability

Prashant J. Nair, Vilas Sridharan, Moinuddin K. Qureshi

2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) > 341 - 353

2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)

Large-granularity memory failures continue to be a critical impediment to system reliability. To make matters worse, as DRAM scales to smaller nodes, the frequency of unreliable bits in DRAM chips continues to increase. To mitigate such scaling-related failures, memory vendors are planning to equip existing DRAM chips with On-Die ECC. For maintaining compatibility with memory standards, On-Die ECC...

chapter

Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM

Kevin K. Chang, Prashant J. Nair, Donghyuk Lee, Saugata Ghose, more

2016 IEEE International Symposium on High Performance Computer Architecture (HPCA) > 568 - 580

2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)

This paper introduces a new DRAM design that enables fast and energy-efficient bulk data movement across subarrays in a DRAM chip. While bulk data movement is a key operation in many applications and operating systems, contemporary systems perform this movement inefficiently, by transferring data from DRAM to the processor, and then back to DRAM, across a narrow off-chip channel. The use of this narrow...

chapter

Reducing Refresh Power in Mobile Devices with Morphable ECC

Chiachen Chou, Prashant Nair, Moinuddin K. Qureshi

2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks > 355 - 366

2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Energy consumption is a primary consideration that determines the usability of emerging mobile computing devices such as smartphones. Refresh operations for main memory account for a significant fraction of the overall energy consumption, especially during idle periods, when processor can be switched off quickly, however, memory contents continue to get refreshed to avoid data loss. Given that mobile...

chapter

AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems

Moinuddin K. Qureshi, Dae-Hyun Kim, Samira Khan, Prashant J. Nair, more

2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks > 427 - 437

2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Multirate refresh techniques exploit the non-uniformity in retention times of DRAM cells to reduce the DRAM refresh overheads. Such techniques rely on accurate profiling of retention times of cells, and perform faster refresh only for a few rows which have cells with low retention times. Unfortunately, retention times of some cells can change at runtime due to Variable Retention Time (VRT), which...

chapter

Reducing read latency of phase change memory via early read and Turbo Read

Prashant J. Nair, Chiachen Chou, Bipin Rajendran, Moinuddin K. Qureshi

2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA) > 309 - 319

2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)

Phase Change Memory (PCM) is an emerging memory technology that can enable scalable high-density main memory systems. Unfortunately, PCM has higher read latency than DRAM, resulting in lower system performance. This paper investigates architectural techniques to improve the read latency of PCM. We observe that there is a wide distribution in cell resistance in both the SET state and the RESET state,...

chapter

Unified address translation for memory-mapped SSDs with FlashMap

Jian Huang, Anirudh Badam, Moinuddin K. Qureshi, Karsten Schwan

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA) > 580 - 591

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)

Applications can map data on SSDs into virtual memory to transparently scale beyond DRAM capacity, permitting them to leverage high SSD capacities with few code changes. Obtaining good performance for memory-mapped SSD content, however, is hard because the virtual memory layer, the file system and the flash translation layer (FTL) perform address translations, sanity and permission checks independently...

chapter

BEAR: Techniques for mitigating bandwidth bloat in gigascale DRAM caches

Chiachen Chou, Aamer Jaleel, Moinuddin K. Qureshi

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA) > 198 - 210

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)

Die stacking memory technology can enable gigascale DRAM caches that can operate at 4x–8x higher bandwidth than commodity DRAM. Such caches can improve system performance by servicing data at a faster rate when the requested data is found in the cache, potentially increasing the memory bandwidth of the system by 4x–8x. Unfortunately, a DRAM cache uses the available memory bandwidth not only for data...

article

Architectural Support for Mitigating Row Hammering in DRAM Memories

Dae-Hyun Kim, Prashant J. Nair, Moinuddin K. Qureshi

IEEE Computer Architecture Letters > 2015 > 14 > 1 > 9 - 12

DRAM scaling has been the prime driver of increasing capacity of main memory systems. Unfortunately, lower technology nodes worsen the cell reliability as it increases the coupling between adjacent DRAM cells, thereby exacerbating different failure modes. This paper investigates the reliability problem due to Row Hammering , whereby frequent activations of a given row can cause data loss for its neighboring...

chapter

Citadel: Efficiently Protecting Stacked Memory from Large Granularity Failures

Prashant J. Nair, David A. Roberts, Moinuddin K. Qureshi

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 51 - 62

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Stacked memory modules are likely to be tightly integrated with the processor. It is vital that these memory modules operate reliably, as memory failure can require the replacement of the entire socket. To make matters worse, stacked memory designs are susceptible to newer failure modes (for example, due to faulty through-silicon vias, or TSVs) that can cause large portions of memory, such as a bank,...

chapter

CAMEO: A Two-Level Memory Organization with Capacity of Main Memory and Flexibility of Hardware-Managed Cache

Chia Chen Chou, Aamer Jaleel, Moinuddin K. Qureshi

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 1 - 12

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

This paper analyzes the trade-offs in architecting stacked DRAM either as part of main memory or as a hardware-managed cache. Using stacked DRAM as part of main memory increases the effective capacity, but obtaining high performance from such a system requires Operating System (OS) support to migrate data at a page-granularity. Using stacked DRAM as a hardware cache has the advantages of being transparent...

chapter

Operating SECDED-based caches at ultra-low voltage with FLAIR

Moinuddin K. Qureshi, Zeshan Chishti

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) > 1 - 11

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Voltage scaling is often limited by bit failures in large on-chip caches. Prior approaches for enabling cache operation at low voltages rely on correcting cache lines with multi-bit failures. Unfortunately, multi-bit Error Correcting Codes (ECC) incur significant storage overhead and complex logic. Our goal is to develop solutions that enable ultra-low voltage operation while incurring minimal changes...

chapter

A case for Refresh Pausing in DRAM memory systems

Prashant Nair, Chia-Chen Chou, Moinuddin K. Qureshi

2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA) > 627 - 638

2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)

DRAM cells rely on periodic refresh operations to maintain data integrity. As the capacity of DRAM memories has increased, so has the amount of time consumed in doing refresh. Refresh operations contend with read operations, which increases read latency and reduces system performance. We show that eliminating latency penalty due to refresh can improve average performance by 7.2%. However, simply doing...

chapter

Embedded tutorial - Emerging memory technologies: What it means for computer system designers

Moinuddin K. Qureshi

2013 26th International Conference on VLSI Design and 2013 12th International Conference on Embedded Systems > lxxvii - lxxviii

2013 26th International Conference on VLSI Design: concurrently with the 12th International Conference on Embedded Systems

Summary form only given. As conventional memory technologies such as DRAM run into the scaling wall, architects and system designers are forced to look at alternative technologies for building future computer systems. Several emerging Non-Volatile Memory (NVM) technologies such as PCM, STT-RAM, and Memristors have the potential to boost memory capacity in a scalable and power-efficient manner. However,...

chapter

Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design

Moinuddin K. Qureshi, Gabe H. Loh

2012 45th Annual IEEE/ACM International Symposium on Microarchitecture > 235 - 246

2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

This paper analyzes the design trade-offs in architecting large-scale DRAM caches. Prior research, including the recent work from Loh and Hill, have organized DRAM caches similar to conventional caches. In this paper, we contend that some of the basic design decisions typically made for conventional caches (such as serialization of tag and data access, large associativity, and update of replacement...

chapter

FLEXclusion: Balancing cache capacity and on-chip bandwidth via Flexible Exclusion

Jaewoong Sim, Jaekyu Lee, Moinuddin K. Qureshi, Hyesoon Kim

2012 39th Annual International Symposium on Computer Architecture (ISCA) > 321 - 332

2012 ACM/IEEE 39th International Symposium on Computer Architecture (ISCA)

Exclusive last-level caches (LLCs) reduce memory accesses by effectively utilizing cache capacity. However, they require excessive on-chip bandwidth to support frequent insertions of cache lines on eviction from upper-level caches. Non-inclusive caches, on the other hand, have the advantage of using the on-chip bandwidth more effectively but suffer from a higher miss rate. Traditionally, the decision...

chapter

PreSET: Improving performance of phase change memories by exploiting asymmetry in write times

Moinuddin K. Qureshi, Michele M. Franceschini, Ashish Jagmohan, Luis A. Lastras

2012 39th Annual International Symposium on Computer Architecture (ISCA) > 380 - 391

2012 ACM/IEEE 39th International Symposium on Computer Architecture (ISCA)

Phase Change Memory (PCM) is a promising technology for building future main memory systems. A prominent characteristic of PCM is that it has write latency much higher than read latency. Servicing such slow writes causes significant contention for read requests. For our baseline PCM system, the slow writes increase the effective read latency by almost 2X, causing significant performance degradation...

INFONA - science communication portal

Search results for: Moinuddin K. Qureshi

DICE: Compressing DRAM caches for bandwidth and capacity

Pay-As-You-Go: Low-overhead hard-error correction for phase change memories

Feedback-directed pipeline parallelism

CANDY: Enabling coherent DRAM caches for multi-node systems

XED: Exposing On-Die Error Detection Information for Strong Memory Reliability

Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM

Reducing Refresh Power in Mobile Devices with Morphable ECC

AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems

Reducing read latency of phase change memory via early read and Turbo Read

Unified address translation for memory-mapped SSDs with FlashMap

BEAR: Techniques for mitigating bandwidth bloat in gigascale DRAM caches

Architectural Support for Mitigating Row Hammering in DRAM Memories

Citadel: Efficiently Protecting Stacked Memory from Large Granularity Failures

CAMEO: A Two-Level Memory Organization with Capacity of Main Memory and Flexibility of Hardware-Managed Cache

Operating SECDED-based caches at ultra-low voltage with FLAIR

A case for Refresh Pausing in DRAM memory systems

Embedded tutorial - Emerging memory technologies: What it means for computer system designers

Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design

FLEXclusion: Balancing cache capacity and on-chip bandwidth via Flexible Exclusion

PreSET: Improving performance of phase change memories by exploiting asymmetry in write times

Filter options

Publication date

Publication type

Keywords

INFONA - science communication portal

Search results for: Moinuddin K. Qureshi

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Publication type

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options