This paper proposes an energy-efficient, high-throughput DRAM architecture for GPUs and throughput processors. In these systems, requests from thousands of concurrent threads compete for a limited number of DRAM row buffers. As a result, only a fraction of the data fetched into a row buffer is used, leading to significant energy overheads. Our proposed DRAM architecture exploits the hierarchical organization...
As the amount of digital data the world generates explodes, data centers and HPC systems that process this big data will require main memory with both high bandwidth and high capacity. Unfortunately, conventional memory technologies provide either high capacity (e.g., DDRx memory) or high bandwidth (e.g., GDDRx memory), but not both. Memory networks, which provide both high bandwidth and high capacity memory...
The memory wall continues to be a major performance bottleneck. While small on-die caches have been effective so far in hiding this bottleneck, the ever-increasing footprint of modern applications renders such caches ineffective. Recent advances in memory technologies like embedded DRAM (eDRAM) and High Bandwidth Memory (HBM) have enabled the integration of large memories on the CPU package as an...
Owing to the increasing demand for faster and larger DRAM systems, DRAM accounts for a large portion of the total power consumption of computing systems. As memory traffic and DRAM bandwidth grow, row activation and I/O power consumption are becoming major contributors to total DRAM power consumption. Thus, reducing row activation and I/O power consumption has great potential for improving...
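The activation and I/O contributions described above can be sketched with a back-of-the-envelope power model; the function name, parameters, and all numbers below are illustrative assumptions, not figures from the paper:

```python
# Back-of-the-envelope DRAM power breakdown (all quantities are
# illustrative placeholders, not measurements from the paper).

def dram_power(act_rate_hz, e_act_nj, io_bw_gbps, e_io_pj_per_bit, p_background_mw):
    """Estimate total DRAM power in milliwatts from the row-activation
    rate, I/O traffic, and a fixed background component."""
    p_act_mw = act_rate_hz * e_act_nj * 1e-6             # nJ/s -> mW
    p_io_mw = io_bw_gbps * 1e9 * e_io_pj_per_bit * 1e-9  # pJ/s -> mW
    return p_act_mw + p_io_mw + p_background_mw
```

With, say, one million activations per second at 100 nJ each (100 mW), 10 Gb/s of I/O at 5 pJ/bit (50 mW), and 100 mW of background power, activation and I/O together dominate the total, which is the trend the abstract points at.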
With current DRAM technology reaching its limit, emerging heterogeneous memory systems have become attractive to keep the memory performance scaling. This paper argues for using a small, fast memory closer to the processor as part of a flat address space where the memory system is composed of two or more memory types. OS-transparent management of such memory has been proposed in prior works such as...
The performance of 3D rendering on a Graphics Processing Unit, which converts a 3D vector stream into a 2D frame with 3D image effects, significantly impacts users' gaming experience on modern computer systems. Due to its high texture throughput requirement, main memory bandwidth becomes a critical obstacle to improving overall rendering performance. 3D-stacked memory systems such as Hybrid Memory Cube provide...
Historically, improvements in GPU-based high performance computing have been tightly coupled to transistor scaling. As Moore's law slows down, and the number of transistors per die no longer grows at historical rates, the performance curve of single monolithic GPUs will ultimately plateau. However, the need for higher performing GPUs continues to exist in many domains. To address this need, in this...
The paper proposes a solution to a topical scientific problem related to load balancing and efficient utilization of the resources of a distributed system. The proposed method is based on calculating the CPU, memory, and bandwidth load imposed by flows of different service classes on each server and on the entire distributed system, taking into account the multifractal properties of the input data flows. Weighting...
Memory interference is a critical impediment to system performance in MPSoCs. To address this problem, we first propose a Locality-Aware Bank Partitioning (LABP), which partitions memory banks according to applications' memory access behavior. The key idea is to separate memory intensive applications with high row-buffer locality from the other applications. Moreover, we integrate LABP with a bandwidth...
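The key idea stated in this abstract, isolating memory-intensive, high-row-buffer-locality applications onto their own banks, could be sketched as a simple partitioning heuristic. The function, thresholds, and metric names (`mpki`, `row_hit_rate`) below are hypothetical illustrations, not the paper's actual LABP algorithm:

```python
# Hypothetical sketch of locality-aware bank partitioning.
# Thresholds and metric names are illustrative assumptions.

def partition_banks(apps, num_banks, mpki_threshold=10.0, locality_threshold=0.5):
    """Give dedicated banks to memory-intensive apps with high row-buffer
    locality; all remaining apps share the leftover banks."""
    isolated = [a for a in apps
                if a["mpki"] >= mpki_threshold
                and a["row_hit_rate"] >= locality_threshold]
    shared = [a for a in apps if a not in isolated]
    mapping = {}
    # Reserve one equal share per isolated app, one share for the shared pool.
    share = num_banks // (len(isolated) + 1) if isolated else 0
    next_bank = 0
    for a in isolated:
        mapping[a["name"]] = list(range(next_bank, next_bank + share))
        next_bank += share
    leftover = list(range(next_bank, num_banks))
    for a in shared:
        mapping[a["name"]] = leftover  # shared apps contend on these banks
    return mapping
```

The point of the separation is that a high-locality application keeps its rows open in its private banks, while bursty low-locality applications can no longer thrash those row buffers.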
A multi-granularity memory system provides multiple access granularities for applications with varying spatial locality. Under such multi-granularity access patterns, a NoC design with a single fixed link width cannot utilize bandwidth efficiently. We propose a novel NoC design, called BoDNoC, which can merge multiple narrow subnets to provide various bandwidths for accessed data. The new design also adopts...
Processing-in-Memory (PIM) has recently been revisited as one of the most promising solutions to the bandwidth and power walls between processor and memory. In this paper, we propose a light-weight PIM architecture, approxPIM, which leverages approximate computing techniques to enable in-memory processing in a realistic 3D-stacked DRAM, Micron's Hybrid Memory Cube (HMC). Using the...
Many important applications demand large amounts of on-chip memory both to fully utilize an FPGA's computational capacity and to minimize energy-consuming off-chip memory accesses, leading some recent commercial FPGAs to add higher-capacity on-chip block RAMs (BRAMs). While memory is becoming more important to FPGA designs, SRAM scaling is becoming more difficult because of increasing device variation...
RAM-based storage aggregates the RAM of thousands of commodity servers in data center networks (DCNs) to provide extremely low I/O latency and high I/O throughput. To achieve fast failure recovery, MemCube exploits network proximity to restrict failure detection and recovery to within a 1-hop range. However, the previous design is applicable only to the BCube network, which limits the usage of RAM-based...
Large off-die stacked DRAM caches have been proposed to provide higher effective bandwidth and lower average latency to main memory. Designing a large off-die DRAM cache with a conventional block size requires a large tag array that is impractical to fit on-die. Placing the large directory off-die prolongs the latency, since a tag access is necessary before the data can be accessed. This additional trip...
With the development of Ultra-High-Definition video, the power consumed by accessing reference frames in the external DRAM has become the bottleneck for the portable video encoding system design. To reduce the dynamic power of DRAM, a lossy frame memory recompression algorithm is proposed. The compression algorithm is composed of a content-aware adaptive quantization, a multi-mode directional prediction,...
As more and more consumers access streaming video content over the internet, enterprises across the entire video distribution value chain experience tremendous pressure to deliver better performance and high quality of experience (QoE) to the end users. Enhanced video performance is highly desirable, starting from the Video Origin Servers through the core network and Content Delivery Networks (CDN)...
This paper discusses early efforts to integrate the RAN remote memory technology into the vl3 volume rendering framework. We successfully demonstrate this integration, achieving 73% of the theoretical hardware maximum with minimal variation.
Stencil computation is an important class of algorithms used in a large variety of scientific-simulation applications. The performance of stencil calculations is often bounded by memory bandwidth. High-bandwidth memory (HBM) on devices such as those in the Intel® Xeon Phi™ x200 processor family (code-named Knights Landing) can thus provide additional performance. In a traditional sequential time-step...
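As a minimal illustration of why stencil sweeps are bandwidth-bound, consider a 3-point Jacobi stencil: each output point costs only a few arithmetic operations but three memory reads per time step, so arithmetic intensity is low and memory bandwidth dominates. The sketch below is illustrative only and is not the paper's kernel:

```python
# Minimal 1-D 3-point stencil sweep (illustrative, not the paper's kernel).
# Every time step re-reads the whole grid, so bandwidth, not compute,
# limits performance on large grids.

def jacobi_1d(u, steps):
    """Apply a 3-point averaging stencil for the given number of time steps;
    boundary points are carried over unchanged."""
    for _ in range(steps):
        nxt = u[:]
        for i in range(1, len(u) - 1):
            nxt[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = nxt
    return u
```

On hardware with HBM, the same sweep simply streams the grid through a much wider memory pipe, which is why such kernels benefit directly from the higher bandwidth.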
This paper introduces HBM DRAM built with through-silicon via (TSV) technology. It covers general TSV features and techniques, including TSV architecture, TSV reliability, TSV open/short testing, and TSV repair. HBM DRAM, a representative DRAM product using TSVs, is then presented in detail, with particular focus on its uses and features.
Future systems dealing with big-data workloads will be severely constrained by the high performance and energy penalty imposed by data movement. This penalty can be reduced by storing datasets in DRAM or NVM main memory in compressed formats. Prior compressed memory systems have required significant changes to the operating system, thus limiting commercial viability. The first contribution of this...