Hardware accelerators have become a de facto standard for achieving high performance on current supercomputers, and there are indications that this trend will continue in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional...
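On KNL, this in-package memory can be targeted explicitly with the memkind library's hbwmalloc interface. The following is only a minimal sketch of placing a hot array in HBM rather than conventional DRAM; the array size and access pattern are illustrative assumptions, not taken from the paper above.

/* Minimal sketch: allocate a bandwidth-critical array from KNL's
 * in-package HBM (MCDRAM) via the memkind library's hbwmalloc API.
 * Link with -lmemkind. Illustration only, not the paper's method. */
#include <hbwmalloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t n = 1 << 20;                 /* assumed size: 1 Mi doubles */
    double *a;

    if (hbw_check_available() != 0) {   /* returns 0 if HBM is present */
        fprintf(stderr, "no high-bandwidth memory found\n");
        return EXIT_FAILURE;
    }
    a = hbw_malloc(n * sizeof *a);      /* lands in MCDRAM, not DDR4 */
    if (!a) return EXIT_FAILURE;

    for (size_t i = 0; i < n; i++)      /* bandwidth-bound traversal */
        a[i] = (double)i;

    printf("a[42] = %f\n", a[42]);
    hbw_free(a);
    return EXIT_SUCCESS;
}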
HBM (High Bandwidth Memory) is an emerging standard DRAM solution that can achieve breakthrough bandwidth of more than 256 GB/s while also reducing power consumption. It has a stacked DRAM architecture with core DRAM dies on top of a base logic die, built on TSV (through-silicon via) and die-stacking technologies. In this paper, the HBM architecture is introduced and a comparison of its generations is provided...
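As a sanity check on the quoted figure, assuming a second-generation HBM stack with a 1024-bit interface running at 2 Gb/s per pin:

\[ \text{BW} = \frac{1024\ \text{bits} \times 2\ \text{Gb/s}}{8\ \text{bits/byte}} = 256\ \text{GB/s per stack}. \]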
Today's supercomputers are moving towards the deployment of many-core processors like the Intel Xeon Phi Knights Landing (KNL) to deliver high compute and memory capacity. Applications executing on such many-core platforms with improved vectorization require high memory bandwidth. To improve performance, architectures like Knights Landing include a high-bandwidth, low-capacity in-package high bandwidth...
In the current big data era, the limited data bandwidth between the processor and the memory (the "memory wall") has become one of the most critical bottlenecks for the conventional von Neumann computer architecture.
GPUs are often limited by off-chip memory bandwidth. With the advent of general-purpose computing on GPUs, a cache hierarchy has been introduced to filter the bandwidth demand to the off-chip memory. However, the cache hierarchy presents its own bandwidth limitations in sustaining such high levels of memory traffic. In this paper, we characterize the bandwidth bottlenecks present across the memory...
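For intuition on how sustained bandwidth is usually characterized, here is a STREAM-style copy microbenchmark. It is a generic CPU-side analogue, not the paper's GPU methodology; the array size, timing approach, and the read-plus-write traffic model are common conventions assumed here.

/* STREAM-style copy benchmark: time a large array copy and report
 * sustained bandwidth. Sketch only; sizes are assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 26)  /* 64 Mi doubles = 512 MB, far larger than caches */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    if (!a || !b) return EXIT_FAILURE;
    for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 0.0; } /* pre-fault */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++) b[i] = a[i];    /* copy kernel */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec   = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double bytes = 2.0 * N * sizeof(double);     /* one read + one write */
    printf("copy: %.2f GB/s\n", bytes / sec / 1e9);
    free(a); free(b);
    return EXIT_SUCCESS;
}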
Modern memory systems are equipped with multiple channels to achieve higher memory bandwidth. Since a multi-channel memory system focuses on achieving high memory bandwidth, data are allocated across all the channels. Hence, when the memory system is accessed, all the channels are activated until the next DRAM refresh starts. Therefore, when executing compute-intensive applications that do not need...
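The reason every channel ends up active is the usual address interleaving, sketched below. The 4-channel count, 256 B granularity, and field layout are illustrative assumptions, not taken from the paper.

/* Typical address-to-channel interleaving: consecutive 256 B blocks
 * rotate over the channels, so any sizable stream touches (and keeps
 * active) every channel. Parameters are assumptions. */
#include <stdint.h>
#include <stdio.h>

#define NUM_CHANNELS 4          /* assumed channel count          */
#define GRANULARITY  256        /* bytes mapped per channel slice */

static unsigned channel_of(uint64_t addr) {
    return (addr / GRANULARITY) % NUM_CHANNELS;
}

int main(void) {
    for (uint64_t addr = 0; addr < 8 * GRANULARITY; addr += GRANULARITY)
        printf("addr 0x%05llx -> channel %u\n",
               (unsigned long long)addr, channel_of(addr));
    return 0;
}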
Nowadays, big data has become one of the most popular topics in the world. Analyzing such data requires a large amount of memory access. To serve requests from multiple users, the memory needs both high bandwidth and high density. The power spent moving data must also be considered in the big data era. High-density 3D-stacked DRAM is a potential solution for big data storage. By applying through-silicon vias...
We design a novel DRAM controller that bundles and executes memory requests of hard real-time applications in consecutive rounds based on their type to reduce read/write switching delay. At the same time, our controller provides a configurable, guaranteed bandwidth for soft real-time requests. We show that there is a fundamental trade-off between the latency guarantee for hard real-time requests and...
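The idea of type-based bundling can be sketched as follows: buffer pending hard real-time requests, then issue all reads before all writes within a round, so the bus turnaround penalty is paid once per round instead of once per alternation. The structures and round size below are illustrative assumptions, not the paper's controller.

/* Simplified round-based issue: pass 0 drains reads, pass 1 drains
 * writes, giving one RD->WR switch per round regardless of arrival
 * order. Sketch only. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { bool is_write; unsigned addr; } Req;

static void issue_round(const Req *q, int n) {
    for (int pass = 0; pass < 2; pass++)
        for (int i = 0; i < n; i++)
            if (q[i].is_write == (pass == 1))
                printf("%s 0x%x\n", pass ? "WR" : "RD", q[i].addr);
}

int main(void) {
    Req round[] = { {false,0x100}, {true,0x200}, {false,0x300}, {true,0x400} };
    issue_round(round, 4);
    return 0;
}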
The rapid evolution of the cloud computing model is accompanied by huge amounts of energy consumed by cloud data centers, so enhancing the energy efficiency of those data centers has become a major challenge. This paper tackles the problem of reducing the energy consumption of cloud data centers by proposing a novel virtual machine placement strategy. The proposed strategy suits both static and...
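For context, a common baseline for energy-aware placement is first-fit-decreasing bin packing: consolidate VMs onto as few hosts as possible so idle hosts can be powered down. The sketch below shows that generic baseline only; it is not the paper's strategy, and the capacities and demands are assumptions.

/* First-fit-decreasing VM placement: sort VMs by demand, pack each
 * onto the first host with room. Generic baseline sketch. */
#include <stdio.h>
#include <stdlib.h>

#define HOST_CAP 100      /* assumed CPU capacity per host */

static int cmp_desc(const void *a, const void *b) {
    return *(const int *)b - *(const int *)a;
}

int main(void) {
    int vms[] = { 45, 70, 20, 55, 10, 30 };      /* assumed CPU demands */
    int n = sizeof vms / sizeof *vms;
    int used[sizeof vms / sizeof *vms] = { 0 };  /* per-host load */

    qsort(vms, n, sizeof *vms, cmp_desc);
    for (int v = 0; v < n; v++)
        for (int h = 0; h < n; h++)
            if (used[h] + vms[v] <= HOST_CAP) {
                used[h] += vms[v];
                printf("VM(%d) -> host %d\n", vms[v], h);
                break;
            }
    return 0;
}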
State-of-the-art DRAM caches employ a small Tag-Cache, and their performance depends upon two important parameters, namely bank-level parallelism and Tag-Cache hit rate. These parameters depend upon the row buffer organization. Recently, it has been shown that a small row buffer organization delivers better performance via improved bank-level parallelism than the traditional large row buffer organization...
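The role of the Tag-Cache is that a hit there resolves the DRAM-cache tags without spending a DRAM access on tag reads. A minimal direct-mapped sketch follows; the set count and address split are illustrative assumptions, not the paper's organization.

/* Tiny direct-mapped Tag-Cache: a hit means the DRAM-cache tags are
 * already known on-chip, avoiding an extra DRAM access. Sketch only. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TC_SETS 256        /* assumed number of Tag-Cache sets */

typedef struct { bool valid; uint32_t tag; } TagEntry;
static TagEntry tag_cache[TC_SETS];

static bool tag_cache_hit(uint32_t block_addr) {
    uint32_t set = block_addr % TC_SETS;
    uint32_t tag = block_addr / TC_SETS;
    if (tag_cache[set].valid && tag_cache[set].tag == tag)
        return true;
    tag_cache[set] = (TagEntry){ true, tag };   /* fill on miss */
    return false;
}

int main(void) {
    printf("first access: %s\n", tag_cache_hit(0x1234) ? "hit" : "miss");
    printf("second access: %s\n", tag_cache_hit(0x1234) ? "hit" : "miss");
    return 0;
}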
In video decoder applications, motion compensation (MC) is bandwidth consuming because of its non-regular memory access. Especially with the popularity of UHD video and the development of the new coding standard (HEVC), external memory bandwidth has become a crucial bottleneck. In this paper, we propose an area-efficient, cache-based bandwidth optimization strategy to minimize the memory bandwidth. First...
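A reference-pixel cache helps because neighboring blocks' motion vectors often point at overlapping reference regions, so previously fetched tiles need not be refetched from external memory. The sketch below illustrates that effect with tile sizes and block positions that are assumptions, not the paper's design.

/* Count external DRAM tile fetches for two overlapping MC reference
 * blocks, with and without a tile-granular cache. Sketch only. */
#include <stdbool.h>
#include <stdio.h>

#define TILE 4                      /* assumed 4x4-pixel cache tiles */
static bool cached[64][64];         /* tile-granular presence bits   */
static int ext_fetches, total_tiles;

static void fetch_block(int x, int y, int w, int h) {
    for (int ty = y / TILE; ty <= (y + h - 1) / TILE; ty++)
        for (int tx = x / TILE; tx <= (x + w - 1) / TILE; tx++) {
            total_tiles++;
            if (!cached[ty][tx]) {  /* only misses touch external DRAM */
                cached[ty][tx] = true;
                ext_fetches++;
            }
        }
}

int main(void) {
    fetch_block(16, 16, 8, 8);      /* first MC block              */
    fetch_block(20, 18, 8, 8);      /* neighbor overlaps the first */
    printf("tiles requested: %d, external fetches: %d\n",
           total_tiles, ext_fetches);
    return 0;
}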
The increasing use of machine learning algorithms, such as Convolutional Neural Networks (CNNs), makes the hardware accelerator approach very compelling. However, the question of how best to design an accelerator for a given CNN has not been answered yet, even at a very fundamental level. This paper addresses that challenge by providing a novel framework that can universally and accurately evaluate...
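One fundamental quantity any such evaluation must capture is a layer's arithmetic intensity (operations per byte moved), which determines whether a design is compute- or bandwidth-bound. The back-of-envelope model below is in that spirit only; the layer shape, 1-byte datatypes, and no-reuse traffic model are assumptions, not the paper's framework.

/* Roofline-style counting for one conv layer: MACs vs. bytes moved.
 * All parameters are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    long H = 56, W = 56, Cin = 64, Cout = 64, K = 3;  /* assumed layer */
    long macs  = H * W * Cin * Cout * K * K;          /* multiply-adds */
    long in_b  = H * W * Cin;                         /* 1 B per value */
    long wt_b  = K * K * Cin * Cout;
    long out_b = H * W * Cout;
    long bytes = in_b + wt_b + out_b;                 /* no reuse model */
    printf("MACs: %ld, bytes: %ld, intensity: %.1f MAC/B\n",
           macs, bytes, (double)macs / bytes);
    return 0;
}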
For IoT, the most important element is communication bandwidth, which directly affects data processing performance and device-to-device communication. There are three approaches to obtaining wider bandwidth: higher clock speeds, more lanes, and higher data compression. The first two are related to packaging technology [10]. The overall system performance balance should bring together the...
High-performance embedded applications are developed using systems-on-chips (SoCs), which in turn include silicon-intensive, integrated application processors. These SoCs integrate a multi-core processor (e.g., ARM Cortex-A9 or Cortex-A15) with a variety of memory interface controllers, communication interface controllers, and special-purpose accelerators. Traditionally, a bus matrix is used for integrating these intellectual...
Cognitive computing and cloud infrastructure require flexible, connectable, and scalable processors with extreme I/O bandwidth. With four distinct chip configurations, the POWER9 family of chips delivers multiple options for memory ports, core thread counts, and accelerator options to address this need. The 24-core scale-out processor is implemented in 14nm SOI FinFET technology [1] and contains 8.0B...
In recent years, the demand for memory performance has grown rapidly due to the increasing number of cores on a single CPU, along with the integration of graphics processing units and other accelerators. Caching has been a very effective way to relieve bandwidth demand and to reduce average memory latency. As shown by the cache feature table in Fig. 23.9.1, there is a big latency gap between SRAM...
Mobile DRAMs are essential to support memory-intensive operations on smartphones and tablet PCs [1, 2]. Since the next-generation mobile DRAM standard (LPDDR) targets a speed specification of 51.2 GB/s, its I/O interface demands high bandwidth, low power, and high efficiency. Single-ended signaling has been used for LPDDR interfaces due to its 100% pin efficiency. However, as the data rate increases...
Over the past few years, GDDR5 has emerged as the dominant standard for applications requiring high system bandwidth, like graphics cards and game consoles. However, GDDR5 data rates are saturating due to limitations in the clock frequency and column-access cycle time (tCCD). To reach a data rate of 9 Gb/s/pin [1], a GDDR5 DRAM has to be clocked at 2.25 GHz and operate at a tCCD of 888 ps. This combination...
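These figures are mutually consistent: GDDR5 transfers four data bits per pin per command-clock (CK) cycle (the data clock WCK runs at twice CK and is double-pumped), and the quoted tCCD equals two CK cycles:

\[ 4 \times 2.25\ \text{GHz} = 9\ \text{Gb/s/pin}, \qquad t_{CK} = \frac{1}{2.25\ \text{GHz}} \approx 444\ \text{ps}, \qquad t_{CCD} = 2\,t_{CK} \approx 888\ \text{ps}. \]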
Precise depth estimation is a key kernel for realizing autonomous navigation on micro-aerial vehicles (MAVs). The state-of-the-art semi-global matching (SGM) algorithm has become favored for its high accuracy. In particular, it effectively handles low-texture regions due to its global optimization of the disparity between the left and right images over the entire frame. However, SGM involves...
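The core of SGM, not specific to this paper, is the well-known per-direction cost-aggregation recurrence, where \(C(\mathbf{p}, d)\) is the matching cost at pixel \(\mathbf{p}\) and disparity \(d\), \(\mathbf{r}\) is a scan direction, and \(P_1, P_2\) penalize small and large disparity changes:

\[ L_r(\mathbf{p}, d) = C(\mathbf{p}, d) + \min\!\Bigl( L_r(\mathbf{p}-\mathbf{r}, d),\; L_r(\mathbf{p}-\mathbf{r}, d\!-\!1) + P_1,\; L_r(\mathbf{p}-\mathbf{r}, d\!+\!1) + P_1,\; \min_k L_r(\mathbf{p}-\mathbf{r}, k) + P_2 \Bigr) - \min_k L_r(\mathbf{p}-\mathbf{r}, k). \]

Aggregating this over multiple directions and all disparities per pixel is what drives SGM's heavy memory traffic.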
Path ORAM (Oblivious RAM) is a recently proposed ORAM protocol for preventing information leakage from memory access sequences. It has received wide adoption due to its simplicity, practical efficiency, and asymptotic efficiency. However, Path ORAM has an extremely large memory bandwidth demand, leading to severe memory contention in server settings; e.g., a server may service one application that uses Path...
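The bandwidth demand comes from the protocol's structure: every logical access reads and rewrites all buckets on the path from the root of a binary tree to a randomly assigned leaf. Below is a minimal sketch of enumerating those bucket indices; the heap-style array layout is an assumption for illustration.

/* Buckets stored heap-style: root = 1, children of i are 2i and 2i+1.
 * For tree height h, leaf l in [0, 2^h) sits at index 2^h + l; the
 * access path is that node plus all of its ancestors up to the root. */
#include <stdio.h>

static void path_to_root(unsigned height, unsigned leaf) {
    for (unsigned idx = (1u << height) + leaf; idx >= 1; idx /= 2)
        printf("read+rewrite bucket %u\n", idx);   /* O(log N) buckets */
}

int main(void) {
    path_to_root(3, 5);   /* 8-leaf tree, leaf 5: buckets 13, 6, 3, 1 */
    return 0;
}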