The explosion of network bandwidth poses great challenges to data-plane flow processing. Due to its variable and poor worst-case performance, a naive hash table is incapable of wire-speed processing. State-of-the-art schemes rely on multiple hash functions for enhanced load balancing to improve the worst-case performance. These schemes exploit the memory hierarchy and allocate compact on-chip data structures...
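The load-balancing idea behind such multi-hash schemes can be illustrated with a minimal "power of two choices" sketch in Python (a hypothetical toy, not the paper's actual data structure): each key is hashed with two independent functions and placed in the less-loaded of the two candidate buckets, which sharply lowers the worst-case bucket depth compared to a single hash.

```python
import hashlib

NUM_BUCKETS = 64

def h(key: str, salt: str) -> int:
    # Derive an independent hash function per salt from SHA-256.
    digest = hashlib.sha256((salt + key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def insert_single(buckets, key):
    # Baseline: one hash function, no choice of bucket.
    buckets[h(key, "h1")].append(key)

def insert_two_choice(buckets, key):
    # Two candidate buckets from two independent hash functions.
    b1, b2 = h(key, "h1"), h(key, "h2")
    # Place the key in the less-loaded candidate.
    target = b1 if len(buckets[b1]) <= len(buckets[b2]) else b2
    buckets[target].append(key)

single = [[] for _ in range(NUM_BUCKETS)]
double = [[] for _ in range(NUM_BUCKETS)]
for i in range(2000):
    key = f"flow-{i}"
    insert_single(single, key)
    insert_two_choice(double, key)

# The deepest bucket bounds the per-lookup probe count, i.e. the
# worst-case processing time the abstract is concerned with.
print("single-hash max bucket depth:", max(len(b) for b in single))
print("two-choice max bucket depth:", max(len(b) for b in double))
```

The second hash function costs one extra lookup per insert but flattens the load distribution, which is why multi-hash designs have far better worst-case behavior than a naive table.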
This paper presents a thorough analysis of the energy consumption of a software HEVC decoder. The evaluation utilizes a framework developed herein specifically to estimate the energy consumption at all levels of the cache hierarchy. Our framework is based on analytical models combined with memory profiling tools. Energy analyses of several cache hierarchies executing HEVC decoding with different input...
The recent advent of stacked memory devices has led to a resurgence of research associated with the fundamental memory hierarchy and associated memory pipeline. The bandwidth advantages provided by stacked logic and DRAM devices have inspired research associated with eliminating the bandwidth bottlenecks associated with many applications in high performance computing. Further, recent efforts have focused...
Graphs are used in a wide variety of application domains, from social science to machine learning. Graph algorithms present large numbers of irregular accesses with little data reuse to amortize the high cost of memory accesses, requiring high memory bandwidth. Processing in memory (PIM) implemented through 3D die-stacking can deliver this high memory bandwidth. In a system with multiple memory modules with...
Emerging 3D stacked memory systems provide significantly more bandwidth than current DDR modules. However, general purpose processors do not take full advantage of these resources offered by the memory modules. Taking advantage of the increased bandwidth requires the use of specialized processing units. In this paper, we evaluate the benefits of placing hardware accelerators at the bottom layer of...
We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multi-bank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place...
One of the main challenges for embedded systems is the transfer of data between memory and processor. In this context, Hybrid Memory Cubes (HMCs) can provide substantial energy and bandwidth improvements compared to traditional memory organizations, while also allowing the execution of simple atomic instructions in the memory. However, the complex memory hierarchy remains a bottleneck, especially...
Research on network virtualization has been active for a number of years, during which a number of virtual network embedding (VNE) approaches have been proposed. These approaches, however, neglect important operational requirements imposed by the underlying virtualization platforms. In the case of SDN/OpenFlow-based virtualization, a crucial example of an operational requirement is the availability...
Internet and mobile applications have been the driving force for semiconductor innovation over the past 10 years. In this paper, we focus on the system design challenges of today's and tomorrow's consumer gadgets, from productivity laptop computers to wearable glasses. We will start with everyone's favorite apps, such as finding the fastest route to a baseball game with Google Maps, taking family...
Today's world of electronic gadgets generates voluminous data, including various forms of multimedia data. In these collections, images need more space to store and more bandwidth to transmit over a network. Image compression reduces the size of an image, which lowers memory utilization and allows transmission across the network with less bandwidth...
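As a minimal illustration of the storage-and-bandwidth idea (general-purpose lossless compression applied to raw pixel bytes, not a dedicated image codec such as JPEG or PNG), Python's standard zlib module shows how redundancy in image-like data translates into size savings:

```python
import zlib

# A synthetic 256x256 grayscale "image": smooth gradients are highly
# redundant, which is exactly what compression exploits.
width, height = 256, 256
pixels = bytes(((x + y) // 2) % 256 for y in range(height) for x in range(width))

compressed = zlib.compress(pixels, level=9)
restored = zlib.decompress(compressed)

print(f"raw size:        {len(pixels)} bytes")
print(f"compressed size: {len(compressed)} bytes")
print(f"lossless round-trip ok: {restored == pixels}")
```

Real image codecs go further by exploiting 2D spatial structure and, in lossy modes, perceptual redundancy, but the payoff is the same: fewer bytes to store and to push through the network.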
Since increasing demand for high bit-depth video places large demands upon resources, such as communication bandwidth as well as memory and storage capacity, research into improving the compression ratio (CR) for these videos is critically important. Most conventional video encoders are not amenable to high bit-depth format, so this paper presents novel preprocessing methods designed to improve CR...
We present a new hash function, Argon2, which is oriented toward the protection of low-entropy secrets without secret keys. It requires a certain (but tunable) amount of memory, imposes prohibitive time-memory and computation-memory tradeoffs on memory-saving users, and is exceptionally fast on a regular PC. Overall, it can provide ASIC- and botnet-resistance by filling the memory at 0.6 cycles per byte in the...
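The memory-hardness idea can be sketched with a toy key-derivation function (this illustrates only the time-memory tradeoff concept; it is not Argon2's compression function or parameters, and real applications should use a vetted implementation such as argon2-cffi): a buffer is first filled with a sequential hash chain, then mixed with data-dependent reads, so an attacker who stores less memory must recompute chain segments on every read.

```python
import hashlib

def toy_memory_hard_hash(password: bytes, salt: bytes, mem_blocks: int = 1 << 12) -> bytes:
    """Toy memory-hard KDF: fill a buffer with a hash chain, then mix
    blocks via data-dependent indexing. NOT Argon2, NOT for production."""
    block = hashlib.sha256(password + salt).digest()
    memory = [block]
    # Filling phase: each block depends on the previous one, so the
    # buffer cannot be produced out of order.
    for _ in range(mem_blocks - 1):
        block = hashlib.sha256(block).digest()
        memory.append(block)
    # Mixing phase: the next read address depends on the running state,
    # forcing unpredictable accesses across the whole buffer.
    acc = memory[-1]
    for _ in range(mem_blocks):
        idx = int.from_bytes(acc[:4], "big") % mem_blocks
        acc = hashlib.sha256(acc + memory[idx]).digest()
    return acc

tag = toy_memory_hard_hash(b"hunter2", b"somesalt")
print(tag.hex())
```

The `mem_blocks` parameter plays the role of Argon2's tunable memory cost: raising it increases the RAM an attacker's ASIC or botnet node must dedicate per guess.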
Modern Graphics Processing Units (GPUs) have evolved into high-performance general-purpose processors, forming an alternative to CPUs. However, programming them effectively has proven to be a challenge, not only due to the mandatory requirement of extracting massive fine-grained parallelism but also due to their performance sensitivity to memory traffic. Apart from regular memory caches, GPUs feature...
Prefetching significantly reduces the memory latencies of a wide range of applications and thus increases system performance. However, as a speculative technique, prefetching may also noticeably increase the number of memory accesses, which in turn may negatively impact main memory bandwidth consumption, performance, and power. Main memory bandwidth is a critical resource, especially...
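The tension between latency hiding and bandwidth waste can be made concrete with a tiny next-line prefetcher simulation (a deliberately simplified model with hypothetical traces, not any particular hardware mechanism): on every access, the next cache line is speculatively fetched, and prefetches that are never used show up directly as extra memory traffic.

```python
def simulate(trace, prefetch=True):
    """Count demand misses and total memory traffic for an idealized
    (unbounded, fully-associative) cache with a naive next-line prefetcher."""
    cache = set()
    prefetched = set()
    misses = mem_accesses = 0
    for line in trace:
        if line in cache:
            prefetched.discard(line)      # prefetch turned out to be useful
        else:
            misses += 1
            mem_accesses += 1             # demand fetch from main memory
            cache.add(line)
        if prefetch and (line + 1) not in cache:
            cache.add(line + 1)           # speculative fetch of the next line
            prefetched.add(line + 1)
            mem_accesses += 1             # speculative traffic, useful or not
    return misses, mem_accesses, len(prefetched)

sequential = list(range(100))                        # streaming: prefetcher is accurate
irregular = [(i * 37) % 1000 for i in range(100)]    # scattered: mostly wasted fetches

for name, trace in [("sequential", sequential), ("irregular", irregular)]:
    m, acc, wasted = simulate(trace)
    print(f"{name}: misses={m} mem_accesses={acc} unused_prefetches={wasted}")
```

On the sequential trace almost every demand miss is eliminated for one extra fetch; on the irregular trace the prefetcher roughly doubles memory traffic while hiding little latency, which is exactly the bandwidth-consumption concern the abstract raises.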
One of the main challenges for computer architects is how to hide the high average memory access latency from the processor. In this context, Hybrid Memory Cubes (HMCs) can provide substantial energy and bandwidth improvements compared to traditional memory organizations. However, it is not clear how this reduced average memory access latency will impact the last-level cache (LLC). For applications with high cache miss...
Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and applications, where main memory bandwidth is a critical bottleneck, can benefit from the use of PIM. To this end, an application should be properly...
Due to a lack of sufficient compute threads in memory-intensive applications, GPUs often exhaust all the active warps, and the memory latencies then get exposed and appear on the critical path. In such a scenario, the shared on-chip and off-chip memory bandwidth appears more performance-critical to cores with few or no active warps than to cores with sufficient active warps. In this work,...