The explosion of network bandwidth poses great challenges to data-plane flow processing. Due to its variable and poor worst-case performance, a naive hash table is incapable of wire-speed processing. State-of-the-art schemes rely on multiple hash functions for enhanced load balancing to improve the worst-case performance. These schemes exploit the memory hierarchy and allocate compact on-chip data structures...
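The load-balancing idea behind such multi-hash schemes can be illustrated with a minimal "power of two choices" sketch in Python (a hypothetical toy, not the paper's actual data structure): each key is hashed with two independent functions and placed in the less-loaded of the two candidate buckets, which sharply lowers the worst-case bucket depth compared to a single hash.

```python
import hashlib

NUM_BUCKETS = 64

def h(key: str, salt: str) -> int:
    # Derive an independent hash function per salt from SHA-256.
    digest = hashlib.sha256((salt + key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def insert_single(buckets, key):
    # Baseline: one hash function, no choice of bucket.
    buckets[h(key, "h1")].append(key)

def insert_two_choice(buckets, key):
    # Two candidate buckets from two independent hash functions.
    b1, b2 = h(key, "h1"), h(key, "h2")
    # Place the key in the less-loaded candidate.
    target = b1 if len(buckets[b1]) <= len(buckets[b2]) else b2
    buckets[target].append(key)

single = [[] for _ in range(NUM_BUCKETS)]
double = [[] for _ in range(NUM_BUCKETS)]
for i in range(2000):
    key = f"flow-{i}"
    insert_single(single, key)
    insert_two_choice(double, key)

# The deepest bucket bounds the per-lookup probe count, i.e. the
# worst-case processing time the abstract is concerned with.
print("single-hash max bucket depth:", max(len(b) for b in single))
print("two-choice max bucket depth:", max(len(b) for b in double))
```

The second hash function costs one extra lookup per insert but flattens the load distribution, which is why multi-hash designs have far better worst-case behavior than a naive table.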
This paper presents a thorough analysis of the energy consumption of a software HEVC decoder. The evaluation utilizes a framework developed herein specifically to estimate the energy consumption at all levels of the cache hierarchy. Our framework is based on analytical models combined with memory profiling tools. Energy analyses of several cache hierarchies executing HEVC decoding with different input...
The recent advent of stacked memory devices has led to a resurgence of research associated with the fundamental memory hierarchy and associated memory pipeline. The bandwidth advantages provided by stacked logic and DRAM devices have inspired research associated with eliminating the bandwidth bottlenecks associated with many applications in high performance computing. Further, recent efforts have focused...
Graphs are used in a wide variety of application domains, from social science to machine learning. Graph algorithms present large numbers of irregular accesses with little data reuse to amortize the high cost of memory accesses, requiring high memory bandwidth. Processing in memory (PIM) implemented through 3D die-stacking can deliver this high memory bandwidth. In a system with multiple memory modules with...
Emerging 3D stacked memory systems provide significantly more bandwidth than current DDR modules. However, general purpose processors do not take full advantage of these resources offered by the memory modules. Taking advantage of the increased bandwidth requires the use of specialized processing units. In this paper, we evaluate the benefits of placing hardware accelerators at the bottom layer of...
We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multi-bank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place...
One of the main challenges for embedded systems is the transfer of data between memory and processor. In this context, Hybrid Memory Cubes (HMCs) can provide substantial energy and bandwidth improvements compared to traditional memory organizations, while also allowing the execution of simple atomic instructions in the memory. However, the complex memory hierarchy remains a bottleneck, especially...
Research on network virtualization has been active for a number of years, during which a number of virtual network embedding (VNE) approaches have been proposed. These approaches, however, neglect important operational requirements imposed by the underlying virtualization platforms. In the case of SDN/OpenFlow-based virtualization, a crucial example of an operational requirement is the availability...
Internet and mobile applications have been the driving force for semiconductor innovation over the past 10 years. In this paper, we focus on the system design challenges of today's and tomorrow's consumer gadgets, from productivity laptop computers to wearable glasses. We will start with everyone's favorite apps, such as finding the fastest route to a baseball game with Google Maps, taking family...
Today's world of electronic gadgets generates voluminous data, including various forms of multimedia data. In these collections, images need more space to store and more bandwidth to transmit over a network. Image compression reduces the size of an image, which lowers memory utilization and allows transmission across the network with less bandwidth...
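As a minimal illustration of the storage-and-bandwidth idea (general-purpose lossless compression applied to raw pixel bytes, not a dedicated image codec such as JPEG or PNG), Python's standard zlib module shows how redundancy in image-like data translates into size savings:

```python
import zlib

# A synthetic 256x256 grayscale "image": smooth gradients are highly
# redundant, which is exactly what compression exploits.
width, height = 256, 256
pixels = bytes(((x + y) // 2) % 256 for y in range(height) for x in range(width))

compressed = zlib.compress(pixels, level=9)
restored = zlib.decompress(compressed)

print(f"raw size:        {len(pixels)} bytes")
print(f"compressed size: {len(compressed)} bytes")
print(f"lossless round-trip ok: {restored == pixels}")
```

Real image codecs go further by exploiting 2D spatial structure and, in lossy modes, perceptual redundancy, but the payoff is the same: fewer bytes to store and to push through the network.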
Since increasing demand for high bit-depth video places large demands upon resources, such as communication bandwidth as well as memory and storage capacity, research into improving the compression ratio (CR) for these videos is critically important. Most conventional video encoders are not amenable to high bit-depth format, so this paper presents novel preprocessing methods designed to improve CR...
We present a new hash function, Argon2, which is oriented toward the protection of low-entropy secrets without secret keys. It requires a certain (but tunable) amount of memory, imposes prohibitive time-memory and computation-memory tradeoffs on memory-saving users, and is exceptionally fast on a regular PC. Overall, it can provide ASIC- and botnet-resistance by filling the memory at 0.6 cycles per byte in the...
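The memory-hardness idea can be sketched with a toy key-derivation function (this illustrates only the time-memory tradeoff concept; it is not Argon2's compression function or parameters, and real applications should use a vetted implementation such as argon2-cffi): a buffer is first filled with a sequential hash chain, then mixed with data-dependent reads, so an attacker who stores less memory must recompute chain segments on every read.

```python
import hashlib

def toy_memory_hard_hash(password: bytes, salt: bytes, mem_blocks: int = 1 << 12) -> bytes:
    """Toy memory-hard KDF: fill a buffer with a hash chain, then mix
    blocks via data-dependent indexing. NOT Argon2, NOT for production."""
    block = hashlib.sha256(password + salt).digest()
    memory = [block]
    # Filling phase: each block depends on the previous one, so the
    # buffer cannot be produced out of order.
    for _ in range(mem_blocks - 1):
        block = hashlib.sha256(block).digest()
        memory.append(block)
    # Mixing phase: the next read address depends on the running state,
    # forcing unpredictable accesses across the whole buffer.
    acc = memory[-1]
    for _ in range(mem_blocks):
        idx = int.from_bytes(acc[:4], "big") % mem_blocks
        acc = hashlib.sha256(acc + memory[idx]).digest()
    return acc

tag = toy_memory_hard_hash(b"hunter2", b"somesalt")
print(tag.hex())
```

The `mem_blocks` parameter plays the role of Argon2's tunable memory cost: raising it increases the RAM an attacker's ASIC or botnet node must dedicate per guess.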
Modern Graphics Processing Units (GPUs) have evolved into high-performance general-purpose processors, forming an alternative to CPUs. However, programming them effectively has proven to be a challenge, not only due to the mandatory requirement of extracting massive fine-grained parallelism but also due to their performance sensitivity to memory traffic. Apart from regular memory caches, GPUs feature...
Prefetching significantly reduces the memory latencies of a wide range of applications and thus increases system performance. However, as a speculative technique, prefetching may also noticeably increase the number of memory accesses, which in turn may negatively impact main memory bandwidth consumption, performance, and power. Main memory bandwidth is a critical resource, especially...
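The tension between latency hiding and bandwidth waste can be made concrete with a tiny next-line prefetcher simulation (a deliberately simplified model with hypothetical traces, not any particular hardware mechanism): on every access, the next cache line is speculatively fetched, and prefetches that are never used show up directly as extra memory traffic.

```python
def simulate(trace, prefetch=True):
    """Count demand misses and total memory traffic for an idealized
    (unbounded, fully-associative) cache with a naive next-line prefetcher."""
    cache = set()
    prefetched = set()
    misses = mem_accesses = 0
    for line in trace:
        if line in cache:
            prefetched.discard(line)      # prefetch turned out to be useful
        else:
            misses += 1
            mem_accesses += 1             # demand fetch from main memory
            cache.add(line)
        if prefetch and (line + 1) not in cache:
            cache.add(line + 1)           # speculative fetch of the next line
            prefetched.add(line + 1)
            mem_accesses += 1             # speculative traffic, useful or not
    return misses, mem_accesses, len(prefetched)

sequential = list(range(100))                        # streaming: prefetcher is accurate
irregular = [(i * 37) % 1000 for i in range(100)]    # scattered: mostly wasted fetches

for name, trace in [("sequential", sequential), ("irregular", irregular)]:
    m, acc, wasted = simulate(trace)
    print(f"{name}: misses={m} mem_accesses={acc} unused_prefetches={wasted}")
```

On the sequential trace almost every demand miss is eliminated for one extra fetch; on the irregular trace the prefetcher roughly doubles memory traffic while hiding little latency, which is exactly the bandwidth-consumption concern the abstract raises.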
One of the main challenges for computer architects is how to hide the high average memory access latency from the processor. In this context, Hybrid Memory Cubes (HMCs) can provide substantial energy and bandwidth improvements compared to traditional memory organizations. However, it is not clear how this reduced average memory access latency will impact the last-level cache (LLC). For applications with high cache miss...
Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and applications, where main memory bandwidth is a critical bottleneck, can benefit from the use of PIM. To this end, an application should be properly...
Due to a lack of sufficient compute threads in memory-intensive applications, GPUs often exhaust all the active warps, and the memory latencies then get exposed and appear on the critical path. In such a scenario, the shared on-chip and off-chip memory bandwidth appears more performance-critical to cores with few or no active warps than to cores with sufficient active warps. In this work,...