Hardware accelerators have become a de facto standard for achieving high performance on current supercomputers, and there are indications that this trend will continue in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional...
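On KNL, this in-package memory can be targeted explicitly with the memkind library's hbwmalloc interface. The following is only a minimal sketch of placing a hot array in HBM rather than conventional DRAM; the array size and access pattern are illustrative assumptions, not taken from the paper above.

/* Minimal sketch: allocate a bandwidth-critical array from KNL's
 * in-package HBM (MCDRAM) via the memkind library's hbwmalloc API.
 * Link with -lmemkind. Illustration only, not the paper's method. */
#include <hbwmalloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t n = 1 << 20;                 /* assumed size: 1 Mi doubles */
    double *a;

    if (hbw_check_available() != 0) {   /* returns 0 if HBM is present */
        fprintf(stderr, "no high-bandwidth memory found\n");
        return EXIT_FAILURE;
    }
    a = hbw_malloc(n * sizeof *a);      /* lands in MCDRAM, not DDR4 */
    if (!a) return EXIT_FAILURE;

    for (size_t i = 0; i < n; i++)      /* bandwidth-bound traversal */
        a[i] = (double)i;

    printf("a[42] = %f\n", a[42]);
    hbw_free(a);
    return EXIT_SUCCESS;
}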
HBM (High Bandwidth Memory) is an emerging standard DRAM solution that can achieve breakthrough bandwidth of more than 256 GB/s while also reducing power consumption. It has a stacked DRAM architecture with core DRAM dies on top of a base logic die, built on TSV (through-silicon via) and die-stacking technologies. In this paper, the HBM architecture is introduced and a comparison of its generations is provided...
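As a sanity check on the quoted figure, assuming a second-generation HBM stack with a 1024-bit interface running at 2 Gb/s per pin:

\[ \text{BW} = \frac{1024\ \text{bits} \times 2\ \text{Gb/s}}{8\ \text{bits/byte}} = 256\ \text{GB/s per stack}. \]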
Today's supercomputers are moving towards the deployment of many-core processors like the Intel Xeon Phi Knights Landing (KNL) to deliver high compute and memory capacity. Applications executing on such many-core platforms with improved vectorization require high memory bandwidth. To improve performance, architectures like Knights Landing include a high-bandwidth, low-capacity in-package high bandwidth...
In the current big data era, the limited data bandwidth between the processor and the memory (the "memory wall") has become one of the most critical bottlenecks for the conventional von Neumann computer architecture.
GPUs are often limited by off-chip memory bandwidth. With the advent of general-purpose computing on GPUs, a cache hierarchy has been introduced to filter the bandwidth demand to the off-chip memory. However, the cache hierarchy presents its own bandwidth limitations in sustaining such high levels of memory traffic. In this paper, we characterize the bandwidth bottlenecks present across the memory...
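For intuition on how sustained bandwidth is usually characterized, here is a STREAM-style copy microbenchmark. It is a generic CPU-side analogue, not the paper's GPU methodology; the array size, timing approach, and the read-plus-write traffic model are common conventions assumed here.

/* STREAM-style copy benchmark: time a large array copy and report
 * sustained bandwidth. Sketch only; sizes are assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 26)  /* 64 Mi doubles = 512 MB, far larger than caches */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    if (!a || !b) return EXIT_FAILURE;
    for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 0.0; } /* pre-fault */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++) b[i] = a[i];    /* copy kernel */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec   = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double bytes = 2.0 * N * sizeof(double);     /* one read + one write */
    printf("copy: %.2f GB/s\n", bytes / sec / 1e9);
    free(a); free(b);
    return EXIT_SUCCESS;
}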
Modern memory systems are equipped with multiple channels to achieve higher memory bandwidth. Since a multi-channel memory system focuses on achieving high memory bandwidth, data are allocated across all the channels. Hence, when the memory system is accessed, all the channels are activated until the next DRAM refresh starts. Therefore, when executing compute-intensive applications that do not need...
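The reason every channel ends up active is the usual address interleaving, sketched below. The 4-channel count, 256 B granularity, and field layout are illustrative assumptions, not taken from the paper.

/* Typical address-to-channel interleaving: consecutive 256 B blocks
 * rotate over the channels, so any sizable stream touches (and keeps
 * active) every channel. Parameters are assumptions. */
#include <stdint.h>
#include <stdio.h>

#define NUM_CHANNELS 4          /* assumed channel count          */
#define GRANULARITY  256        /* bytes mapped per channel slice */

static unsigned channel_of(uint64_t addr) {
    return (addr / GRANULARITY) % NUM_CHANNELS;
}

int main(void) {
    for (uint64_t addr = 0; addr < 8 * GRANULARITY; addr += GRANULARITY)
        printf("addr 0x%05llx -> channel %u\n",
               (unsigned long long)addr, channel_of(addr));
    return 0;
}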
Nowadays, big data has become one of the most popular topics in the world. Analyzing such data requires a large amount of memory access. To serve requests from multiple users, the memory needs both high bandwidth and high density. The power spent moving data must also be considered in the big data era. High-density 3D-stacked DRAM is a potential solution for big data storage. By applying through-silicon vias...
We design a novel DRAM controller that bundles and executes memory requests of hard real-time applications in consecutive rounds based on their type to reduce read/write switching delay. At the same time, our controller provides a configurable, guaranteed bandwidth for soft real-time requests. We show that there is a fundamental trade-off between the latency guarantee for hard real-time requests and...
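The idea of type-based bundling can be sketched as follows: buffer pending hard real-time requests, then issue all reads before all writes within a round, so the bus turnaround penalty is paid once per round instead of once per alternation. The structures and round size below are illustrative assumptions, not the paper's controller.

/* Simplified round-based issue: pass 0 drains reads, pass 1 drains
 * writes, giving one RD->WR switch per round regardless of arrival
 * order. Sketch only. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { bool is_write; unsigned addr; } Req;

static void issue_round(const Req *q, int n) {
    for (int pass = 0; pass < 2; pass++)
        for (int i = 0; i < n; i++)
            if (q[i].is_write == (pass == 1))
                printf("%s 0x%x\n", pass ? "WR" : "RD", q[i].addr);
}

int main(void) {
    Req round[] = { {false,0x100}, {true,0x200}, {false,0x300}, {true,0x400} };
    issue_round(round, 4);
    return 0;
}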
The rapid evolution of the cloud computing model is accompanied by huge amounts of energy consumed by cloud data centers, so enhancing the energy efficiency of those data centers has become a major challenge. This paper tackles the problem of reducing the energy consumption of cloud data centers by proposing a novel virtual machine placement strategy. The proposed strategy suits both static and...
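For context, a common baseline for energy-aware placement is first-fit-decreasing bin packing: consolidate VMs onto as few hosts as possible so idle hosts can be powered down. The sketch below shows that generic baseline only; it is not the paper's strategy, and the capacities and demands are assumptions.

/* First-fit-decreasing VM placement: sort VMs by demand, pack each
 * onto the first host with room. Generic baseline sketch. */
#include <stdio.h>
#include <stdlib.h>

#define HOST_CAP 100      /* assumed CPU capacity per host */

static int cmp_desc(const void *a, const void *b) {
    return *(const int *)b - *(const int *)a;
}

int main(void) {
    int vms[] = { 45, 70, 20, 55, 10, 30 };      /* assumed CPU demands */
    int n = sizeof vms / sizeof *vms;
    int used[sizeof vms / sizeof *vms] = { 0 };  /* per-host load */

    qsort(vms, n, sizeof *vms, cmp_desc);
    for (int v = 0; v < n; v++)
        for (int h = 0; h < n; h++)
            if (used[h] + vms[v] <= HOST_CAP) {
                used[h] += vms[v];
                printf("VM(%d) -> host %d\n", vms[v], h);
                break;
            }
    return 0;
}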
State-of-the-art DRAM caches employ a small Tag-Cache, and their performance depends upon two important parameters, namely bank-level parallelism and Tag-Cache hit rate. These parameters depend upon the row buffer organization. Recently, it has been shown that a small row buffer organization delivers better performance via improved bank-level parallelism than the traditional large row buffer organization...
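The role of the Tag-Cache is that a hit there resolves the DRAM-cache tags without spending a DRAM access on tag reads. A minimal direct-mapped sketch follows; the set count and address split are illustrative assumptions, not the paper's organization.

/* Tiny direct-mapped Tag-Cache: a hit means the DRAM-cache tags are
 * already known on-chip, avoiding an extra DRAM access. Sketch only. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TC_SETS 256        /* assumed number of Tag-Cache sets */

typedef struct { bool valid; uint32_t tag; } TagEntry;
static TagEntry tag_cache[TC_SETS];

static bool tag_cache_hit(uint32_t block_addr) {
    uint32_t set = block_addr % TC_SETS;
    uint32_t tag = block_addr / TC_SETS;
    if (tag_cache[set].valid && tag_cache[set].tag == tag)
        return true;
    tag_cache[set] = (TagEntry){ true, tag };   /* fill on miss */
    return false;
}

int main(void) {
    printf("first access: %s\n", tag_cache_hit(0x1234) ? "hit" : "miss");
    printf("second access: %s\n", tag_cache_hit(0x1234) ? "hit" : "miss");
    return 0;
}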
In video decoder applications, motion compensation (MC) is bandwidth consuming because of its non-regular memory access. Especially with the popularity of UHD video and the development of the new coding standard (HEVC), external memory bandwidth has become a crucial bottleneck. In this paper, we propose an area-efficient, cache-based bandwidth optimization strategy to minimize the memory bandwidth. First...
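A reference-pixel cache helps because neighboring blocks' motion vectors often point at overlapping reference regions, so previously fetched tiles need not be refetched from external memory. The sketch below illustrates that effect with tile sizes and block positions that are assumptions, not the paper's design.

/* Count external DRAM tile fetches for two overlapping MC reference
 * blocks, with and without a tile-granular cache. Sketch only. */
#include <stdbool.h>
#include <stdio.h>

#define TILE 4                      /* assumed 4x4-pixel cache tiles */
static bool cached[64][64];         /* tile-granular presence bits   */
static int ext_fetches, total_tiles;

static void fetch_block(int x, int y, int w, int h) {
    for (int ty = y / TILE; ty <= (y + h - 1) / TILE; ty++)
        for (int tx = x / TILE; tx <= (x + w - 1) / TILE; tx++) {
            total_tiles++;
            if (!cached[ty][tx]) {  /* only misses touch external DRAM */
                cached[ty][tx] = true;
                ext_fetches++;
            }
        }
}

int main(void) {
    fetch_block(16, 16, 8, 8);      /* first MC block              */
    fetch_block(20, 18, 8, 8);      /* neighbor overlaps the first */
    printf("tiles requested: %d, external fetches: %d\n",
           total_tiles, ext_fetches);
    return 0;
}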
The increasing use of machine learning algorithms, such as Convolutional Neural Networks (CNNs), makes the hardware accelerator approach very compelling. However, the question of how best to design an accelerator for a given CNN has not been answered yet, even at a very fundamental level. This paper addresses that challenge by providing a novel framework that can universally and accurately evaluate...
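One fundamental quantity any such evaluation must capture is a layer's arithmetic intensity (operations per byte moved), which determines whether a design is compute- or bandwidth-bound. The back-of-envelope model below is in that spirit only; the layer shape, 1-byte datatypes, and no-reuse traffic model are assumptions, not the paper's framework.

/* Roofline-style counting for one conv layer: MACs vs. bytes moved.
 * All parameters are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    long H = 56, W = 56, Cin = 64, Cout = 64, K = 3;  /* assumed layer */
    long macs  = H * W * Cin * Cout * K * K;          /* multiply-adds */
    long in_b  = H * W * Cin;                         /* 1 B per value */
    long wt_b  = K * K * Cin * Cout;
    long out_b = H * W * Cout;
    long bytes = in_b + wt_b + out_b;                 /* no reuse model */
    printf("MACs: %ld, bytes: %ld, intensity: %.1f MAC/B\n",
           macs, bytes, (double)macs / bytes);
    return 0;
}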
For IoT, the most important element is communication bandwidth, which directly affects data processing performance and device-to-device communication. There are three approaches to obtaining wider bandwidth: higher clock speeds, more lanes, and higher data compression. The first two are related to packaging technology [10]. The overall system performance balance should bring together the...
High-performance embedded applications are developed using systems-on-chips (SoCs), which in turn include silicon-intensive, integrated application processors. These SoCs integrate a multi-core processor (e.g., ARM Cortex-A9 or Cortex-A15) with a variety of memory interface controllers, communication interface controllers, and special-purpose accelerators. Traditionally, a bus matrix is used for integrating these intellectual...
Cognitive computing and cloud infrastructure require flexible, connectable, and scalable processors with extreme I/O bandwidth. With four distinct chip configurations, the POWER9 family of chips delivers multiple options for memory ports, core thread counts, and accelerator options to address this need. The 24-core scale-out processor is implemented in 14nm SOI FinFET technology [1] and contains 8.0B...
In recent years, the demand for memory performance has grown rapidly due to the increasing number of cores on a single CPU, along with the integration of graphics processing units and other accelerators. Caching has been a very effective way to relieve bandwidth demand and to reduce average memory latency. As shown by the cache feature table in Fig. 23.9.1, there is a big latency gap between SRAM...
Mobile DRAMs are essential to support memory-intensive operations on smartphones and tablet PCs [1, 2]. Since the next-generation mobile DRAM standard (LPDDR) targets a speed specification of 51.2 GB/s, its I/O interface demands high bandwidth, low power, and high efficiency. Single-ended signaling has been used for LPDDR interfaces due to its 100% pin efficiency. However, as the data rate increases...
Over the past few years, GDDR5 has emerged as the dominant standard for applications requiring high system bandwidth, like graphics cards and game consoles. However, GDDR5 data rates are saturating due to limitations in the clock frequency and column-access cycle time (tCCD). To reach a data rate of 9 Gb/s/pin [1], a GDDR5 DRAM has to be clocked at 2.25 GHz and operate at a tCCD of 888 ps. This combination...
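These figures are mutually consistent: GDDR5 transfers four data bits per pin per command-clock (CK) cycle (the data clock WCK runs at twice CK and is double-pumped), and the quoted tCCD equals two CK cycles:

\[ 4 \times 2.25\ \text{GHz} = 9\ \text{Gb/s/pin}, \qquad t_{CK} = \frac{1}{2.25\ \text{GHz}} \approx 444\ \text{ps}, \qquad t_{CCD} = 2\,t_{CK} \approx 888\ \text{ps}. \]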
Precise depth estimation is a key kernel for realizing autonomous navigation on micro-aerial vehicles (MAVs). The state-of-the-art semi-global matching (SGM) algorithm has become favored for its high accuracy. In particular, it effectively handles low-texture regions due to its global optimization of the disparity between the left and right images over the entire frame. However, SGM involves...
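The core of SGM, not specific to this paper, is the well-known per-direction cost-aggregation recurrence, where \(C(\mathbf{p}, d)\) is the matching cost at pixel \(\mathbf{p}\) and disparity \(d\), \(\mathbf{r}\) is a scan direction, and \(P_1, P_2\) penalize small and large disparity changes:

\[ L_r(\mathbf{p}, d) = C(\mathbf{p}, d) + \min\!\Bigl( L_r(\mathbf{p}-\mathbf{r}, d),\; L_r(\mathbf{p}-\mathbf{r}, d\!-\!1) + P_1,\; L_r(\mathbf{p}-\mathbf{r}, d\!+\!1) + P_1,\; \min_k L_r(\mathbf{p}-\mathbf{r}, k) + P_2 \Bigr) - \min_k L_r(\mathbf{p}-\mathbf{r}, k). \]

Aggregating this over multiple directions and all disparities per pixel is what drives SGM's heavy memory traffic.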
Path ORAM (Oblivious RAM) is a recently proposed ORAM protocol for preventing information leakage from memory access sequences. It has received wide adoption due to its simplicity, practical efficiency, and asymptotic efficiency. However, Path ORAM has an extremely large memory bandwidth demand, leading to severe memory contention in server settings; e.g., a server may service one application that uses Path...
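The bandwidth demand comes from the protocol's structure: every logical access reads and rewrites all buckets on the path from the root of a binary tree to a randomly assigned leaf. Below is a minimal sketch of enumerating those bucket indices; the heap-style array layout is an assumption for illustration.

/* Buckets stored heap-style: root = 1, children of i are 2i and 2i+1.
 * For tree height h, leaf l in [0, 2^h) sits at index 2^h + l; the
 * access path is that node plus all of its ancestors up to the root. */
#include <stdio.h>

static void path_to_root(unsigned height, unsigned leaf) {
    for (unsigned idx = (1u << height) + leaf; idx >= 1; idx /= 2)
        printf("read+rewrite bucket %u\n", idx);   /* O(log N) buckets */
}

int main(void) {
    path_to_root(3, 5);   /* 8-leaf tree, leaf 5: buckets 13, 6, 3, 1 */
    return 0;
}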