Live migration of virtual machines has attracted significant attention in recent years. It facilitates online system maintenance, load balancing, fault tolerance, and power management. The existing pre-copy live migration approach has to iteratively copy redundant memory pages, which causes high network overhead and slow migration. An alternative post-copy live migration approach can provide quick migration with...
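The iterative copying that pre-copy relies on can be sketched in a few lines. This is a toy model, not any paper's implementation: the `pre_copy_migrate` name, the `workload_rounds` dirty-page sets, and the stopping `threshold` are all illustrative assumptions.

```python
# Minimal sketch of iterative pre-copy live migration.

def pre_copy_migrate(pages, workload_rounds, threshold=2, max_rounds=10):
    """Return (rounds, total_pages_sent) for an iterative pre-copy run.

    pages: set of all page ids.
    workload_rounds: sets of pages the guest dirties during each copy round.
    """
    sent = 0
    dirty = set(pages)            # first round: every page is "dirty"
    for r in range(max_rounds):
        sent += len(dirty)        # copy the current dirty set over the network
        # pages written by the guest while this round was being copied
        dirty = set(workload_rounds[r]) if r < len(workload_rounds) else set()
        if len(dirty) <= threshold:
            break
    sent += len(dirty)            # final stop-and-copy of the remainder
    return r + 1, sent
```

For instance, migrating 100 pages while the guest dirties pages {1..5} and then {1, 2} sends 107 page copies over two iterative rounds plus a final stop-and-copy; pages 1 and 2 cross the network three times, which is exactly the redundant copying the abstract points out.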
The increasing adoption of GPUs as mainstream computing devices, coupled with the imminent availability of large high-bandwidth caches based on die-stacked memory, makes it important to analyze and understand modern GPU compute applications from the perspective of their memory access and data reuse characteristics. This paper presents detailed workload characterization studies of four GPU compute applications...
Big data processing has become an increasingly important field that has attracted much attention from academia and industry. However, it worsens the memory wall problem for processor design, i.e., the large performance gap between processor computation and memory access. The 3D stacked memory structure has been put forward as a promising method of relieving this problem. As non-volatile memory (NVM)...
Recent advancements in the architecture of Graphics Processing Units (GPUs) enable the acceleration of many general-purpose applications. Even with high memory bandwidth, GPUs still face the challenge of accelerating highly memory-intensive applications. To overcome this challenge, this paper investigates the impact of scaling up the memory partitions and of scaling the frequency of the...
Memory systems are critical to system responsiveness and operating costs. New memory technologies such as PCM, STT-MRAM, and RRAM are poised to provide an intermediate memory layer between DRAM and flash to better serve the needs of capacity- and latency-hungry datacenter applications. To drive their efficient deployment, it is imperative to make complex architectural decisions and justify the need to rethink...
Applications in modern data centers have a wide variety of resource requirements along the four main dimensions of computing, memory, storage, and networking. Data centers must manage these resources separately for each dimension, resulting in highly inefficient allocation schemes that lead to low utilization or over-provisioning of precious resources. However,...
Graphic Processing Units (GPUs) based on the Single Instruction Multiple Thread (SIMT) architecture are emerging as platforms that exploit parallelism more efficiently than Multiple Instruction Multiple Data (MIMD) architectures. A GPU has numerous shader cores and thousands of simultaneous fine-grained active threads. These threads are grouped into Cooperative Thread Arrays (CTAs). All the threads within...
Irregular applications, by their very nature, suffer from poor data locality. This often results in high miss rates for caches, and many long waits to off-chip memory. Historically, long latencies have been dealt with in two ways: (1) latency mitigation using large cache hierarchies, or (2) latency masking where threads relinquish their control after issuing a memory request. Multithreaded CPUs are...
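The latency-masking strategy mentioned above, where a thread relinquishes control after issuing a memory request, can be illustrated with a toy generator-based scheduler. The `thread` and `run` names are my own illustration, not from the paper.

```python
# Toy model of latency masking: each "thread" is a generator that yields
# when it issues a memory request, relinquishing the core so another
# ready thread can run (as in fine-grained multithreading on GPUs).

import collections

def thread(tid, n_requests):
    for i in range(n_requests):
        # ...do some compute, then issue a long-latency memory access
        yield (tid, i)   # relinquish the core until the request returns

def run(threads):
    ready = collections.deque(threads)
    issued = []
    while ready:
        t = ready.popleft()
        try:
            issued.append(next(t))  # run until the next memory request
            ready.append(t)         # requeue; latency overlaps with others
        except StopIteration:
            pass                    # thread finished
    return issued

# Two threads interleave their memory requests instead of stalling:
trace = run([thread(0, 2), thread(1, 2)])
```

The resulting trace alternates between the two threads, so each one's memory latency is hidden behind the other's execution rather than behind a large cache hierarchy.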
A video-on-demand (VOD) system allows users to access media at any time without leaving their home. Hard disk drives (HDDs) have become popular for VOD storage, to store and manage a large amount of data. In practical cases, disk storage throughput is limited by slow HDDs, and disk performance degrades drastically when the server needs to serve many simultaneous video streams. On the other hand,...
Data centers offer many services hosted on dedicated physical servers, which are often under-utilized in terms of the resources used. The goal of virtual machine placement is to maximize the usage of available resources and to save power by shutting down unused physical machines. A study of different virtual machine placement techniques in the data center shows that resources are wasted due to the multi-dimensionality...
With the growth of cloud computing, security and privacy are becoming more and more important. Timing channel attacks are among the most notable security threats for memory controllers, due to competition for shared resources. However, existing protection strategies that ensure the determinism of memory accesses by dividing bandwidth introduce great latency and performance degradation. This...
This paper describes our experience with storage optimization that utilizes cost-effective PCIe solid-state drives (SSDs) to improve the overall performance of a Spark framework. A key problem we address is the limited memory system performance. In particular, we adopt high-performance SSDs to alleviate the saturated DRAM bandwidth and its limited capacity. We utilize SSDs to store shuffle data and...
Despite the ability of modern processors to execute a variety of algorithms efficiently through instructions based on registers of ever-increasing width, some applications perform poorly due to the limited interconnection bandwidth between main memory and processing units. Near-data processing has started to gain acceptance as an accelerator approach due to technology constraints and...
The Hybrid Memory Cube (HMC) is an innovative DRAM architecture that adopts 3D-stacking to improve bandwidth and save energy. An HMC module adopts separate receive and transmit lanes and thus may achieve the maximal memory bandwidth only if data can be driven at full speed in both directions. However, due to the natural read and write imbalance in modern applications, the effective memory bandwidth...
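The read/write imbalance effect is easy to quantify under a simplified model (my own illustration, not the paper's analysis): if each direction's lanes sustain `lane_bw` and a fraction `read_fraction` of the traffic is reads, total throughput is capped by the busier direction.

```python
def effective_bandwidth(lane_bw, read_fraction):
    """Total throughput with separate, equally sized receive/transmit
    lanes: the busier direction saturates first and caps the total."""
    return lane_bw / max(read_fraction, 1.0 - read_fraction)

# Perfectly balanced traffic drives both directions at full speed...
full_duplex = effective_bandwidth(160.0, 0.5)   # 320.0 GB/s
# ...but a read-heavy 80/20 mix leaves the transmit lanes mostly idle.
read_heavy = effective_bandwidth(160.0, 0.8)    # 200.0 GB/s
```

The 160 GB/s per-direction figure is hypothetical; the point is that an 80/20 read/write mix realizes well under two-thirds of the full-duplex peak, which is the imbalance the abstract describes.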
The Fast TracKer (FTK) to Level-2 Interface Card (FLIC) of the ATLAS FTK trigger upgrade is the final component in the FTK chain of custom electronics connecting the system to the High-Level Trigger (HLT). The FTK performs full event tracking using the ATLAS Silicon detectors for every Level-1 (L1) accepted event at 100 kHz. The FLIC is a custom Advanced Telecommunications Computing Architecture (ATCA) card...
High Performance Computing (HPC) aggregates computing power in order to solve large and complex problems in different knowledge areas. Nowadays, HPC users can utilize virtualized infrastructures as a low-cost alternative for deploying their applications. However, virtualization brings some challenges for HPC, especially in regard to the overhead caused by hypervisors. In this work, our main goal is to analyze...
The continuous demand for higher storage density in Solid State Drives (SSDs) is pushing NAND-Flash technology to its reliability and performance limits. Among the many memory technology candidates to replace it, the Resistive RAM (RRAM) concept seems to be emerging. However, before an entire SSD based on RRAM memory devices can be designed, a design space exploration of the disk features must be performed...
This paper presents an operating-system-managed die-stacked DRAM called i-MIRROR that mirrors high-locality pages from off-chip DRAM. Reducing cache tag area, reducing transfer bandwidth, and improving hit latency all at once while using die-stacked DRAM as a hardware cache is extremely challenging. In this paper, we show that performance and energy efficiency can be obtained...
Die-stacked DRAM caches are likely to become available in mainstream chips in the near future. A DRAM cache is typically used as a last-level shared cache behind the traditional hierarchy of on-chip SRAM caches. However, its internal organization differs from that of traditional caches, as it is based on DRAM technology that provides significantly diverse access latencies depending on the state of its internal...
Advances in die-stacking (3D) technology have enabled the tight integration of significant quantities of DRAM with high-performance computation logic. How to integrate this technology into the overall architecture of a computing system is an open question. While much recent effort has focused on hardware-based techniques for using die-stacked memory (e.g., caching), in this paper we explore what it...