A new page swap protocol is proposed for a user-level remote memory paging system to accelerate out-of-core processing by multithreaded user programs and libraries written with OpenMP and pthreads. The original swap protocol becomes a bottleneck when multiple threads in a user program request page swaps, because all MPI communications to memory servers and page...
A remote memory paging system called Distributed Large Memory (DLM) has been developed. It uses the memory of remote nodes in a cluster as an extension of the local node's main memory. DLM supports out-of-core processing, i.e., processing of data sets larger than the main memory capacity of the local node. By using DLM and memory servers, it is possible to run multi-thread programs...
This paper proposes the most efficient I/O-based out-of-core stencil algorithm for large-capacity non-volatile memory (NVM), such as flash. The paper evaluates the performance of various out-of-core stencil algorithms and implementations designed for flash. Algorithms for flash differ greatly from existing algorithms designed for memory-and-cache, host-and-GPU, and local-and-remote...
This paper proposes a new scheme for large-scale stencil computations whose data size exceeds the total main memory of the nodes in a cluster. It uses flash SSDs distributed across the cluster nodes as an extension of main memory, together with a locality-aware algorithm. Three algorithms with different hierarchical blocking schemes for three memory tiers, namely,...
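The core idea of blocking for a slower memory tier can be illustrated with a one-dimensional, 3-point averaging stencil. The sketch below is not one of the paper's three algorithms. It shows only single-tier spatial blocking: each block, plus one halo cell on each side, is copied into a small buffer (standing in for DRAM) before it is updated. The function names and the 3-point kernel are illustrative assumptions.

```python
from typing import List


def stencil_step_naive(u: List[float]) -> List[float]:
    """One Jacobi step of a 3-point average; boundary cells stay fixed."""
    out = u[:]
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def stencil_step_blocked(u: List[float], block: int) -> List[float]:
    """Same step, processed one block at a time through a small buffer.

    The slice copy stands in for loading a block (plus one halo cell per
    side) from a slow tier such as flash into DRAM. Hypothetical helper.
    """
    if block < 1:
        raise ValueError("block must be positive")
    n = len(u)
    out = u[:]
    for start in range(1, n - 1, block):
        stop = min(start + block, n - 1)   # interior cells [start, stop)
        buf = u[start - 1:stop + 1]        # block plus its two halo cells
        for j in range(1, len(buf) - 1):
            out[start + j - 1] = (buf[j - 1] + buf[j] + buf[j + 1]) / 3.0
    return out
```

Because the blocked version performs the same arithmetic in the same order, it produces exactly the same result as the naive version for any block size; only the memory access pattern changes. The block size is the parameter a real system would size to fit the faster tier.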
This paper proposes an auto-tuning system for flash-based out-of-core stencil computations. Blk-Tune is a runtime auto-tuning system for blocking parameters that enables flash memory to be used as an extension of main memory. It retrieves hardware information automatically using Portable Hardware Locality (hwloc) and minimizes the amount of data transferred between the flash device and DRAM,...
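A blocking-parameter choice of this kind can be sketched with a simple cost model. The model below is a hypothetical assumption, not Blk-Tune's actual method: an n x n grid is cut into b x b tiles with a halo of width `halo`, each tile plus halo must fit in a DRAM budget, and the goal is to minimize the number of elements read from flash per sweep. In Blk-Tune the hardware information comes from hwloc; here the DRAM budget is simply passed in as a parameter.

```python
from typing import Sequence


def tile_footprint(b: int, halo: int, elem_bytes: int) -> int:
    """Bytes of DRAM needed for one b x b tile plus its halo."""
    return (b + 2 * halo) ** 2 * elem_bytes


def transferred_elements(n: int, b: int, halo: int) -> int:
    """Elements read from flash per sweep when an n x n grid uses b x b tiles."""
    tiles_per_dim = -(-n // b)  # ceiling division
    return tiles_per_dim ** 2 * (b + 2 * halo) ** 2


def choose_block_size(n: int, halo: int, elem_bytes: int,
                      dram_budget: int, candidates: Sequence[int]) -> int:
    """Hypothetical tuner: fastest-by-model tile size that fits in DRAM."""
    fitting = [b for b in candidates
               if tile_footprint(b, halo, elem_bytes) <= dram_budget]
    if not fitting:
        raise ValueError("no candidate tile fits in the DRAM budget")
    return min(fitting, key=lambda b: transferred_elements(n, b, halo))
```

Under this model, larger tiles amortize halo traffic better, so the tuner picks the largest tile that still fits. A runtime tuner could instead time each candidate empirically; the analytic model just keeps the sketch deterministic.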
This paper investigates the performance of flash solid-state drives (SSDs) as an extension of main memory, using a locality-aware algorithm for stencil computations. We propose three configurations for accessing the flash media, swap, mmap, and aio, combined with data-structure blocking techniques. Our results indicate that hierarchical blocking optimizations for three tiers, flash SSD, DRAM, and...
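Of the three access configurations, mmap is the easiest to show in a few lines. The sketch below is a minimal illustration, assuming the grid is stored as raw float64 values in a file. It maps the file into memory with Python's standard `mmap` module and applies one in-place 3-point averaging step, leaving the operating system to page data between the storage device and DRAM on demand. The helper names are hypothetical, and the paper's blocking is not reproduced here.

```python
import mmap
import struct
import tempfile
from typing import List


def write_grid_file(values: List[float]) -> str:
    """Store float64 values in a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
        f.write(struct.pack(f"{len(values)}d", *values))
        return f.name


def read_grid_file(path: str, n: int) -> List[float]:
    """Read n float64 values back from the file."""
    with open(path, "rb") as f:
        return list(struct.unpack(f"{n}d", f.read(n * 8)))


def stencil_on_mmap(path: str, n: int) -> None:
    """One in-place 3-point averaging step over n mapped float64 values."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), n * 8) as mm:
            # Views are released (innermost first) before the map closes.
            with memoryview(mm) as raw, raw.cast("d") as view:
                prev = view[0]             # old value of the left neighbour
                for i in range(1, n - 1):
                    cur = view[i]
                    view[i] = (prev + cur + view[i + 1]) / 3.0
                    prev = cur             # keep the pre-update value
```

Keeping the pre-update left neighbour in `prev` lets the step run in place without a second full-size array, which matters when the grid is larger than DRAM.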
This paper investigates the potential of flash as a large and slow memory behind dynamic random-access memory (DRAM) for stencil computation, which is one of the most common and important computation kernels in various scientific and engineering simulations. We evaluate the performance of a fastswap kernel, which was recently incorporated into Linux, in stencil computation using flash as a swap device...
A new page swap mechanism is introduced to resolve an inconsistent-page problem for multithreaded applications in user-level remote paging systems. Evaluations show that its overhead is limited and that it is applicable in practice to multithreaded applications.
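The abstract does not describe the mechanism itself. One common way to keep pages consistent when several threads fault on the same page is to mark the page "in flight", so that only the first faulting thread fetches it while the others wait. The sketch below illustrates that general technique under hypothetical names (`PageTable`, `fetch`). It is not presented as the paper's protocol.

```python
import threading


class PageTable:
    """Hypothetical local page table for a user-level remote paging runtime.

    A page is absent, in flight (being swapped in), or resident. Only the
    first faulting thread fetches a page; later threads wait on the
    condition variable instead of issuing a duplicate fetch.
    """

    def __init__(self, fetch):
        self._fetch = fetch                 # callable: page number -> bytes
        self._cond = threading.Condition()
        self._resident = {}                 # page number -> data
        self._in_flight = set()

    def access(self, page: int) -> bytes:
        with self._cond:
            while True:
                if page in self._resident:
                    return self._resident[page]
                if page not in self._in_flight:
                    self._in_flight.add(page)   # this thread will fetch it
                    break
                self._cond.wait()               # another thread is fetching
        try:
            data = self._fetch(page)            # remote transfer, lock not held
        except BaseException:
            with self._cond:                    # let a waiter retry the fetch
                self._in_flight.discard(page)
                self._cond.notify_all()
            raise
        with self._cond:
            self._resident[page] = data
            self._in_flight.discard(page)
            self._cond.notify_all()
        return data
```

The remote transfer runs outside the lock, so faults on different pages can still proceed concurrently. Only threads that want the same page are serialized.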
An automatic, adaptive page size control methodology is proposed for remote memory paging. It estimates the working data set and changes the page size dynamically, adapting to each processing phase of an application while it is running. It is highly effective at preventing memory server thrashing when the size of local memory is limited.
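The abstract does not give the estimator or the control rule, so the following is only a hypothetical illustration of the trade-off. Given the addresses touched in a recent window, the estimated working set at a given page size is the number of distinct pages times the page size. The policy picks the largest page size whose estimated working set still fits in local memory, and falls back to smaller pages when sparse access would inflate the footprint and cause thrashing.

```python
from typing import Iterable, Sequence


def footprint(addresses: Iterable[int], page_size: int) -> int:
    """Local bytes needed to keep every page touched in the window resident."""
    return len({a // page_size for a in addresses}) * page_size


def choose_page_size(addresses: Iterable[int], local_bytes: int,
                     candidates: Sequence[int]) -> int:
    """Hypothetical policy: largest page size whose working set fits locally.

    Larger pages mean fewer remote transfers, but sparse access inflates
    the footprint; smaller pages keep the working set resident.
    """
    addrs = list(addresses)  # may be scanned once per candidate size
    for size in sorted(candidates, reverse=True):
        if footprint(addrs, size) <= local_bytes:
            return size
    return min(candidates)
```

Re-running the choice on each processing phase's address window gives the "adaptive to each processing part" behaviour described above, under these assumptions.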
The prevalence of 64-bit operating systems lets programs use a large memory address space in general. However, the actual physical memory limits how fully that space can be used. When a program requires more memory than the physical memory available in a computer, a traditional virtual memory system swaps pages between a local hard disk and physical memory. Here, with the recent development...
The Distributed Large Memory system (DLM) was designed to provide memory beyond the size of local physical memory by using remote memory distributed over cluster nodes. The original DLM adopted a low-cost page replacement algorithm that selects the page to evict in address order. In DLM, remote page swapping is the most performance-critical operation. For more efficient swap-out page...
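Address-order replacement can be illustrated with a toy model. The sketch below assumes one reading of "selects an evicted page in address order": a cursor sweeps the resident page numbers in ascending order and wraps around, so recency is never tracked. The class and counter names are hypothetical and are not DLM's implementation.

```python
from typing import Set


class AddressOrderCache:
    """Toy local memory of `capacity` page frames backed by a remote store.

    Victims are chosen by scanning resident page numbers in ascending
    address order with a wrap-around cursor. No per-access recency
    bookkeeping is kept, which is what makes the policy low cost.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.resident: Set[int] = set()
        self.cursor = 0       # next address to consider for eviction
        self.swap_ins = 0     # pages fetched from remote memory
        self.swap_outs = 0    # pages written back to remote memory

    def _pick_victim(self) -> int:
        later = [p for p in self.resident if p >= self.cursor]
        victim = min(later) if later else min(self.resident)  # wrap around
        self.cursor = victim + 1
        return victim

    def access(self, page: int) -> None:
        if page in self.resident:
            return            # hit: no remote traffic
        if len(self.resident) >= self.capacity:
            self.resident.remove(self._pick_victim())
            self.swap_outs += 1
        self.resident.add(page)
        self.swap_ins += 1
```

Because the victim depends only on addresses, a recently used page can be evicted just before it is needed again. This is the kind of extra swap traffic that motivates a more efficient swap-out page selection.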