Accelerated clusters, which are distributed memory systems equipped with accelerators, have been used in various fields. For accelerated clusters, programmers often implement their applications by a combination of MPI and CUDA (MPI+CUDA). However, the approach faces programming complexity issues. This paper introduces the XcalableACC (XACC) language, which is a hybrid model of XcalableMP (XMP) and...
OpenACC's programming model presents a simple interface to programmers, offering a trade-off between performance and development effort. OpenACC relies on compiler technologies to generate efficient code and optimize for performance. One of the directives that is difficult to implement is the cache directive. The cache directive allows the programmer to utilize the accelerator's hardware- or software-managed...
In this paper, aiming at realizing directive-based temporal blocking for out-of-core stencil computation, we present an extension of OpenACC directives and a source-to-source translator capable of accelerating out-of-core stencil computation on a graphics processing unit (GPU). Out-of-core stencil computation here deals with large data that cannot be entirely stored in GPU memory. Given an OpenACC-like...
This paper presents a source-to-source OpenACC optimizer that automatically optimizes a histogram computation code for a graphics processing unit (GPU). Parallel histogram computation codes typically deploy multiple copies of histograms and update them with atomic operations. This duplication method can be implemented as an OpenACC code. However, the structure of sequential code blocks must be manually...
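The duplication method mentioned in the abstract above can be illustrated with a minimal sketch. This is not the optimizer's output or the authors' code: the function name `histogram_duplicated`, its parameters, and the rule for choosing a copy (`i % num_copies`) are assumptions made for illustration. The sketch keeps several copies of the histogram, updates them with OpenACC atomic operations, and then merges the copies. A compiler without OpenACC support ignores the pragmas and runs the same code sequentially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the paper): builds a histogram of `data`
 * in `hist` (num_bins entries) using `num_copies` duplicated histograms.
 * Input values are wrapped into [0, num_bins) with a modulo.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int histogram_duplicated(const unsigned char *data, size_t n,
                         unsigned int *hist, int num_bins, int num_copies)
{
    assert(num_bins > 0 && num_copies > 0);

    size_t bins = (size_t)num_bins;
    size_t copies_n = (size_t)num_copies;
    size_t total = bins * copies_n;

    /* One zero-initialized buffer holding all histogram copies back to back. */
    unsigned int *copies = calloc(total, sizeof *copies);
    if (copies == NULL)
        return -1;

    /* Phase 1: every element updates one copy, chosen from its index, so
     * concurrent atomic updates are spread over several memory locations. */
    #pragma acc parallel loop copyin(data[0:n]) copy(copies[0:total])
    for (size_t i = 0; i < n; ++i) {
        size_t c = i % copies_n;        /* assumed copy-selection rule */
        size_t bin = data[i] % bins;    /* wrap out-of-range values */
        #pragma acc atomic update
        copies[c * bins + bin] += 1;
    }

    /* Phase 2: merge the copies into the final histogram on the host. */
    for (size_t b = 0; b < bins; ++b) {
        unsigned int sum = 0;
        for (size_t c = 0; c < copies_n; ++c)
            sum += copies[c * bins + b];
        hist[b] = sum;
    }

    free(copies);
    return 0;
}
```

Using more copies reduces contention on frequently hit bins at the cost of extra memory and a longer merge step, so the number of copies is a tuning parameter in this sketch.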
Unpredictable power outages in NAND flash-based Solid State Drives (SSDs) may cause system failure or reliability problems. Capacitors are widely adopted as the interim power supply when a power interruption happens. However, since the energy provided by backup capacitors is limited, and the capacitance of a capacitor will gradually degrade with time, it is imperative to improve the efficiency and...
Program disturb is a major issue limiting the functionality of hot carrier programmed flash memories. This paper reports a detailed characterization of program disturb in a split-gate flash memory cell using source side injection programming. Key parameters influencing the cell's disturb sensitivity have been investigated, empirical models have been developed and a physical root cause has been identified...
Due to their outstanding computational performance, many acceleration devices, such as GPUs, the Cell Broadband Engine (Cell/B.E.), and multi-core computing, are attracting a lot of attention in the field of high-performance computing. Although there are many programming models and languages designed for programming accelerators, such as CUDA, AMD Accelerated Parallel Processing (AMD APP), and OpenCL,...
We show a 90 nm nanocrystal-based split gate embedded flash memory that is able to meet the speed, endurance and reliability requirements for 32-bit microcontroller products. A 3.4 V operating window is achievable and the process is robust and repeatable across many lots. Erase after 10k cycles can be achieved in 5 ms; long-term data retention of cycled arrays is not susceptible to SILC-induced charge...
Graphics processing units (GPUs) have emerged as a powerful platform for high-performance computation. They have been successfully used to accelerate many scientific workloads. Typically, the computationally intensive parts of the application are offloaded to the GPU, which serves as the CPU's parallel coprocessor. The key to effective utilization of GPUs for scientific computing is the design and...
Phase Change Memory (PCM) has emerged as an attractive candidate for next-generation non-volatile memory devices. For these applications, reliability is determined by the ability to retain the state of data in the device and support a specified number of re-writes without failure. In PCM technologies, retention is limited by the meta-stable amorphous state of the cell. For cycling endurance (re-writes),...
The reliability of advanced embedded non-volatile memories has been discussed using 2T-FNFN devices as an example. The write/erase endurance and the data retention are the most important reliability parameters. The intrinsic reliability mechanisms can be addressed through single cell evaluation, while the cell-to-cell variation determines the product level reliability. The cell-to-cell variation can...