The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Historically, improvements in GPU-based high performance computing have been tightly coupled to transistor scaling. As Moore's law slows down, and the number of transistors per die no longer grows at historical rates, the performance curve of single monolithic GPUs will ultimately plateau. However, the need for higher performing GPUs continues to exist in many domains. To address this need, in this...
Most server-grade systems provide Chipkill-Correct error protection at the expense of power and performance. In this paper we present a low overhead solution to improving the reliability of commodity DRAM systems with no change in the existing memory architecture. Specifically, we propose five erasure and error correction (E-ECC) schemes that provide at least Chipkill-Correct protection for x4 (Schemes...
Modern graphic processing units (GPUs) are not only able to perform graphics rendering, but also perform general purpose parallel computations (GPGPUs). It has been shown that the GPU L1 data cache and the on chip interconnect bandwidth are important sources of performance bottlenecks and inefficiencies in GPGPUs. Through this work, we aim to understand the sources of inefficiencies and possible opportunities...
The availability of a wide range of general purpose as well as accelerator cores on modern smart phones means that a significant number of applications can be executed on a smart phone simultaneously, resulting in an ever increasing demand on the memory subsystem. While the increased computation capability is intended for improving user experience, memory requests from each concurrent application...
The ubiquity of graphics processing unit (GPU) architectures has made them efficient alternatives to chipmultiprocessors for parallel workloads. GPUs achieve superior performance by making use of massive multi-threading and fast context-switching to hide pipeline stalls and memory access latency. However, recent characterization results have shown that general purpose GPU (GPGPU) applications commonly...
To mitigate the significant main memory access latency in modern chip multiprocessors, multi-level on-chip caches are used to bridge the gap by retaining frequently used data closer to the processor cores. Such dependence on the last-level cache (LLC) has motivated numerous innovations in cache management schemes. However, most prior works focus their efforts on optimizing cache miss counts experienced...
There has recently been considerable interest in connectivity analysis of fMRI and scalp and intracranial EEG time-series. The computational requirements of the pair-wise correlation (PWC), the core time-series measure used to estimate connectivity, presents a challenge to the real-time estimation of the PWC between all pairs of multiple time-series. We describe a parallel algorithm for computing...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.