As the size of new supercomputers scales to tens of thousands of sockets, the mean time between failures (MTBF) is decreasing to just several hours, and long executions need some kind of fault tolerance method to survive failures. Checkpoint/Restart is a popular technique used for this purpose, but writing the state of a big scientific application to remote storage will become prohibitively expensive...
Regional weather forecasting demands fast simulation over fine-grained grids, resulting in extremely memory-bottlenecked computation, a difficult problem on conventional supercomputers. Early work on accelerating the mainstream weather code WRF using GPUs, with their high memory performance, however, resulted in only minor speedup because only part of the huge code was ported to the GPU. Our full CUDA porting of the...
We present a statistical approach for estimating power consumption of GPU kernels. We use the GPU performance counters that are exposed for CUDA applications, and train a linear regression model where performance counters are used as independent variables and power consumption is the dependent variable. For model training and evaluation, we use publicly available CUDA applications, consisting of 49...
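The regression setup described in this abstract can be illustrated with a minimal sketch. It assumes ordinary least squares with a constant term, which the abstract does not specify beyond "linear regression". The function names (`fit_power_model`, `predict_power`, `solve_linear_system`), the two unnamed counters, and all numbers are hypothetical and synthetic; they are not taken from the paper.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation updates both sides together.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: counters are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_model(counter_rows, power_watts):
    """Least-squares fit of power ~ intercept + sum(w_j * counter_j)."""
    # Append a constant 1.0 so the model can learn a baseline power term.
    rows = [list(r) + [1.0] for r in counter_rows]
    p = len(rows[0])
    # Normal equations: (A^T A) coef = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, power_watts)) for i in range(p)]
    coef = solve_linear_system(ata, aty)
    return coef[:-1], coef[-1]


def predict_power(counter_row, weights, intercept):
    """Estimate power for one kernel from its counter values."""
    return intercept + sum(w * c for w, c in zip(weights, counter_row))


# Synthetic training set (assumption): two normalized counters per kernel,
# with power generated from known weights so the fit can be checked.
TRUE_WEIGHTS = [40.0, 25.0]
TRUE_INTERCEPT = 60.0
counter_rows = [[0.1, 0.9], [0.5, 0.2], [0.8, 0.4], [0.3, 0.7], [0.9, 0.1]]
power_watts = [
    TRUE_INTERCEPT + sum(w * c for w, c in zip(TRUE_WEIGHTS, row))
    for row in counter_rows
]

weights, intercept = fit_power_model(counter_rows, power_watts)
estimate = predict_power([0.5, 0.5], weights, intercept)
```

With real measurements the fit would not be exact, and held-out kernels would be needed to judge prediction error; the sketch only shows the shape of the model, with counters as independent variables and power as the dependent variable.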
We discuss hardware and software aspects of GPGPU, specifically focusing on NVIDIA cards and CUDA, from the viewpoint of parallel computing. We identify four major weak points of GPUs relative to the newest supercomputers: large SIMD vector length, small memory, absence of fast L2 cache, and high register spill penalty. On the software side, we derive optimal scheduling...
Most GPU performance "hypes" have focused around tightly-coupled applications with small memory bandwidth requirements, e.g., N-body, but GPUs are also commodity vector machines sporting substantial memory bandwidth; however, effective programming methodologies thereof have been poorly studied. Our new 3-D FFT kernel, written in NVIDIA CUDA, achieves nearly 80 GFLOPS on a top-end GPU, being...
In this paper, we introduce a new fast Fourier transform (FFT) library. In developing this software, we focus on the efficient execution of floating-point instructions. To achieve high performance on various processors, we provide source code that compilers can optimize easily. Since the compilers provided by processor vendors have powerful optimizers for loop statements, the code...
This paper presents a novel application framework named simple interface for library collections (SILC) that allows users to make use of matrix computation libraries in a flexible and language-independent manner. Using SILC, various computing environments as well as alternative solvers and matrix storage formats from different libraries can be easily utilized. The present paper describes the design...