Performance tuning is an ongoing activity at most HPC sites, where small performance improvements can save thousands of dollars. Run-to-run performance variation significantly impacts tuning: not being able to tell in a single run which code version is faster (or more energy efficient) greatly increases the computational expense and uncertainty for the programmer. We will show examples where...
Most studies of Best-Effort HTM (BE-HTM) performance use a single serialization manager and a single parameter value across all benchmarks, inputs and thread counts. The experimental study in this paper indicates that the values chosen for serialization-manager parameters have a significant effect on performance in the Blue Gene/Q's (BG/Q) BE-HTM system. Moreover, for a given serialization manager,...
OpenCL is now available on a very large set of processors, which makes the language an attractive layer for addressing multiple targets with a single code base. How sensitive OpenCL code is in practice to the underlying hardware remains to be better understood.
While the correctness of an NVIDIA CUDA program is easy to achieve, exploiting the GPU's capabilities to obtain the best possible performance is a task for experienced CUDA programmers. Typical code-tuning strategies, such as choosing an appropriate size and shape for the thread blocks, programming good memory coalescing, or maximizing occupancy, are inter-dependent. Moreover, the choices are also dependent on...
The speed of the memory subsystem often constrains the performance of large-scale parallel applications. Experts tune such applications to use hierarchical memory subsystems efficiently. Hardware accelerators, such as GPUs, can potentially improve memory performance beyond the capabilities of traditional hierarchical systems. However, the addition of such specialized hardware complicates code porting...