Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
Provides an abstract for each of the invited presentations and a brief professional biography of each presenter. The complete presentations were not made available for publication as part of the conference proceedings.
Modern GPUs embrace on-chip cache memory to exploit the locality present in applications. However, the behavior and effect of the cache on GPUs are different from those on conventional processors due to the Single Instruction Multiple Thread (SIMT) thread execution model and resulting memory access patterns. Previous studies report that caching data can hurt the performance due to increased memory...
Parallel applications are highly irregular and high performance computing (HPC) infrastructures are very complex. The HPC applications of interest herein are timestepping scientific applications (TSSA). Often, TSSA involve the repeated execution of multiple parallel loops with thousands of iterations and irregular behavior. Dynamic loop scheduling (DLS) techniques were developed over time and have...
Reproducibility of the execution of scientific applications on parallel and distributed systems is a growing interest, underlying the trustworthiness of the experiments and the conclusions derived from experiments. Dynamic loop scheduling (DLS) techniques are an effective approach towards performance improvement of scientific applications via load balancing. These techniques address algorithmic and...
Modern high performance computing (HPC) systems exhibit a rapid growth in size, both “horizontally” in the number of nodes, as well as “vertically” in the number of cores per node. As such, they offer additional levels of hardware parallelism. Each level requires and employs algorithms for appropriately scheduling the computational work at the respective level. The present work explores the relation...
We address the topology-aware job scheduling and placement problems on 3D torus-based high performance computing systems, with the objective of reducing system fragmentation. In our previous work, we proposed a job placement algorithm based on a local migration process, which aims at reducing the internal fragmentation due to using a convex prism shape for job allocation. However, HPC systems are...
Although cloud computing greatly utilises virtualised environments for applications to be executed efficiently in low-cost hosting, it has turned energy wasting and overconsumption issues into major concerns. Cloud infrastructure is built on a great amount of server equipment, including high performance computing (HPC), and the servers are naturally prone to failures. In this paper, we report on an...
Cloud services have become an important part of the business world. Although public cloud services provide a range of advantages, they also create concerns over security, network usage and the need for technical expertise in the management of the cloud. This paper describes a local private cloud which uses on-site hardware infrastructure and which incorporates a three tier management system. This device...
In this paper the authors compare the performance and scalability of the SHMEM and corresponding MPI-3 routines for five different benchmark tests using a Cray XC30. The performance of the MPI-3 get and put operations was evaluated using fence synchronization and also using lock-unlock synchronization. The five tests used communication patterns ranging from light to heavy data traffic: accessing distant...
Gyrokinetic modeling is appropriate for describing plasma turbulence in the core of Tokamaks, and the gyroaverage operator is a cornerstone of this approach. In a gyrokinetic code the gyroaveraging scheme needs to be accurate enough, but also requires a low computational cost because it is often applied on the main unknown, namely the 5D guiding-center distribution function, as well as on several...
This paper investigates the scalability of the WRF (Weather Research and Forecasting) model on three different platforms: BlueGene/P, Intel Xeon Cluster and Microsoft Azure cloud at different resolutions and domain sizes. Contrary to prior work, we benchmark the model on a cloud platform, analyze the behavior of various individual configurations, and test the scalability of our previously proposed parallel...