We present direct performance measurements of two popular weather forecast models, the Weather Research and Forecasting Model (WRF) and the Model for Prediction Across Scales (MPAS), on Intel's Knights Landing (KNL) platform. WRF has been widely evaluated on different platforms, while benchmarks of MPAS are still scarce. In this study we measured the running time of WRF and MPAS on the QCT Developer Cloud,...
We give efficient algorithms to solve fundamental data movement problems on mesh-connected computers augmented with limited global bandwidth. Adding a small amount of global bandwidth makes a practical design that combines aspects of mesh and fully connected models to achieve the benefits of each. We give algorithms for sorting, finding the median, finding a spanning tree, and determining various...
Increasingly complex memory systems and on-chip interconnects are being developed to mitigate the data movement bottlenecks in manycore processors. One example of such a complex system is the Xeon Phi KNL CPU with three different types of memory, fifteen memory configuration options, and a complex on-chip mesh network connecting up to 72 cores. Users require a detailed understanding of the performance characteristics...
In the last decade, the scope of software optimizations expanded to encompass energy consumption on top of the classical runtime minimization objective. In that context, several optimizations have been developed to improve the software energy efficiency. However, these optimizations commonly rely on long profiling steps and are often implemented as unstable runtime systems, which limits their applicability...
Intelligent partitioning models are commonly used for efficient parallelization of irregular applications on distributed systems. These models usually aim to minimize a single communication cost metric, which is either related to communication volume or message count. However, both volume- and message-related metrics should be taken into account during partitioning for a more efficient parallelization...
A computational grid is a high performance computing system consisting of computer resources distributed over multiple locations and connected via a computer network. One of the many types of applications executed on computational grids is the workflow application. These applications consist of multiple computational tasks, which are precedence related, and usually process huge data files...
Synchronous iterative algorithms are often less scalable than asynchronous ones. Performing large-scale experiments with different kinds of network parameters is not easy, because such parameters are fixed on supercomputers. One solution is to use simulations first, in order to analyze which parameters do or do not influence the behavior of an algorithm. In this paper, we show...
We consider the problem of communication avoidance in computing interactions between a set of particles in scenarios with and without a cutoff radius for interaction. Our strategy, which we show to be optimal in communication, divides the work in the iteration space rather than simply dividing the particles over processors, so more than one processor may be responsible for computing updates to a single...
Energy efficiency of computing devices has become a dominant area of research interest in recent years. Most previous work has focused on architectural techniques to improve power and energy efficiency; only a few studies consider saving energy at the algorithmic level. We prove that a region of perfect strong scaling in energy exists for matrix multiplication (classical and Strassen) and the direct n-body...
Data organization for matrices and arrays in memory was extensively studied from the early 1970s until the mid-1990s, the golden age of vector computers. But this old SIMD model seems more topical than ever, as shown by the use of GPUs in high performance computers or the architecture of the NEC SX-9. Such memory organization should therefore be reconsidered in order to efficiently access data...
We present a framework for efficient, physics-based computer simulation of complex time-dependent waveforms (i.e. wide-band, with a large number of frequency components) in nonlinear amplifiers with memory. It is built upon a well-established pseudo-spectral, multi-frequency, large-signal code and relies on an adaptive algorithm for signal splitting and splicing in the time domain. Included in the model,...
In this paper we study the execution of iterative applications on volatile processors such as those found on desktop grids. We develop master-worker scheduling schemes that attempt to achieve good trade-offs between worker speed and worker availability. A key feature of our approach is that we consider a communication model where the bandwidth capacity of the master for sending application data to...
We have previously documented the on-going work in the EUFORIA project to parallelise and optimise European fusion simulation codes, see. This involves working with a wide range of codes to try and address any performance and scaling issues that these codes have. However, as no two simulation codes are exactly the same, it is very hard to apply exactly the same approach to optimising a disparate range...
In large-scale cluster systems, interconnecting thousands of computing nodes increases the complexity of the network topology. Nevertheless, few existing computational models consider the impact of the hierarchical communication latencies and bandwidths caused by this network complexity. In this paper we propose a new parallel computational model called LogGPH with a new parameter H incorporated into the...
Data-intensive applications are becoming increasingly common in Grid environments. These applications require enormous volumes of data for their computation. Most conventional meta-scheduling approaches are aimed at computation-intensive applications and do not take the data requirements of the applications into account, thus leading to poor performance. Efficient scheduling of data-intensive applications...
Pipelined workflows are a popular programming paradigm for parallel applications. In these workflows, the computation is divided into several stages, and these stages are connected to each other through first-in first-out channels. In order to execute these workflows on a parallel machine, we must first determine the mapping of the stages onto the various processors on the machine. After finding the...
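The stage-and-channel structure described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method and does not address the mapping problem; it only shows stages connected by first-in first-out channels, with one thread per stage. The names `run_stage`, `run_pipeline`, and `_DONE` are hypothetical and chosen for illustration.

```python
import queue
import threading

# Hypothetical sentinel marking the end of the input stream.
_DONE = object()


def run_stage(func, inbox, outbox):
    """Apply func to each item read from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stage_funcs, inputs):
    """Run stage_funcs as a linear pipeline, one thread per stage.

    Consecutive stages are linked by FIFO queues, so items leave the
    pipeline in the same order in which they entered it.
    """
    channels = [queue.Queue() for _ in range(len(stage_funcs) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(f, channels[i], channels[i + 1]))
        for i, f in enumerate(stage_funcs)
    ]
    for t in threads:
        t.start()

    # Feed the first channel, then signal the end of the stream.
    for x in inputs:
        channels[0].put(x)
    channels[0].put(_DONE)

    # Drain the last channel until the sentinel arrives.
    results = []
    while True:
        item = channels[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])` passes each input through both stages in order. Assigning one thread per stage is only the simplest possible mapping; the abstract's concern is choosing such a mapping well on a real parallel machine.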
In this paper we focus on optimizing compute and memory-bandwidth-intensive GMM computations for low-end, small-form-factor devices running on GPU-like parallel processors. With special emphasis on tackling the memory bandwidth issue that is exacerbated by a lack of CPU-like caches providing temporal locality on GPU-like parallel processors, we propose modifications to three well-known GMM computation...
In recent years, grid computing has emerged as a valuable service for solving complex computational problems in many scientific and industrial domains. Quality of Service (QoS) provision for these applications is therefore a key challenge for high-speed Next Generation Networks, and cross-layer mechanisms that enable the development of network-aware grids should be introduced. This paper takes into account,...
In this paper, we analyze restrictions of traditional models that affect the accuracy of analytical prediction of the execution time of collective communication operations. In particular, we show that the constant and variable contributions of processors and network are not fully separated in these models. Full separation of the contributions that have a different nature and arise from different sources...
In this paper, we propose a resource broker that provides a friendly interface for accessing available and appropriate resources via user credentials. The broker is developed on a platform constructed with the Globus toolkit. It not only deploys a domain-based network information model and its dynamic version to measure network status by invoking Network Weather Service (NWS) on grid computing...