Scaling clusters is no longer the only struggle in moving towards exascale in HPC. While scaling components such as the network and file systems is a widely accepted need, monitoring, on the other hand, is often left behind in the procurement of these large systems. Monitoring is often quite an afterthought that is expected to be incorporated in existing infrastructure. While that often works for...
The recent rapid scale-out of high performance computing systems has continuously increased the scale and complexity of the interconnects. As a result, current static and over-provisioned interconnects are becoming cost-ineffective. Against this background, we have been working on the integration of network programmability into the interconnect control, based on the idea that dynamically controlling...
In high-performance computing systems, application performance and throughput are dependent on a complex interplay of hardware and software subsystems and variable workloads with competing resource demands. Data-driven insights into the potentially widespread scope and propagation of impact of events, such as faults and contention for shared resources, can be used to drive more effective use of resources,...
In this paper we present lo2s - a lightweight performance monitoring tool to sample applications as well as the executing system. It enables the user to analyze the performance of a parallel application without requiring the time-consuming and error-prone process of application instrumentation. The collected performance data is complemented with various metric data, i.e., perf counters, kernel tracepoints,...
In this work, we seek to gain an understanding of the InfiniBand network processing limitations that might exist in gathering performance metric information from InfiniBand switches using our new LDMS ibfabric sampler. The limitations studied consist of delays in gathering InfiniBand metric information from a specific switch device due to the switch's processor response delays or RDMA contention for...
This work evaluates performance variability in the Cray Aries dragonfly network and characterizes its impact on MPI Allreduce. The execution time of Allreduce is limited by the performance of the slowest participating process, which can vary by more than an order of magnitude. We utilize counters from the network routers to provide a better understanding of how competing workloads can influence performance...
Because data collection in HPC systems happens on the nodes and is easily related to the job running on the node, tools presenting the data and subsequent analyses to the user generally present them at the job level. Our position is that this is the wrong level of abstraction and thus limits the value of the analyses, often dissuading users from using any of the offered tools. In this paper we present...
A kernel or mini-app is a self-contained small application that retains certain characteristics of the original application [7]. Working on a kernel or mini-app in place of the original application can dramatically reduce the resources and effort required for software tasks such as performance optimization and porting to new platforms. However, using a kernel as a proxy is based on the...
Widely used benchmarks, such as High Performance Linpack (HPL), do not always provide direct insights into the actual application performance of systems, and there have been criticisms indicating that the performance of simplified benchmarks such as HPL no longer strongly correlates with real application performance. In contrast,...
Modernizing production-grade, often legacy applications to take advantage of modern multi-core and many-core architectures can be a difficult and costly undertaking. This is especially true currently, as it is unclear which architectures will dominate future systems. The complexity of these codes can mean that parallelisation for a given architecture requires significant re-engineering. One way to...
Iterative sparse linear solvers are an important class of algorithms in high performance computing, and form a crucial component of many scientific codes. As intra- and inter-node parallelism continues to increase rapidly, the design of new, scalable solvers which can target next generation architectures becomes increasingly important. In this work we present TeaLeaf, a recent mini-app constructed to...
The arch project is a suite of mini-apps that have been developed with consistent coding practices, under a common infrastructural layer. Great emphasis has been placed on making the applications concise and easy to manipulate, while capturing the key performance characteristics of their proxied algorithmic classes. The suite is intended for traditional exploration of performance, portability and...
Approximate computing addresses many of the identified challenges for exascale computing, leading to performance improvements that may include changes in fidelity of calculation. In this paper, we examine approximate approaches for a range of DOE-relevant computational problems run on a variety of architectures as a proxy for the wider set of exascale-class applications. We show anticipated improvements...
Like many other code teams, the developers of the Mercury Monte Carlo Transport code at Lawrence Livermore National Laboratory are being forced by the arrival of GPU-based supercomputers to substantially refactor their application to obtain acceptable performance on new architectures. This paper describes how we have designed, developed, and used Quicksilver, a proxy application for Mercury, to assist...
Irregular applications pose considerable challenges to modern computer systems, especially in distributed environments, where traditional high-performance networks are optimized for large message transfers. In this work, we analyze the performance of an irregular application proxy benchmark running over traditional MPI/InfiniBand as well as over the Data Vortex network, an emerging network architecture...