Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts...
Upcoming high-performance computing (HPC) platforms will have more complex memory hierarchies with high-bandwidth on-package memory and in the future also non-volatile memory. How to use such deep memory hierarchies effectively remains an open research question. In this paper we evaluate the performance implications of a scheme based on a software-managed scratchpad with coarse-grained memory-copy...
The Argo project is a DOE initiative for designing a modular operating system/runtime for the next generation of supercomputers. A key focus area in this project is power management, which is one of the main challenges on the path to exascale. In this paper, we discuss ideas for systemwide power management in the Argo project. We present a hierarchical and scalable approach to maintain a power bound...
Resilience is an important challenge for extreme-scale supercomputers. Today, failures in supercomputers are assumed to be uniformly distributed in time. However, recent studies show that failures in high-performance computing systems are partially correlated in time, generating periods of higher failure density. Our study of the failure logs of multiple supercomputers shows that periods of higher...
This paper presents an extension of a decidable fragment of Separation Logic for singly-linked lists, defined by Berdine, Calcagno and O’Hearn [8]. Our main extension consists in introducing atomic formulae of the form ls^k(x, y) describing a list segment of length k, stretching from x to y, where k is a logical variable interpreted over positive natural numbers,...
Resilience is an important challenge for extreme-scale supercomputers. Failures in current supercomputers are assumed to be uniformly distributed in time. However, recent studies show that failures in high-performance computing systems are partially correlated in time, generating periods of higher failure density. The detection of those periods is important in order to adjust the system to new conditions...
Future exascale systems will impose several conflicting challenges on the operating system (OS) running on the compute nodes of such machines. On the one hand, the targeted extreme scale requires the kind of high resource usage efficiency that is best provided by lightweight OSes. At the same time, substantial changes in hardware are expected for exascale systems. Compute nodes are expected to host...
Work stealing is a popular solution to perform dynamic load balancing of irregular computations, both for shared memory and distributed memory systems. While shared memory performance of work stealing is well understood, distributing this algorithm to several thousands of nodes can introduce new performance issues. In particular, most studies of work stealing assume that all participating processes...
In this article we present KRASH, a tool for reproducible generation of system-level CPU load. This tool is intended for use on shared-memory machines equipped with multiple CPU cores, which are usually used concurrently by several users. The objective of KRASH is to enable parallel application developers to validate their resource usage strategies on a partially loaded machine by replaying an...
This paper presents an extension of a decidable fragment of Separation Logic for singly-linked lists, defined by Berdine et al. (2004). Our main extension consists in introducing atomic formulae of the form ls^k(x, y) describing a list segment of length k, stretching from x to y, where k is a logical variable interpreted over positive natural numbers, that may occur further inside Presburger constraints...