Graph models of social information systems typically contain trillions of edges. Such big graphs cannot be processed on a single machine. The graph object must be partitioned and distributed among machines and processed in parallel on a computer cluster. Programming such systems is very challenging. In this work, we present DH-Falcon, a graph DSL (domain-specific language) which can be used to implement...
Failure-tolerant data encoding and storage is of paramount importance for data centers, supercomputers, data transfers, and many aspects of information technology. Reed-Solomon erasure codes and their variants are the basis for many applications in this field. Efficient implementation of these codes is challenging because they require computations in Galois fields, which are not supported...
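To make the Galois-field requirement concrete, the following C sketch multiplies two bytes in GF(2^8) with the 0x11d reduction polynomial commonly used by Reed-Solomon coders. The coefficients and data bytes in main are arbitrary illustrative values, not taken from the paper.

    #include <stdio.h>
    #include <stdint.h>

    /* Multiply in GF(2^8) with reduction polynomial x^8+x^4+x^3+x^2+1
       (0x11d). "Addition" is XOR; each product is a shift-and-XOR loop
       with conditional reduction, so there is no ordinary CPU multiply. */
    static uint8_t gf256_mul(uint8_t a, uint8_t b) {
        uint8_t p = 0;
        while (b) {
            if (b & 1) p ^= a;          /* add current multiple of a */
            uint8_t hi = a & 0x80;
            a <<= 1;
            if (hi) a ^= 0x1d;          /* reduce modulo 0x11d */
            b >>= 1;
        }
        return p;
    }

    int main(void) {
        /* A Reed-Solomon parity byte is a GF dot-product of data bytes
           with fixed coefficients, e.g. parity = c1*d1 ^ c2*d2. */
        uint8_t d[2] = {0x57, 0x13}, c[2] = {0x02, 0x03}, parity = 0;
        for (int i = 0; i < 2; i++) parity ^= gf256_mul(c[i], d[i]);
        printf("parity byte: 0x%02x\n", parity);
        return 0;
    }

In practice these shift/XOR loops are replaced by table lookups or carry-less multiply instructions, and how well that mapping works on a given CPU is precisely what makes efficient implementation challenging.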
Checkpoint/restart has been widely used to cope with fail-stop errors. The checkpointing frequency is most often optimized under the assumption of an exponential failure distribution. However, field studies show that failures rarely follow a constant-rate exponential distribution. Therefore, the optimal checkpointing frequency should be computed and tuned considering the different distributions...
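For reference, under the exponential assumption that this abstract questions, the classical Young/Daly first-order approximation puts the optimal amount of work between checkpoints at W = sqrt(2 * C * MTBF), where C is the checkpoint cost. A minimal sketch with assumed cost and MTBF values:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* Young/Daly first-order optimum; only valid under the
           constant-rate (exponential) failure assumption. Both inputs
           below are assumed values for illustration. */
        double C    = 60.0;          /* checkpoint cost, seconds      */
        double mtbf = 24.0 * 3600;   /* platform MTBF, seconds        */
        double W    = sqrt(2.0 * C * mtbf);  /* work between checkpoints */
        printf("checkpoint every %.0f s (~%.1f min)\n", W, W / 60.0);
        return 0;
    }

When the failure distribution is not exponential (e.g. Weibull, as field studies suggest), this closed form no longer applies, which motivates recomputing the optimum per distribution as the abstract proposes.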
Fault tolerance is one of the major design goals for HPC. The emergence of non-volatile memories (NVM) provides a solution for building fault-tolerant HPC systems. Data in NVM-based main memory are not lost when the system crashes because of the non-volatile nature of NVM. However, because of volatile caches, data must be logged and explicitly flushed from caches into NVM to ensure consistency and correctness...
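The logging-and-flushing discipline can be illustrated with x86 cache-control intrinsics. The sketch below shows a generic undo-log update, not the paper's scheme; the log layout is invented, and real systems would prefer CLWB/CLFLUSHOPT over CLFLUSH where available.

    #include <stdint.h>
    #include <stdio.h>
    #include <immintrin.h>

    typedef struct { uint64_t *addr; uint64_t old; } log_entry_t;

    /* Undo-log update of one word in (assumed) NVM-backed memory.
       Caches are volatile, so both the log entry and the data must be
       explicitly flushed and ordered before the update counts as durable. */
    static void persistent_store(log_entry_t *log, uint64_t *p, uint64_t v) {
        log->addr = p;                 /* 1. record undo information       */
        log->old  = *p;
        _mm_clflush(log);              /* 2. flush the log entry to NVM    */
        _mm_sfence();                  /* 3. order log before the update   */
        *p = v;                        /* 4. perform the actual update     */
        _mm_clflush(p);                /* 5. flush the new value           */
        _mm_sfence();                  /* 6. durable; log entry can be freed */
    }

    int main(void) {
        static uint64_t balance = 100; /* stand-in for an NVM-resident word */
        static log_entry_t entry;
        persistent_store(&entry, &balance, 250);
        printf("balance = %llu\n", (unsigned long long)balance);
        return 0;
    }

Libraries such as PMDK package this flush-and-fence discipline behind transactional APIs; the cost of the extra flushes and fences is the overhead such papers try to reduce.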
We consider the problem of orchestrating the execution of workflow applications structured as Directed Acyclic Graphs (DAGs) on parallel computing platforms that are subject to fail-stop failures. The objective is to minimize expected overall execution time, or makespan. A solution to this problem consists of a schedule of the workflow tasks on the available processors and of a decision of which application...
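As a baseline for what "makespan" means here, the failure-free makespan of a DAG with enough processors is the longest weighted path, computable in one pass over a topological order. A toy sketch with an invented four-task DAG, not taken from the paper:

    #include <stdio.h>

    #define NT 4

    int main(void) {
        /* Hypothetical DAG: 0 -> {1,2} -> 3, task weights in seconds.
           finish[t] = max finish time of predecessors + weight of t. */
        double w[NT] = {2.0, 3.0, 5.0, 1.0};
        int dep[NT][NT] = {{0,1,1,0},{0,0,0,1},{0,0,0,1},{0,0,0,0}};
        double finish[NT] = {0};
        for (int t = 0; t < NT; t++) {    /* tasks are in topological order */
            double ready = 0.0;
            for (int p = 0; p < NT; p++)
                if (dep[p][t] && finish[p] > ready) ready = finish[p];
            finish[t] = ready + w[t];
        }
        double makespan = 0.0;
        for (int t = 0; t < NT; t++)
            if (finish[t] > makespan) makespan = finish[t];
        printf("failure-free makespan: %.1f s\n", makespan);  /* 2+5+1 = 8.0 */
        return 0;
    }

Under fail-stop failures the objective becomes the expected value of this quantity, with checkpoint placement trading checkpoint overhead against re-execution time, which is the optimization the abstract targets.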
In this research we describe the development and optimisation of a new Monte Carlo neutral particle transport mini-app, neutral. In spite of the success of previous research efforts to load balance the algorithm at scale, it is not clear how to take advantage of the diverse architectures being installed in the newest supercomputers. We explore different algorithmic approaches, and perform extensive...
Tasks coupled in an in situ workflow may not process data at the same speed, potentially causing overflows in the communication channel between them. To prevent this problem, software infrastructures for in situ workflows usually impose a strict FIFO policy that has the side-effect of slowing down faster tasks to the speed of the slower ones. This may not be the desired behavior; for example, a scientist...
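One alternative to strict FIFO back-pressure is a bounded channel that drops the oldest buffered item when full, so the producer never stalls. The single-threaded sketch below illustrates only that policy with invented parameters; it is not the infrastructure the abstract describes, and a real in situ coupler would add locking or lock-free synchronization.

    #include <stdio.h>

    #define CAP 4

    /* Bounded channel between a fast producer task and a slow consumer.
       push() evicts the oldest item instead of blocking the producer. */
    typedef struct { int buf[CAP]; int head, count, dropped; } channel_t;

    static void push(channel_t *ch, int item) {
        if (ch->count == CAP) {                 /* full: evict oldest */
            ch->head = (ch->head + 1) % CAP;
            ch->count--;
            ch->dropped++;
        }
        ch->buf[(ch->head + ch->count) % CAP] = item;
        ch->count++;
    }

    static int pop(channel_t *ch, int *item) {
        if (ch->count == 0) return 0;
        *item = ch->buf[ch->head];
        ch->head = (ch->head + 1) % CAP;
        ch->count--;
        return 1;
    }

    int main(void) {
        channel_t ch = {{0}, 0, 0, 0};
        for (int step = 0; step < 12; step++) {
            push(&ch, step);                /* producer emits every step  */
            if (step % 3 == 2) {            /* consumer keeps up 1 in 3   */
                int v;
                if (pop(&ch, &v)) printf("consumed %d\n", v);
            }
        }
        printf("dropped %d item(s)\n", ch.dropped);
        return 0;
    }

Whether dropping, sampling, or blocking is appropriate is application-specific, which is exactly why a single hard-wired FIFO policy can be the wrong default.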
Markov Chain Monte Carlo methods provide a tool for tackling high-dimensional problems. With many-core systems readily available today, it is no surprise that leveraging parallelism in these samplers has been a subject of recent research. The focus has been on solutions for shared-memory architectures; however, these perform poorly in a distributed-memory environment. This paper introduces a fully...
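The simplest distributed-memory baseline is to run one independent Metropolis chain per MPI rank and pool the estimates at the end. This is not the paper's sampler, just a sketch of the setting; the target density, proposal width, and use of rand() are illustrative simplifications.

    #include <mpi.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Target: standard normal, log-density up to a constant. */
    static double logp(double x) { return -0.5 * x * x; }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        srand(1234u + (unsigned)rank);   /* crude per-rank random stream */
        const int steps = 100000;
        double x = 0.0, sum = 0.0;
        for (int i = 0; i < steps; i++) {
            double prop = x + 0.5 * (2.0 * rand() / (double)RAND_MAX - 1.0);
            double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
            if (log(u) < logp(prop) - logp(x)) x = prop;  /* accept/reject */
            sum += x;
        }
        double local_mean = sum / steps, global_mean;
        MPI_Reduce(&local_mean, &global_mean, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pooled mean estimate: %f\n", global_mean / size);
        MPI_Finalize();
        return 0;
    }

Independent chains avoid communication entirely but do not speed up a single chain's mixing; schemes that parallelize within a chain need coordination, which is where shared-memory designs break down on distributed memory.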
Stencil-based applications such as CFD have succeeded in obtaining high performance on GPU supercomputers. The problem sizes of these applications are limited by the GPU device memory capacity, which is typically smaller than the host memory. On GPU supercomputers, a locality-improvement technique that combines temporal blocking with memory swapping between host and device enables large computation...
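The idea of temporal blocking can be shown on a 1D three-point stencil: each spatial block is copied with a halo of width T, advanced T time steps locally (recomputing halo cells redundantly), and written back, so several time steps are fused per data transfer. A CPU-only sketch with assumed sizes; the host-device swapping layer the abstract refers to is omitted.

    #include <stdio.h>
    #include <string.h>

    #define N 1024   /* grid points */
    #define B 128    /* spatial block width (divides N) */
    #define T 8      /* fused time steps = halo width */

    static double a[N], out[N];

    /* Advance the whole grid by T steps, one block at a time, with
       overlapped tiling and fixed (Dirichlet) endpoints. */
    static void temporal_blocked_sweep(void) {
        double buf[B + 2 * T], tmp[B + 2 * T];
        for (int s = 0; s < N; s += B) {
            int lo = s - T < 0 ? 0 : s - T;          /* clamped halo */
            int hi = s + B + T > N ? N : s + B + T;
            int w = hi - lo;
            memcpy(buf, &a[lo], w * sizeof(double));
            for (int t = 0; t < T; t++) {
                for (int i = 1; i < w - 1; i++)
                    tmp[i] = 0.5 * buf[i] + 0.25 * (buf[i-1] + buf[i+1]);
                tmp[0] = buf[0];
                tmp[w-1] = buf[w-1];
                memcpy(buf, tmp, w * sizeof(double));
            }
            /* After T steps only cells >= T from a non-physical buffer
               edge are exact, which is exactly the block interior. */
            memcpy(&out[s], &buf[s - lo], B * sizeof(double));
        }
        memcpy(a, out, N * sizeof(double));
    }

    int main(void) {
        for (int i = 0; i < N; i++) a[i] = (i == N / 2) ? 1.0 : 0.0;
        for (int sweep = 0; sweep < 4; sweep++) temporal_blocked_sweep();
        printf("a[N/2] after %d steps: %f\n", 4 * T, a[N / 2]);
        return 0;
    }

On a GPU, each block-plus-halo would be swapped from host to device, advanced T steps on-device, and swapped back, amortizing the PCIe transfer over T time steps instead of one.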
Job runtime estimates provided by users are widely acknowledged to be overestimates, and runtime overestimation can greatly degrade job scheduling performance. Previous studies focus on improving the accuracy of job runtime estimates by reducing overestimation, but fail to address the opposite problem of runtime underestimation. Using an underestimated runtime is catastrophic...
High performance computing systems will need to operate within fixed power budgets while maximizing performance in the exascale era. Such systems are built with power-aware components, whose collective peak power may exceed the specified power budget. Cluster-level power-bounded computing addresses this challenge by coordinating power among components within compute nodes and further adjusting...
The need for parallel task execution has been steadily growing in recent years, since manufacturers mainly improve processor performance by scaling the number of installed cores rather than the clock frequency. An essential technique for making use of this potential and increasing the parallelism of a program is to parallelize loops. However, a main restriction of available tools for automatic loop...
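For a loop with no loop-carried dependence, parallelization can be as simple as one OpenMP directive (compile with -fopenmp); the hard part for automatic tools is proving that independence. A minimal example:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N];
        for (int i = 0; i < N; i++) a[i] = i * 0.5;

        /* Iterations are independent: no iteration reads a value another
           iteration writes, so the loop can be split across cores. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            b[i] = 2.0 * a[i] + 1.0;

        printf("b[42] = %f (max threads: %d)\n", b[42], omp_get_max_threads());
        return 0;
    }

When independence cannot be established statically (aliasing, indirect indexing, early exits), the loop must stay serial or be transformed first, which keeps fully automatic parallelization hard in the general case.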
Virtual machine (VM) consolidation is necessary for increasing server utilization; however, it also leads to VM performance degradation. This work presents a method to predict the performance of consolidated VMs from critical system-event data. Experiments are designed to demonstrate the effect of system events such as interrupts, page faults, mutex operations, and context switching on the consolidated...
As the memory and storage hierarchy gets deeper and more complex, it is important to have new benchmarks and evaluation tools that allow us to explore emerging middleware solutions for using this hierarchy. Skel is a tool aimed at automating and refining this process of studying HPC I/O performance. It works by generating application I/O kernels/benchmarks as determined by a domain-specific model....
Traditional machine learning algorithms often require computations on centralized data, but modern datasets are collected and stored in a distributed way. In addition to the cost of moving data to centralized locations, increasing concerns about privacy and security warrant distributed approaches. We propose keybin, a distributed key-based binning clustering algorithm for high-dimensional spaces....
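Generic key-based binning (a sketch in the spirit of, but not identical to, keybin) quantizes each point's coordinates onto a grid and packs the per-dimension cell indices into a single key; clustering then operates on key counts rather than raw points, which is what makes it cheap to distribute. All constants below are illustrative.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define NPTS 8
    #define DIM  2

    /* Hypothetical data; a real run would read points in. Coordinates
       are assumed non-negative with fewer than 1024 cells per dimension
       so indices pack into one 64-bit key. */
    static const double pts[NPTS][DIM] = {
        {0.10,0.20},{0.15,0.22},{0.12,0.18},{0.90,0.95},
        {0.92,0.90},{0.88,0.93},{0.50,0.10},{0.11,0.21}};

    static int cmp_u64(const void *a, const void *b) {
        uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        const double h = 0.2;               /* bin width per dimension */
        uint64_t keys[NPTS];
        for (int p = 0; p < NPTS; p++) {
            uint64_t key = 0;
            for (int d = 0; d < DIM; d++)   /* pack cell indices */
                key = key * 1024 + (uint64_t)(pts[p][d] / h);
            keys[p] = key;
        }
        /* In a distributed setting each worker computes counts locally
           and only (key, count) pairs are merged; raw points never move,
           which also limits what the data exchange reveals. */
        qsort(keys, NPTS, sizeof keys[0], cmp_u64);
        for (int i = 0; i < NPTS; ) {
            int j = i;
            while (j < NPTS && keys[j] == keys[i]) j++;
            printf("bin %llu: %d point(s)\n",
                   (unsigned long long)keys[i], j - i);
            i = j;
        }
        return 0;
    }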
In-memory key-value stores are a crucial building block of large-scale web architectures. Given the growth of data volume and the need for low-latency responses, cost-effective storage expansion and fast large-message processing are the major challenges. In this paper, we explore the design of key-value middleware that takes advantage of modern NVMe SSDs and RDMA interconnects to achieve high performance...
Task mapping is an important problem in parallel and distributed computing. The goal in task mapping is to find an optimal layout of the processes of an application (or a task) onto a given network topology. We target this problem in the context of staging applications. A staging application consists of two or more parallel applications (also referred to as staging tasks) which run concurrently and...
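A common greedy heuristic for task mapping places processes one at a time on the free node that minimizes communication cost to the processes already placed, weighting message volume by network distance. The sketch below uses invented communication and distance matrices and is not the paper's algorithm.

    #include <stdio.h>

    #define NT 4   /* processes */
    #define NN 4   /* nodes, one slot each, line topology */

    int main(void) {
        /* comm[i][j]: message volume between processes i and j.
           dist[u][v]: hop count between nodes u and v. */
        double comm[NT][NT] = {
            {0, 9, 1, 0}, {9, 0, 8, 1}, {1, 8, 0, 7}, {0, 1, 7, 0}};
        int dist[NN][NN] = {
            {0,1,2,3}, {1,0,1,2}, {2,1,0,1}, {3,2,1,0}};
        int map[NT], used[NN] = {0};

        for (int t = 0; t < NT; t++) {
            int best = -1;
            double best_cost = 1e30;
            for (int n = 0; n < NN; n++) {
                if (used[n]) continue;
                double cost = 0;        /* cost to already-placed peers */
                for (int p = 0; p < t; p++)
                    cost += comm[t][p] * dist[n][map[p]];
                if (cost < best_cost) { best_cost = cost; best = n; }
            }
            map[t] = best;
            used[best] = 1;
            printf("process %d -> node %d\n", t, best);
        }
        return 0;
    }

For staging applications the extra twist is that two or more coupled parallel programs must be mapped jointly, so the "communication matrix" spans tasks from different applications.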
Scientific computing requires trust in results. In high-performance computing, trust is impeded by silent data corruption (SDC), that is, corruption that remains unnoticed. Numerical integration solvers are especially sensitive to SDCs because an SDC introduced in a certain step affects all following steps. SDCs can even cause the solver to become unstable. Adaptive solvers can change the...
Aggregating millions of hardware components to construct an exascale computing platform will pose significant resilience challenges. In addition to slowdowns associated with detected errors, silent errors are likely to further degrade application performance. Moreover, silent data corruption (SDC) has the potential to undermine the integrity of the results produced by important scientific applications...
In this paper, we present a non-parametric data-analytic soft-error detector. Our detector uses two key properties of Gaussian process regression. First, because Gaussian process regression provides confidence on the prediction, this confidence can be used to automate construction of the detection range. Second, because the correlation model of a Gaussian process captures the similarity among neighboring...
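The detection idea can be sketched end to end: fit a GP to a few trusted neighboring observations, predict the next value with its variance, and flag the observed value if it falls outside a mean plus/minus 3-sigma band. The kernel choice, length scale, and data below are invented for illustration and are not the paper's configuration.

    #include <stdio.h>
    #include <math.h>

    #define N 4   /* trusted neighboring observations */

    /* Squared-exponential kernel, length scale 0.5 (assumed). */
    static double kern(double a, double b) {
        double d = (a - b) / 0.5;
        return exp(-0.5 * d * d);
    }

    int main(void) {
        double xs[N] = {0.0, 0.25, 0.5, 0.75};
        double ys[N] = {0.000, 0.247, 0.479, 0.682};  /* ~ sin(x) */
        double xstar = 1.0, observed = 5.0;           /* corrupted value */

        /* Cholesky factor L of K + jitter*I (K = L L^T). */
        double K[N][N], L[N][N] = {{0}};
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                K[i][j] = kern(xs[i], xs[j]) + (i == j ? 1e-6 : 0.0);
        for (int i = 0; i < N; i++)
            for (int j = 0; j <= i; j++) {
                double s = K[i][j];
                for (int k = 0; k < j; k++) s -= L[i][k] * L[j][k];
                L[i][j] = (i == j) ? sqrt(s) : s / L[j][j];
            }

        /* alpha = K^{-1} y via forward then backward substitution. */
        double tmp[N], alpha[N];
        for (int i = 0; i < N; i++) {
            double s = ys[i];
            for (int k = 0; k < i; k++) s -= L[i][k] * tmp[k];
            tmp[i] = s / L[i][i];
        }
        for (int i = N - 1; i >= 0; i--) {
            double s = tmp[i];
            for (int k = i + 1; k < N; k++) s -= L[k][i] * alpha[k];
            alpha[i] = s / L[i][i];
        }

        /* Predictive mean and variance at xstar:
           mean = k*^T alpha, var = k(x*,x*) - ||L^{-1} k*||^2. */
        double ks[N], v[N], mean = 0.0, var = kern(xstar, xstar);
        for (int i = 0; i < N; i++) {
            ks[i] = kern(xstar, xs[i]);
            mean += ks[i] * alpha[i];
        }
        for (int i = 0; i < N; i++) {
            double s = ks[i];
            for (int k = 0; k < i; k++) s -= L[i][k] * v[k];
            v[i] = s / L[i][i];
            var -= v[i] * v[i];
        }

        double sigma = sqrt(var > 0 ? var : 0);
        printf("predicted %.3f +/- %.3f, observed %.3f -> %s\n",
               mean, 3 * sigma, observed,
               fabs(observed - mean) > 3 * sigma
                   ? "FLAG: possible SDC" : "ok");
        return 0;
    }

The predictive variance widens automatically where the data are sparse or noisy, which is what lets the detection range be constructed without hand-tuned thresholds.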