The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
There is extensive evidence indicating that modern computer workloads exhibit highly variability in their processing requirements. Under such workloads, traditional task assignment policies do not perform well. Size-based policies perform significantly better than traditional policies under highly variable workloads. The main limitation of existing size-based policies though is that these have been...
Data replication has been widely used as a mean of increasing the data availability of large-scale cloud storage systems where failures are normal. Aiming to provide cost-effective availability, and improve performance and load-balancing of cloud storage, this paper presents a cost-effective dynamic replication management scheme referred to as CDRM. A novel model is proposed to capture the relationship...
When multiple instances of an application running on multiple virtual machines, an interesting problem is how to utilize the fault handling result from one application instance to heal the same fault occurred on other sibling instances, and hence to ensure high service availability in a cloud computing environment. This paper presents SHelp, a lightweight runtime system that can survive software failures...
Event traces are helpful in understanding the performance behavior of parallel applications since they allow the in-depth analysis of communication and synchronization patterns. However, the absence of synchronized clocks on most cluster systems may render the analysis ineffective because inaccurate relative event timings may misrepresent the logical event order and lead to errors when quantifying...
Computational aeroacoustics (CAA) has emerged as a tool to complement theoretical and experimental approaches for robust and accurate prediction of sound levels from aircraft airframes and engines. CAA, unlike computational fluid dynamics (CFD), involves the accurate prediction of small-amplitude acoustic fluctuations and their correct propagation to the far field. In that respect, CAA poses significant...
The MapReduce model uses a barrier between the Map and Reduce stages. This provides simplicity in both programming and implementation. However, in many situations, this barrier hurts performance because it is overly restrictive. Hence, we develop a method to break the barrier in MapReduce in a way that improves efficiency. Careful design of our barrierless MapReduce framework results in equivalent...
Asynchronous algorithms have been demonstrated to improve scalability of a variety of applications in parallel environments. Their distributed adaptations have received relatively less attention, particularly in the context of conventional execution environments and associated overheads. One such framework, MapReduce, has emerged as a commonly used programming framework for large-scale distributed...
Finding and counting the occurrences of a collection of subgraphs within another larger network is a computationally hard problem, closely related to graph isomorphism. The subgraph count is by itself a very powerful characterization of a network and it is crucial for other important network measurements. G-tries are a specialized data-structure designed to store and search for subgraphs. By taking...
In this paper, we study the impact of tasks reallocation onto a multi-cluster environment where clusters are heterogeneous and use different resources management policies. In this context, we propose a reallocation mechanism that migrates waiting jobs from one cluster to another. We performed simulations using real traces to study benefits of reallocations. We compared two algorithms providing the...
Overlapping communication and computation has been devised as an attractive technique to alleviate the huge application's network requirements at large scale. Overlapping will allow to fully or partially hide the long communication delays suffered when transferring messages through the network. This will relax the application's network requirements, and hence allow to deploy more cost-effective network...
In this paper, we describe our experiment developing an implementation of the Linpack benchmark for TianHe-1, a petascale CPU/GPU supercomputer system, the largest GPU-accelerated system ever attempted before. An adaptive optimization framework is presented to balance the workload distribution across the GPUs and CPUs with the negligible runtime overhead, resulting in the better performance than the...
Operating systems have historically been implemented as independent layers between hardware and applications. User programs communicate with the OS through a set of well defined system calls, and do not have direct access to the hardware. The OS, in turn, communicates with the underlying architecture via control registers. Except for these interfaces, the three layers are practically oblivious to...
Coordinated checkpoint and recovery is a common approach to achieve fault tolerance on large-scale systems. The traditional mechanism dumps the process image to a local disk or a central storage area of all the processes involved in the parallel job. When a failure occurs, the processes are restarted and restored to the latest checkpoint image. However, this kind of approach is unable to provide the...
Operating system noise, or “jitter,” is a key limiter of application scalability in high end computing systems. Several studies have attempted to quantify the sources and effects of system interference, though few of these studies show the influence that architectural and system characteristics have on the impact of OS noise at scale. In this paper, we examine the impact of three such system properties:...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.