One of the most important aspects that influence the performance of parallel applications is the speed of communication between their tasks. To optimize communication, tasks that exchange lots of data should be mapped to processing units that have high network performance. This technique is called communication-aware task mapping and requires detailed information about the underlying network topology...
Many large-scale data analytics infrastructures are employed for a wide variety of jobs, ranging from short interactive queries to large data analysis jobs that may take hours or even days to complete. As a consequence, data-processing frameworks like MapReduce may have workloads consisting of jobs with heavy-tailed processing requirements. With such workloads, short jobs may experience slowdowns...
As a limited power budget is becoming one of the most crucial challenges in developing supercomputer systems, hardware overprovisioning, which installs a larger number of nodes beyond the limitations of the power constraint determined by Thermal Design Power, is an attractive way to design extreme-scale supercomputers. In this design, power consumption of each node should be controlled by power-knobs equipped...
In-situ analysis on the output data of scientific simulations has been made necessary by ever-growing output data volumes and increasing costs of data movement as supercomputing is moving towards exascale. With hardware accelerators like GPUs becoming increasingly common in high end machines, new opportunities arise to co-locate scientific simulations and online analysis performed on the scientific...
Many cloud computing providers use overbooking to increase their low utilization ratios. This however increases the risk of performance degradation due to interference among co-located VMs. To address this problem we present a service level and performance aware controller that: (1) provides performance isolation for high QoS VMs, and (2) reduces the VM interference between low QoS VMs by dynamically mapping...
In large scale data centers, a single fault can lead to correlated failures of several physical machines and the tasks running on them, simultaneously. Such correlated failures can severely damage the reliability of a service or a job. This paper models the impact of stochastic and correlated failures on job reliability in a data center. We focus on correlated failures caused by power outages or failures...
Dense systems with a large number of cores per node are becoming increasingly popular. Existing designs of the Process Management Interface (PMI) show poor scalability in terms of performance and memory consumption on such systems with a large number of processes concurrently accessing the PMI interface. Our analysis shows the local socket-based communication scheme used by PMI to be a major bottleneck. While...
Power management has become a central issue in large-scale computing clusters where a considerable amount of energy is consumed and a large operational cost is incurred annually. Traditional power management techniques have a centralized design that creates challenges for scalability of computing clusters. In this work, we develop a framework for distributed power budget allocation that maximizes the utility...
Due to the diversity in the applications that run in clusters, many different application frameworks have been developed, such as MapReduce for data-intensive batch jobs and Spark for interactive data analytics. A framework is first deployed in a cluster, and then starts executing a large set of jobs that are submitted over time. When multiple such frameworks with time-varying resource demands are...