We have been developing a lightweight workflow system called Pwrake to execute data-intensive many-task workflows with the help of the high-performance parallel I/O of the Gfarm file system. This paper discusses the design of the fault tolerance mechanism implemented in Pwrake. To avoid a workflow abort when a worker node fails, Pwrake detects a node failure based on the result of a task retry...
Extreme-scale HPC systems are expected to reach exascale performance around the year 2020. While it is widely known that these systems pose new challenges regarding the energy efficiency of architectures, concurrency, and resiliency, they also challenge application developers trying to utilize resources efficiently: managing parallel control flows, hardware resources and dependencies is a complex...
In cloud computing and high-performance computing, a large job is typically divided into many small tasks for parallel execution in a distributed environment. For various reasons, some tasks (so-called ‘stragglers’) are considerably slower than the others, delaying the completion of the job. We propose a new machine learning approach to automatically identify and diagnose the stragglers. To first...
Scientists in each experiment team share their data and use distributed resources to conduct their experiments. These experiments are carried out in collaboration with globally dispersed teams. Scientific data need to be replicated or cached at distributed locations around the world. The data locality problem and data transfer overhead are important challenges for scheduling such...