Dependable real-time systems typically consist of tasks of mixed criticality levels with associated fault tolerance (FT) requirements, and scheduling them in a fault-tolerant manner that efficiently satisfies these requirements is a challenging problem. From the designers' perspective, the most natural way to specify the task criticalities is by expressing the reliability requirements at task level, without...
Fault-tolerant scheduling plays a significant role in improving the system reliability of clusters. Although many fault-tolerant scheduling algorithms have been proposed for real-time tasks in parallel and distributed systems, the quality of service (QoS) requirements of tasks have not been taken into account. This paper presents a fault-tolerant scheduling algorithm called QAFT that can tolerate one...
This paper addresses an optimal checkpoint strategy for fault tolerance in real-time systems. We consider multiple real-time tasks with arbitrary periods that are scheduled by the Rate Monotonic (RM) algorithm. Equidistant checkpointing is maintained for each kind of task, while the width of the checkpoint intervals differs from task to task. We propose a method to determine the optimal...
Large-scale scientific experiments are usually supported by scientific workflows that may demand high performance computing infrastructure. Within a given experiment, the same workflow may be explored with different sets of parameters. However, parallelizing the workflow instances is hard to accomplish, mainly because of the heterogeneity of the workflow's activities. The Many-Task Computing paradigm seems...
We present an approach for scheduling of fault-tolerant embedded applications composed of soft and hard real-time processes running on distributed embedded systems. The hard processes are critical and must always complete on time. A soft process can complete after its deadline and its completion time is associated with a value function that characterizes its contribution to the quality-of-service...
Dependable communication is becoming a critical factor due to the pervasive usage of networked embedded systems that increasingly interact with human lives in many real-time applications. Controller Area Network (CAN) has gained wider acceptance as a standard in a large number of industrial applications, mostly due to its efficient bandwidth utilization, ability to provide real-time guarantees, as...
To improve the flight safety of high-altitude long-endurance unmanned aerial vehicles (UAVs), a distributed fault-tolerant computer (FTC) was designed based on the controller area network (CAN). According to the requirements of UAV control and the system structure of the FTC, solutions to key issues (redundancy management, synchronization technology, scheduling strategy, CAN communication and software implementation...
Dependable communication is becoming a critical factor due to the pervasive usage of networked embedded systems that increasingly interact with human lives in one way or another in many real-time applications. Though many smaller systems provide dependable services by employing uniprocessor solutions, stringent fault containment strategies, etc., these practices are fast becoming inadequate...
MapReduce has been used at Google, Yahoo, Facebook, etc., even for their production jobs. However, according to a recent study, a single failure in a Hadoop job could cause a 50% increase in completion time. Amazon Elastic MapReduce has been provided to help users perform data-intensive tasks for their applications. These applications may have high fault tolerance and/or tight SLA requirements. However,...
Static real-time scheduling algorithms are developed on the basis of RMS, which mainly deals with periodic tasks. When the task set may also contain a mixture of non-periodic and occasional tasks, the traditional rate monotonic scheduling algorithm is no longer applicable. This paper analyzes and improves the RMS algorithm, and combines the improved algorithm with the P/B algorithm. The system is not only able...
Complex scientific workflows are now commonly executed on global grids. With the increasing scale, complexity, heterogeneity and dynamism of grid environments, the challenges of managing and scheduling these workflows are compounded by dependability issues due to the inherently unreliable nature of large-scale grid infrastructure. In addition to the traditional fault tolerance techniques, specific checkpoint-recovery...
The onboard computer systems used in satellite launch vehicles have stringent timing requirements due to the mission-critical nature of their tasks. Onboard computers (OBCs) carry out the complete control of launch vehicles, which covers navigation guidance, all prelaunch operations and the generation of mission-critical events. A fault in these systems could lead to a mission failure and catastrophic...
Even though highly distributed environments such as Clouds and Grids are increasingly used for e-science high performance applications, they still cannot deliver the robustness and reliability needed for widespread acceptance as ubiquitous scientific tools. To overcome this problem, existing systems resort to fault tolerance mechanisms such as task replication and task resubmission. In this paper...
A grid environment, being a collection of heterogeneous and geographically distributed resources, is prone to many kinds of failures, such as process, resource and network failures. In this paper, we address the problem of resource failure. Resources in a grid oscillate between being available and unavailable to the grid. When and how they do so depends on the failure characteristics of the machines,...
In this paper we present a fault-tolerant, collaborative peer-to-peer object storage architecture with adaptive topology and efficient multidimensional range search capabilities. Every stored object has a fixed set of index properties, whose ranges of values form a multidimensional geometric property space. The architecture efficiently supports multidimensional range queries by mapping the peer identifiers...
Dependable real-time systems typically consist of tasks of multiple criticality levels, and scheduling them in a fault tolerant manner is a challenging problem. Redundancy in the physical and temporal domains for achieving fault tolerance has often been dealt with independently, based on the types of errors one needs to tolerate. To our knowledge, there has been no work that tries to integrate fault tolerant...
The grid scheduling process is a major factor affecting system performance. If the grid scheduler is able to select proper resources and determine the order of tasks in the queue, each task is executed without missing its deadline and without extra faults; consequently, the response time of the job decreases. Since the grid uses heterogeneous resources, the possibility of failure occurrence in those resources...
Fault tolerant grid scheduling is of vital importance in the grid computing world. Task replication and checkpointing are two popular methods of achieving fault tolerant scheduling. Replication is not applicable in economy-based grid computing because it uses a large number of resources, and the consumer must pay for the time spent on all participating nodes. In this paper, we proposed...
Grid computing allows one to unite pools of servers, storage systems, and networks from different domains, each with its own management policies, into a single large system. The grid environment is dynamic and its domains act autonomously. Unfortunately, in such an environment failures may occur occasionally, or a volatile host can delay the entire execution for a long period of time, which in turn...
FPGAs are widely used in space-related engineering design, and the probability of faults occurring increases when they are subject to total ionizing dose. In this paper, fault tolerance is addressed through task scheduling, and a fault tolerant scheduling algorithm for hardware real-time tasks is proposed based on primary/backup copies. By being scheduled backwards, the backup copy executes as late...