Dependable real-time systems typically consist of tasks of mixed-criticality levels with associated fault tolerance (FT) requirements and scheduling them in a fault-tolerant manner to efficiently satisfy these requirements is a challenging problem. From the designers' perspective, the most natural way to specify the task criticalities is by expressing the reliability requirements at task level, without...
Fault-tolerant scheduling plays a significant role in improving the system reliability of clusters. Although numerous fault-tolerant scheduling algorithms have been proposed for real-time tasks in parallel and distributed systems, the quality-of-service (QoS) requirements of tasks have not been taken into account. This paper presents a fault-tolerant scheduling algorithm called QAFT that can tolerate one...
An optimal checkpoint strategy for fault tolerance in real-time systems is addressed in this paper. We consider multiple real-time tasks with arbitrary periods that are scheduled by the Rate Monotonic (RM) algorithm. Equidistant checkpointing is maintained for each task, while the width of the checkpoint interval differs from task to task. We propose a method to determine the optimal...
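The abstract does not show the paper's RM-specific derivation, but the idea of choosing a per-task equidistant checkpoint interval can be illustrated with the classic first-order approximation for the optimal interval (a sketch; the formula, checkpoint costs, and MTBF figure here are illustrative assumptions, not the paper's method):

```python
import math

def optimal_checkpoint_interval(checkpoint_cost, mtbf):
    """First-order optimal spacing between equidistant checkpoints:
    T_opt = sqrt(2 * C * MTBF), where C is the cost of taking one
    checkpoint and MTBF is the mean time between failures."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

def per_task_intervals(tasks, mtbf):
    """Assign each task its own interval width, mirroring the
    abstract's per-task equidistant checkpointing."""
    return {name: optimal_checkpoint_interval(cost, mtbf)
            for name, cost in tasks.items()}

# Example: checkpoint costs and MTBF in milliseconds
intervals = per_task_intervals({"tau1": 2.0, "tau2": 8.0}, mtbf=50_000)
```

As expected, a task with a cheaper checkpoint operation is assigned a shorter (more frequent) checkpoint interval.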
The paper analyses the load imbalance problem and the QoS-based fault-tolerant scheduling algorithm in grid resource scheduling, and proposes a new scheduling algorithm: a QoS-constrained scheduling strategy based on task-priority parameters. The method uses generalized stochastic Petri nets with inhibitor arcs to establish the grid scheduling model and improve the Min-Min...
As the scale of high-performance computing systems continues to grow, the impact of failures on the systems is increasingly critical. Research has been performed on fault prediction and associated precautionary actions. While this approach is valuable, it is not adequate because of the inevitability of failures. Postfailure recovery is equally important; however, most current work relies mainly on...
Large-scale scientific experiments are usually supported by scientific workflows that may demand high-performance computing infrastructure. Within a given experiment, the same workflow may be explored with different sets of parameters. However, the parallelization of the workflow instances is hard to accomplish, mainly due to the heterogeneity of their activities. The Many-Task Computing paradigm seems...
We present an approach for scheduling of fault-tolerant embedded applications composed of soft and hard real-time processes running on distributed embedded systems. The hard processes are critical and must always complete on time. A soft process can complete after its deadline and its completion time is associated with a value function that characterizes its contribution to the quality-of-service...
Dependable communication is becoming a critical factor due to the pervasive usage of networked embedded systems that increasingly interact with human lives in many real-time applications. Controller Area Network (CAN) has gained wider acceptance as a standard in a large number of industrial applications, mostly due to its efficient bandwidth utilization, ability to provide real-time guarantees, as...
In multiprocessor systems, passive replication is a technique that trades processing power for increased reliability. One approach of passive replication, called primary-backup task scheduling, is often used in real-time multiprocessor systems to ensure that deadlines are met in spite of faults. Briefly, it consists in scheduling a secondary task conditionally, in such a way that the secondary task...
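The abstract's primary-backup idea — schedule a backup copy conditionally and reclaim its slot when the primary succeeds — can be sketched with a toy single-fault simulation (names, the failure model, and the random-fault injection are illustrative assumptions, not the paper's scheme):

```python
import random

def run_primary_backup(tasks, fail_prob, seed=0):
    """Toy primary-backup execution under a single-fault-per-task
    model: the backup copy is activated only when the primary copy's
    processor faults; otherwise the backup is cancelled and its
    reserved capacity is reclaimed."""
    rng = random.Random(seed)
    completed, backups_run = [], 0
    for task in tasks:
        primary_faulted = rng.random() < fail_prob
        if primary_faulted:
            backups_run += 1      # conditional backup copy executes
        completed.append(task)    # backup always succeeds in this model
    return completed, backups_run

done, backups = run_primary_backup(["t1", "t2", "t3", "t4"], fail_prob=0.5)
```

The point of the conditionality is visible in the fault-free case: with `fail_prob=0.0` no backup ever runs, so the replicated capacity costs nothing at runtime.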
Aiming at flight safety of high-altitude long-endurance unmanned aerial vehicle (UAV), a distributed fault-tolerant computer (FTC) was designed based on controller area network(CAN). According to the requirements of UAV control and the system structure of FTC, solutions of key issues (redundancy management, synchronization technology, scheduling strategy, CAN communication and software implementation...
Critical to the successful deployment of grid systems is their ability to guarantee efficient meta-scheduling, namely optimal allocation of jobs across a pool of sites with diverse local scheduling policies. The centralized nature of current meta-scheduling solutions is not well suited for the envisioned increasing scale and dynamicity of next-generation grids, the success of which relies on the development...
A distributed algorithm is self-stabilizing if, after faults and attacks hit the system and place it in some arbitrary global state, the system recovers from this catastrophic situation without external intervention in finite time. In this paper, we consider the problem of constructing, in a self-stabilizing manner, a locally maximizable task (such as a maximal independent set, a maximal matching,...
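For one of the tasks the abstract mentions, a maximal independent set, the self-stabilizing style can be illustrated with the classic two-rule construction under a sequential (central-daemon) scheduler: a node joins the set if no neighbor is in it, and leaves if some neighbor is in it. This is a generic textbook sketch under stated assumptions, not the paper's algorithm:

```python
def stabilize_mis(adj, state):
    """Self-stabilizing maximal independent set (central-daemon toy).
    Rules, applied to one node at a time until no node is enabled:
      join:  not in set and no neighbor in set  -> enter the set
      leave: in set and some neighbor in set    -> exit the set
    Starting from ANY state, the fixpoint is a maximal independent set."""
    changed = True
    while changed:
        changed = False
        for v in adj:                      # sequential daemon: one move at a time
            nbr_in = any(state[u] for u in adj[v])
            if not state[v] and not nbr_in:
                state[v] = True            # join rule
                changed = True
            elif state[v] and nbr_in:
                state[v] = False           # leave rule
                changed = True
    return state

# Path graph 0-1-2, started from an illegal state (two adjacent set members)
mis = stabilize_mis({0: [1], 1: [0, 2], 2: [1]},
                    {0: True, 1: True, 2: False})
```

Note that convergence relies on the sequential scheduler: once a node is in the set with no set neighbor, neither it nor its neighbors can move again, which is what drives the system to a legal state.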
Dependable communication is becoming a critical factor due to the pervasive usage of networked embedded systems that increasingly interact with human lives in one way or another in many real-time applications. Though many smaller systems provide dependable services by employing uniprocessor solutions, stringent fault-containment strategies, etc., these practices are fast becoming inadequate...
The following topics are discussed: real-time applications; embedded technology; scheduling; operating systems; robust and fault-tolerant systems, thermal and energy aware systems; hardware-software codesign; systems modeling and design; and wireless sensor networks.
In this paper we are interested in mixed hard/soft real-time fault-tolerant applications mapped on distributed heterogeneous architectures. We use the Earliest Deadline First (EDF) scheduling for the hard real-time tasks and the Constant Bandwidth Server (CBS) for the soft tasks. The bandwidth reserved for the servers determines the quality of service (QoS) for soft tasks. CBS enforces temporal isolation,...
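The admission condition behind combining EDF for hard tasks with CBS servers for soft tasks can be sketched with a simple single-processor bandwidth check (illustrative only, assuming implicit deadlines; the paper's actual analysis for distributed heterogeneous architectures is not shown in the abstract):

```python
def edf_cbs_feasible(hard_tasks, servers):
    """Single-processor density check: hard-task utilization
    (sum of C_i / T_i) plus total CBS server bandwidth (sum of
    Q_s / P_s) must not exceed 1. CBS's temporal isolation means
    each server can never consume more than its bandwidth Q_s / P_s,
    so hard-task guarantees survive soft-task overruns."""
    u_hard = sum(c / t for c, t in hard_tasks)
    u_servers = sum(q / p for q, p in servers)
    return u_hard + u_servers <= 1.0

# Two hard tasks (C, T) plus one CBS server (budget Q, period P)
ok = edf_cbs_feasible([(1, 4), (2, 8)], [(1, 10)])
```

Raising a server's budget Q raises the QoS of its soft tasks, but only up to the point where the combined bandwidth hits 1 and the check fails.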
MapReduce has been used at Google, Yahoo, Facebook, etc., even for their production jobs. However, according to a recent study, a single failure on a Hadoop job could cause a 50% increase in completion time. Amazon Elastic MapReduce has been provided to help users perform data-intensive tasks for their applications. These applications may have high fault-tolerance and/or tight SLA requirements. However,...
Static real-time scheduling algorithms are typically developed on the basis of Rate Monotonic Scheduling (RMS), which mainly deals with periodic tasks. However, when the task set contains a mixture of aperiodic and sporadic tasks, the traditional rate-monotonic scheduling algorithm is no longer applicable. This paper analyzes and improves the RMS algorithm, and combines the improved algorithm with the P/B algorithm. The system is not only able...
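The baseline RMS test that such work starts from is the classic Liu and Layland utilization bound for periodic tasks (a sketch of the standard sufficient condition; the paper's improved variant is not shown in the abstract):

```python
def rms_schedulable(tasks):
    """Liu & Layland sufficient test for rate-monotonic scheduling:
    n periodic tasks (C_i, T_i) are schedulable if their total
    utilization does not exceed n * (2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2.0 ** (1.0 / n) - 1.0)
    return utilization <= bound

# Three periodic tasks given as (worst-case execution time, period)
ok = rms_schedulable([(1, 4), (1, 5), (2, 10)])
```

The test is only sufficient: a task set that fails the bound may still be schedulable, which is one reason exact response-time analysis or improved variants are used in practice.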
Complex scientific workflows are now commonly executed on global grids. With the increasing scale, complexity, heterogeneity and dynamism of grid environments, the challenges of managing and scheduling these workflows are augmented by dependability issues due to the inherently unreliable nature of large-scale grid infrastructure. In addition to the traditional fault tolerance techniques, specific checkpoint-recovery...
This paper studies the dilemma between fault tolerance and energy efficiency in frame-based real-time systems. Given a set of K tasks to be executed on a system that supports L voltage levels, the proposed heuristic-based scheduling technique minimizes the energy consumption of tasks execution when faults are absent, and preserves feasibility under the worst case of fault occurrences. The proposed...
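The fault-tolerance/energy dilemma the abstract describes can be illustrated with a toy speed-selection rule for a frame-based system: run as slowly as possible to save energy, but keep enough slack to re-execute a faulted task at full speed. The single-fault model, the normalized energy formula, and all the numbers below are illustrative assumptions, not the paper's heuristic:

```python
def pick_speed(wcets, deadline, levels):
    """Pick the lowest normalized frequency from the available voltage
    levels such that all tasks fit in the frame even if the longest
    task must be re-executed once at full speed after a fault
    (single-fault model). Returns None if no level is feasible."""
    recovery = max(wcets)                 # one re-execution at f = 1.0
    for f in sorted(levels):              # slowest (lowest-energy) first
        if sum(w / f for w in wcets) + recovery <= deadline:
            return f
    return None

def energy(wcets, f):
    """Normalized dynamic energy at frequency f: E is proportional to
    (total cycles) * f^2, since power scales roughly with f^3 and
    execution time with 1/f."""
    return sum(wcets) * f ** 2

# Two tasks with WCETs 2 and 3 (at full speed) in a frame of length 16
f = pick_speed([2.0, 3.0], deadline=16.0, levels=[0.5, 0.75, 1.0])
```

Shrinking the frame forces the selection toward higher levels: with `deadline=8.0` the same task set only fits at full speed, i.e. the fault-tolerance reserve eats the energy savings.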
The onboard computer systems used in satellite launch vehicles have stringent timing requirements due to the mission-critical nature of their tasks. The complete control of launch vehicles is done by onboard computers (OBCs), which handle navigation and guidance, all prelaunch operations, and the generation of mission-critical events. A fault in these systems could lead to a mission failure and catastrophic...