Dependable real-time systems typically consist of tasks of mixed-criticality levels with associated fault tolerance (FT) requirements and scheduling them in a fault-tolerant manner to efficiently satisfy these requirements is a challenging problem. From the designers' perspective, the most natural way to specify the task criticalities is by expressing the reliability requirements at task level, without...
Fault-tolerant scheduling plays a significant role in improving the system reliability of clusters. Although numerous fault-tolerant scheduling algorithms have been proposed for real-time tasks in parallel and distributed systems, the quality-of-service (QoS) requirements of tasks have not been taken into account. This paper presents a fault-tolerant scheduling algorithm called QAFT that can tolerate one...
An optimal checkpoint strategy for fault tolerance in real-time systems is addressed in this paper. We consider multiple real-time tasks with arbitrary periods that are scheduled by the Rate Monotonic (RM) algorithm. Equidistant checkpointing is maintained for each task, while the width of the checkpoint interval differs from task to task. We propose a method to determine the optimal...
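The abstract does not show the paper's RM-specific derivation, but the idea of choosing a per-task equidistant checkpoint interval can be illustrated with the classic first-order approximation for the optimal interval (a sketch; the formula, checkpoint costs, and MTBF figure here are illustrative assumptions, not the paper's method):

```python
import math

def optimal_checkpoint_interval(checkpoint_cost, mtbf):
    """First-order optimal spacing between equidistant checkpoints:
    T_opt = sqrt(2 * C * MTBF), where C is the cost of taking one
    checkpoint and MTBF is the mean time between failures."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

def per_task_intervals(tasks, mtbf):
    """Assign each task its own interval width, mirroring the
    abstract's per-task equidistant checkpointing."""
    return {name: optimal_checkpoint_interval(cost, mtbf)
            for name, cost in tasks.items()}

# Example: checkpoint costs and MTBF in milliseconds
intervals = per_task_intervals({"tau1": 2.0, "tau2": 8.0}, mtbf=50_000)
```

As expected, a task with a cheaper checkpoint operation is assigned a shorter (more frequent) checkpoint interval.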
The paper analyses the load imbalance problem and the QoS-based fault-tolerant scheduling algorithm in grid resource scheduling, and proposes a new scheduling algorithm: a QoS-constrained scheduling strategy based on task-priority parameters. The method uses generalized stochastic Petri nets with inhibitor arcs to establish the grid scheduling model and improve the Min-Min...
As the scale of high-performance computing systems continues to grow, the impact of failures on the systems is increasingly critical. Research has been performed on fault prediction and associated precautionary actions. While this approach is valuable, it is not adequate because of the inevitability of failures. Postfailure recovery is equally important; however, most current work relies mainly on...
Large-scale scientific experiments are usually supported by scientific workflows that may demand high-performance computing infrastructure. Within a given experiment, the same workflow may be explored with different sets of parameters. However, the parallelization of the workflow instances is hard to accomplish, mainly due to the heterogeneity of their activities. The Many-Task Computing paradigm seems...
We present an approach for scheduling of fault-tolerant embedded applications composed of soft and hard real-time processes running on distributed embedded systems. The hard processes are critical and must always complete on time. A soft process can complete after its deadline and its completion time is associated with a value function that characterizes its contribution to the quality-of-service...
Dependable communication is becoming a critical factor due to the pervasive usage of networked embedded systems that increasingly interact with human lives in many real-time applications. Controller Area Network (CAN) has gained wider acceptance as a standard in a large number of industrial applications, mostly due to its efficient bandwidth utilization, ability to provide real-time guarantees, as...
In multiprocessor systems, passive replication is a technique that trades processing power for increased reliability. One approach of passive replication, called primary-backup task scheduling, is often used in real-time multiprocessor systems to ensure that deadlines are met in spite of faults. Briefly, it consists in scheduling a secondary task conditionally, in such a way that the secondary task...
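The abstract's primary-backup idea — schedule a backup copy conditionally and reclaim its slot when the primary succeeds — can be sketched with a toy single-fault simulation (names, the failure model, and the random-fault injection are illustrative assumptions, not the paper's scheme):

```python
import random

def run_primary_backup(tasks, fail_prob, seed=0):
    """Toy primary-backup execution under a single-fault-per-task
    model: the backup copy is activated only when the primary copy's
    processor faults; otherwise the backup is cancelled and its
    reserved capacity is reclaimed."""
    rng = random.Random(seed)
    completed, backups_run = [], 0
    for task in tasks:
        primary_faulted = rng.random() < fail_prob
        if primary_faulted:
            backups_run += 1      # conditional backup copy executes
        completed.append(task)    # backup always succeeds in this model
    return completed, backups_run

done, backups = run_primary_backup(["t1", "t2", "t3", "t4"], fail_prob=0.5)
```

The point of the conditionality is visible in the fault-free case: with `fail_prob=0.0` no backup ever runs, so the replicated capacity costs nothing at runtime.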
Aiming at flight safety of high-altitude long-endurance unmanned aerial vehicle (UAV), a distributed fault-tolerant computer (FTC) was designed based on controller area network(CAN). According to the requirements of UAV control and the system structure of FTC, solutions of key issues (redundancy management, synchronization technology, scheduling strategy, CAN communication and software implementation...
Critical to the successful deployment of grid systems is their ability to guarantee efficient meta-scheduling, namely optimal allocation of jobs across a pool of sites with diverse local scheduling policies. The centralized nature of current meta-scheduling solutions is not well suited for the envisioned increasing scale and dynamicity of next-generation grids, the success of which relies on the development...
A distributed algorithm is self-stabilizing if, after faults and attacks hit the system and place it in some arbitrary global state, the system recovers from this catastrophic situation without external intervention in finite time. In this paper, we consider the problem of constructing, in a self-stabilizing manner, a locally maximizable task (such as a maximal independent set, a maximal matching,...
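For one of the tasks the abstract mentions, a maximal independent set, the self-stabilizing style can be illustrated with the classic two-rule construction under a sequential (central-daemon) scheduler: a node joins the set if no neighbor is in it, and leaves if some neighbor is in it. This is a generic textbook sketch under stated assumptions, not the paper's algorithm:

```python
def stabilize_mis(adj, state):
    """Self-stabilizing maximal independent set (central-daemon toy).
    Rules, applied to one node at a time until no node is enabled:
      join:  not in set and no neighbor in set  -> enter the set
      leave: in set and some neighbor in set    -> exit the set
    Starting from ANY state, the fixpoint is a maximal independent set."""
    changed = True
    while changed:
        changed = False
        for v in adj:                      # sequential daemon: one move at a time
            nbr_in = any(state[u] for u in adj[v])
            if not state[v] and not nbr_in:
                state[v] = True            # join rule
                changed = True
            elif state[v] and nbr_in:
                state[v] = False           # leave rule
                changed = True
    return state

# Path graph 0-1-2, started from an illegal state (two adjacent set members)
mis = stabilize_mis({0: [1], 1: [0, 2], 2: [1]},
                    {0: True, 1: True, 2: False})
```

Note that convergence relies on the sequential scheduler: once a node is in the set with no set neighbor, neither it nor its neighbors can move again, which is what drives the system to a legal state.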
Dependable communication is becoming a critical factor due to the pervasive usage of networked embedded systems that increasingly interact with human lives in one way or another in many real-time applications. Though many smaller systems provide dependable services by employing uniprocessor solutions, stringent fault-containment strategies, etc., these practices are fast becoming inadequate...
The following topics are discussed: real-time applications; embedded technology; scheduling; operating systems; robust and fault-tolerant systems, thermal and energy aware systems; hardware-software codesign; systems modeling and design; and wireless sensor networks.
In this paper we are interested in mixed hard/soft real-time fault-tolerant applications mapped on distributed heterogeneous architectures. We use the Earliest Deadline First (EDF) scheduling for the hard real-time tasks and the Constant Bandwidth Server (CBS) for the soft tasks. The bandwidth reserved for the servers determines the quality of service (QoS) for soft tasks. CBS enforces temporal isolation,...
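The admission condition behind combining EDF for hard tasks with CBS servers for soft tasks can be sketched with a simple single-processor bandwidth check (illustrative only, assuming implicit deadlines; the paper's actual analysis for distributed heterogeneous architectures is not shown in the abstract):

```python
def edf_cbs_feasible(hard_tasks, servers):
    """Single-processor density check: hard-task utilization
    (sum of C_i / T_i) plus total CBS server bandwidth (sum of
    Q_s / P_s) must not exceed 1. CBS's temporal isolation means
    each server can never consume more than its bandwidth Q_s / P_s,
    so hard-task guarantees survive soft-task overruns."""
    u_hard = sum(c / t for c, t in hard_tasks)
    u_servers = sum(q / p for q, p in servers)
    return u_hard + u_servers <= 1.0

# Two hard tasks (C, T) plus one CBS server (budget Q, period P)
ok = edf_cbs_feasible([(1, 4), (2, 8)], [(1, 10)])
```

Raising a server's budget Q raises the QoS of its soft tasks, but only up to the point where the combined bandwidth hits 1 and the check fails.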
MapReduce has been used at Google, Yahoo, Facebook, etc., even for their production jobs. However, according to a recent study, a single failure on a Hadoop job could cause a 50% increase in completion time. Amazon Elastic MapReduce has been provided to help users perform data-intensive tasks for their applications. These applications may have high fault-tolerance and/or tight SLA requirements. However,...
Static real-time scheduling algorithms are typically developed on the basis of Rate Monotonic Scheduling (RMS), which mainly deals with periodic tasks. However, when the task set contains a mixture of aperiodic and sporadic tasks, the traditional rate-monotonic scheduling algorithm is no longer applicable. This paper analyzes and improves the RMS algorithm, and combines the improved algorithm with the P/B algorithm. The system is not only able...
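The baseline RMS test that such work starts from is the classic Liu and Layland utilization bound for periodic tasks (a sketch of the standard sufficient condition; the paper's improved variant is not shown in the abstract):

```python
def rms_schedulable(tasks):
    """Liu & Layland sufficient test for rate-monotonic scheduling:
    n periodic tasks (C_i, T_i) are schedulable if their total
    utilization does not exceed n * (2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2.0 ** (1.0 / n) - 1.0)
    return utilization <= bound

# Three periodic tasks given as (worst-case execution time, period)
ok = rms_schedulable([(1, 4), (1, 5), (2, 10)])
```

The test is only sufficient: a task set that fails the bound may still be schedulable, which is one reason exact response-time analysis or improved variants are used in practice.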
Complex scientific workflows are now commonly executed on global grids. With the increasing scale, complexity, heterogeneity and dynamism of grid environments, the challenges of managing and scheduling these workflows are augmented by dependability issues due to the inherently unreliable nature of large-scale grid infrastructure. In addition to the traditional fault tolerance techniques, specific checkpoint-recovery...
This paper studies the dilemma between fault tolerance and energy efficiency in frame-based real-time systems. Given a set of K tasks to be executed on a system that supports L voltage levels, the proposed heuristic-based scheduling technique minimizes the energy consumption of tasks execution when faults are absent, and preserves feasibility under the worst case of fault occurrences. The proposed...
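The fault-tolerance/energy dilemma the abstract describes can be illustrated with a toy speed-selection rule for a frame-based system: run as slowly as possible to save energy, but keep enough slack to re-execute a faulted task at full speed. The single-fault model, the normalized energy formula, and all the numbers below are illustrative assumptions, not the paper's heuristic:

```python
def pick_speed(wcets, deadline, levels):
    """Pick the lowest normalized frequency from the available voltage
    levels such that all tasks fit in the frame even if the longest
    task must be re-executed once at full speed after a fault
    (single-fault model). Returns None if no level is feasible."""
    recovery = max(wcets)                 # one re-execution at f = 1.0
    for f in sorted(levels):              # slowest (lowest-energy) first
        if sum(w / f for w in wcets) + recovery <= deadline:
            return f
    return None

def energy(wcets, f):
    """Normalized dynamic energy at frequency f: E is proportional to
    (total cycles) * f^2, since power scales roughly with f^3 and
    execution time with 1/f."""
    return sum(wcets) * f ** 2

# Two tasks with WCETs 2 and 3 (at full speed) in a frame of length 16
f = pick_speed([2.0, 3.0], deadline=16.0, levels=[0.5, 0.75, 1.0])
```

Shrinking the frame forces the selection toward higher levels: with `deadline=8.0` the same task set only fits at full speed, i.e. the fault-tolerance reserve eats the energy savings.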
The onboard computer systems used in satellite launch vehicles have stringent timing requirements due to the mission-critical nature of their tasks. The complete control of launch vehicles is done by onboard computers (OBCs), which handle navigation and guidance, all prelaunch operations, and the generation of mission-critical events. A fault in these systems could lead to a mission failure and catastrophic...