Search results

Items from 1 to 20 out of 31 results

chapter

Task-Level Probabilistic Scheduling Guarantees for Dependable Real-Time Systems - A Designer Centric Approach

H Aysan, R Dobrin, S Punnekkat

2011 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops > 281 - 287

2011 IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops

Dependable real-time systems typically consist of tasks of mixed-criticality levels with associated fault tolerance (FT) requirements and scheduling them in a fault-tolerant manner to efficiently satisfy these requirements is a challenging problem. From the designers' perspective, the most natural way to specify the task criticalities is by expressing the reliability requirements at task level, without...

article

QoS-Aware Fault-Tolerant Scheduling for Real-Time Tasks on Heterogeneous Clusters

Xiaomin Zhu, Xiao Qin, Meikang Qiu

IEEE Transactions on Computers > 2011 > 60 > 6 > 800 - 812

Fault-tolerant scheduling plays a significant role in improving system reliability of clusters. Although extensive fault-tolerant scheduling algorithms have been proposed for real-time tasks in parallel and distributed systems, quality of service (QoS) requirements of tasks have not been taken into account. This paper presents a fault-tolerant scheduling algorithm called QAFT that can tolerate one...

chapter

Schedulability and optimal checkpoint placement for real-time multi-tasks

S W Kwak, J.-M Yang

2010 IEEE International Conference on Industrial Engineering and Engineering Management > 778 - 782

2010 IEEE International Conference on Industrial Engineering & Engineering Management (IE&EM 2010)

An optimal checkpoint strategy for fault-tolerance in real-time systems is addressed in this paper. We consider multiple real-time tasks with arbitrary periods that are scheduled by Rate Monotonic (RM) algorithm. Equidistant checkpointing is maintained for each kind of task, while the width of checkpoint intervals is different with respect to the task. We propose a method to determine the optimal...

chapter

Improving Many-Task computing in scientific workflows using P2P techniques

J Dias, E Ogasawara, Daniel de Oliveira, E Pacitti, et al.

2010 3rd Workshop on Many-Task Computing on Grids and Supercomputers > 1 - 10

2010 3rd Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS 2010)

Large-scale scientific experiments are usually supported by scientific workflows that may demand high performance computing infrastructure. Within a given experiment, the same workflow may be explored with different sets of parameters. However, parallelizing the workflow instances is hard to accomplish, mainly due to the heterogeneity of their activities. Many-Task computing paradigm seems...

chapter

Value-based scheduling of distributed fault-tolerant real-time systems with soft and hard timing constraints

V Izosimov, P Eles, Zebo Peng

2010 8th IEEE Workshop on Embedded Systems for Real-Time Multimedia > 31 - 40

2010 8th IEEE Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia 2010)

We present an approach for scheduling of fault-tolerant embedded applications composed of soft and hard real-time processes running on distributed embedded systems. The hard processes are critical and must always complete on time. A soft process can complete after its deadline and its completion time is associated with a value function that characterizes its contribution to the quality-of-service...

chapter

Efficient fault tolerant scheduling on Controller Area Network (CAN)

H Aysan, A Thekkilakattil, R Dobrin, S Punnekkat

2010 IEEE 15th Conference on Emerging Technologies & Factory Automation (ETFA 2010) > 1 - 8

2010 IEEE 15th Conference on Emerging Technologies & Factory Automation (ETFA 2010)

Dependable communication is becoming a critical factor due to the pervasive usage of networked embedded systems that increasingly interact with human lives in many real-time applications. Controller Area Network (CAN) has gained wider acceptance as a standard in a large number of industrial applications, mostly due to its efficient bandwidth utilization, ability to provide real-time guarantees, as...

chapter

Implementation of a Distributed Fault-Tolerant Computer for UAV

Zhang Zengan, Chen Xin, Zhou Yueping

2010 International Conference on Electrical and Control Engineering > 5266 - 5269

2010 International Conference on Electrical and Control Engineering (ICECE 2010)

Aiming at the flight safety of high-altitude, long-endurance unmanned aerial vehicles (UAVs), a distributed fault-tolerant computer (FTC) was designed based on a controller area network (CAN). According to the requirements of UAV control and the system structure of the FTC, solutions of key issues (redundancy management, synchronization technology, scheduling strategy, CAN communication and software implementation...

chapter

Fault Tolerant Scheduling on Controller Area Network (CAN)

Hüseyin Aysan, Radu Dobrin, Sasikumar Punnekkat

2010 13th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops > 226 - 232

2010 13th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing Workshops (ISORCW Workshops 2010)

Dependable communication is becoming a critical factor due to the pervasive usage of networked embedded systems that increasingly interact with human lives in one way or another in many real-time applications. Though many smaller systems are providing dependable services employing uniprocessor solutions, stringent fault containment strategies etc., these practices are fast becoming inadequate...

chapter

Improving MapReduce fault tolerance in the cloud

Qin Zheng

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) > 1 - 6

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW 2010)

MapReduce has been used at Google, Yahoo, Facebook, etc., even for their production jobs. However, according to a recent study, a single failure on a Hadoop job could cause a 50% increase in completion time. Amazon Elastic MapReduce has been provided to help users perform data-intensive tasks for their applications. These applications may have high fault tolerance and/or tight SLA requirements. However,...

chapter

Research on Static Fault-Tolerance Scheduling Algorithm

Zhao Qi, Qu Haitao

2010 International Conference on Measuring Technology and Mechatronics Automation > 3 > 179 - 181

2010 International Conference on Measuring Technology and Mechatronics Automation (ICMTMA 2010)

Static real-time scheduling algorithms are developed based on RMS and mainly deal with periodic tasks. But when the task set may contain a mixture of non-cyclical and occasional tasks, the traditional rate monotonic scheduling algorithm is no longer applicable. This paper analyzes and improves the RMS algorithm, and combines the improved algorithm with the P/B algorithm. The system is not only able...

chapter

Fault Tolerance and Recovery in Grid Workflow Management Systems

Elvin Sindrilaru, Alexandru Costan, Valentin Cristea

2010 International Conference on Complex, Intelligent and Software Intensive Systems > 475 - 480

Fourth International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2010)

Complex scientific workflows are now commonly executed on global grids. With the increasing scale complexity, heterogeneity and dynamism of grid environments the challenges of managing and scheduling these workflows are augmented by dependability issues due to the inherent unreliable nature of large-scale grid infrastructure. In addition to the traditional fault tolerance techniques, specific checkpoint-recovery...

chapter

An Improved Redundancy Scheme for the Optimal Utilization of Onboard Computers

R. Pillay, S. Punnekkat, S. Dasgupta

2009 Annual IEEE India Conference > 1 - 4

2009 Annual IEEE India Conference (INDICON 2009)

The onboard computer systems used in satellite launch vehicles have stringent timing requirements due to the mission-critical nature of their tasks. The complete control of launch vehicles is done by onboard computers (OBC), which handle navigation, guidance, all prelaunch operations and the generation of mission-critical events. A fault in these systems could lead to a mission failure and catastrophic...

chapter

A New Fault Tolerance Heuristic for Scientific Workflows in Highly Distributed Environments Based on Resubmission Impact

K. Plankensteiner, R. Prodan, T. Fahringer

2009 Fifth IEEE International Conference on e-Science > 313 - 320

2009 5th IEEE International Conference on e-Science (e-Science 2009)

Even though highly distributed environments such as Clouds and Grids are increasingly used for e-science high performance applications, they still cannot deliver the robustness and reliability needed for widespread acceptance as ubiquitous scientific tools. To overcome this problem, existing systems resort to fault tolerance mechanisms such as task replication and task resubmission. In this paper...

chapter

Resource Failure Impact on Job Execution in Grid

P.K. Suri, M. Singh

2009 International Conference on Advances in Recent Technologies in Communication and Computing > 133 - 135

2009 International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom 2009)

Grid environment, being a collection of heterogeneous and geographically distributed resources, is prone to many kinds of failures such as process failures, resource and network failures. In this paper, we address the problem of resource failure. Resources in grid oscillate between being available and unavailable to the grid. When and how they do so, depends on the failure characteristics of the machines,...

chapter

A fault-tolerant peer-to-peer object storage architecture with multidimensional range search capabilities and adaptive topology

M.I. Andreica, E.-D. Tirsa, N. Tapus

2009 IEEE 5th International Conference on Intelligent Computer Communication and Processing > 221 - 228

2009 IEEE 5th International Conference on Intelligent Computer Communication and Processing (ICCP)

In this paper we present a fault-tolerant, collaborative peer-to-peer object storage architecture with adaptive topology and efficient multidimensional range search capabilities. Every stored object has a fixed set of index properties, whose ranges of values form a multidimensional geometric property space. The architecture efficiently supports multidimensional range queries by mapping the peer identifiers...

chapter

A Cascading Redundancy Approach for Dependable Real-Time Systems

H. Aysan, R. Dobrin, S. Punnekkat

2009 15th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications > 467 - 476

2009 15th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2009)

Dependable real-time systems typically consist of tasks of multiple criticality levels, and scheduling them in a fault-tolerant manner is a challenging problem. Redundancy in the physical and temporal domains for achieving fault tolerance has often been dealt with independently, based on the types of errors one needs to tolerate. To our knowledge, there has been no work which tries to integrate fault tolerant...

chapter

Fault-Tolerance Scheduling by Using Rough Set Based Multi-checkpointing on Economic Grids

A. Bouyer, A.H. Abdullah, H. Ebrahimpour, F. Nasrollahi

2009 International Conference on Computational Science and Engineering > 1 > 103 - 109

2009 International Conference on Computational Science and Engineering (CSE)

The grid scheduling process is a main factor affecting system performance. If the grid scheduler is able to select proper resources and determine the order of tasks in the queue, each task is executed without missing its deadline and without extra faults; consequently, the job response time decreases. Since the grid uses heterogeneous resources, the possibility of failure occurrence in those resources...

chapter

Using rough set based multi-checkpointing for fault-tolerance scheduling in economic grids

A. Bouyer, A.H. Abdullah, M.M. Sap

2009 First International Conference on Networked Digital Technologies > 321 - 326

2009 First International Conference on Networked Digital Technologies (NDT 2009)

Fault-tolerant Grid scheduling is of vital importance in the Grid computing world. Task replication and checkpointing are two popular methods for achieving fault-tolerant scheduling. Replication is not applicable in economic-based grid computing because it uses a large number of resources, and the consumer must pay the cost of the time spent on all participating nodes. In this paper, we proposed...

chapter

A Novel Fault-tolerant Particle Swarm Optimization Scheduler for Scheduling Independent Task in Grid Computing Environment

M. Nikkhah, A.M. Rahmani, M.H. Yektaie, M. Nikkhah

2009 Eighth IEEE/ACIS International Conference on Computer and Information Science > 489 - 493

2009 8th IEEE/ACIS International Conference on Computer and Information Science (ICIS)

Grid computing allows one to unite pools of servers, storage systems, and networks from different domains, each with its own management policies, into a single large system. The Grid environment is dynamic and its domains act autonomously. Unfortunately, in such an environment failures may occur occasionally, or a volatile host can delay the entire execution for a long period of time, which in turn...

chapter

A Hybrid Fault-Tolerant Scheduling Algorithm of Periodic and Aperiodic Real-Time Tasks to Partially Reconfigurable FPGAs

Jin-Yong Yin, Guo-Chang Guo, Yan-Xia Wu

2009 International Workshop on Intelligent Systems and Applications > 1 - 5

2009 International Workshop on Intelligent Systems and Applications

FPGAs are widely used in space-related designs, and the probability of a fault occurring increases when they are subject to total ionization dose. In this paper, the problem of fault tolerance is addressed through task scheduling, and a fault-tolerant scheduling algorithm for hardware real-time tasks is proposed based on primary/backup copies. Scheduled backwards, the backup copy executes as late...

Keywords:
FAULT TOLERANT COMPUTING
FAULT TOLERANT SYSTEMS

Publication type

book (28)
article (3)

Keywords

FAULT TOLERANCE (29)
REAL TIME SYSTEMS (14)
GRID COMPUTING (12)
PROCESSOR SCHEDULING (10)
SCHEDULES (8)
REAL-TIME SYSTEMS (7)
RESOURCE ALLOCATION (7)
CHECKPOINTING (6)
EMBEDDED SYSTEMS (6)
TRANSIENT ANALYSIS (6)
FAULT-TOLERANCE (5)
DATA MINING (4)
ENERGY CONSUMPTION (4)
POWER AWARE COMPUTING (4)
REDUNDANCY (4)
RELIABILITY (4)
BIOLOGICAL SYSTEM MODELING (3)
COMPUTATIONAL GRID (3)
COMPUTER ARCHITECTURE (3)
CONTROLLER AREA NETWORK (3)
CONTROLLER AREA NETWORKS (3)
DYNAMIC SCHEDULING (3)
PEER TO PEER COMPUTING (3)
PEER-TO-PEER COMPUTING (3)
PROGRAM PROCESSORS (3)
QUALITY OF SERVICE (3)
RESOURCE MANAGEMENT (3)
SCHEDULING ALGORITHM (3)
TASK ANALYSIS (3)
TIME FACTORS (3)
AEROSPACE ELECTRONICS (2)
AVAILABILITY (2)
COMPUTER CRASHES (2)
COMPUTERS (2)
DISTRIBUTED PROCESSING (2)
DISTRIBUTED SYSTEMS (2)
DYNAMIC POWER MANAGEMENT (2)
DYNAMIC VOLTAGE SCALING (2)
ENERGY CONSUMPTION REDUCTION (2)
ENERGY MINIMIZATION (2)
FAULT TOLERANT SCHEDULING (2)
FAULT-TOLERANCE SCHEDULING (2)
GRID SCHEDULING (2)
HARDWARE (2)
HEURISTIC ALGORITHMS (2)
INDEXES (2)
JOB SHOP SCHEDULING (2)
MONITORING (2)
NETWORKED EMBEDDED SYSTEMS (2)
PEER-TO-PEER (2)
PROBABILITY (2)
PROTOCOLS (2)
REAL-TIME (2)
REAL-TIME SYSTEM (2)
REPLICATION (2)
ROUGH SET (2)
ROUGH SET THEORY (2)
SCHEDULABILITY (2)
TASK REPLICATION (2)
TASK SCHEDULING (2)
TIME REDUNDANCY (2)
WORKFLOW MANAGEMENT SOFTWARE (2)
ACTIVE TASK MANAGEMENT (1)
ACTIVEBPEL WORKFLOW ENGINE (1)
ADAPTIVE SCHEDULING (1)
ADAPTIVE TOPOLOGY (1)
AEROSPACE COMPUTING (1)
AEROSPACE CONTROL (1)
AGENTS PLATFORM (1)
AIRCRAFT CONTROL (1)
ALGORITHM DESIGN AND ANALYSIS (1)
ALLOCATION POLICY (1)
AMAZON ELASTIC MAPREDUCE (1)
ARBITRARY PERIODS (1)
ARTIFICIAL SATELLITES (1)
ASSOCIATED FAULT TOLERANCE (1)
ATMOSPHERIC MODELING (1)
AUSTRIAN GRID ENVIRONMENT (1)
AUTOMATION (1)
AUTOMATION DOMAIN (1)
AUTOMOTIVE DOMAIN (1)
AUTOMOTIVE ENGINEERING (1)
AUTONOMIC WORKFLOW MANAGEMENT SYSTEM (1)
BACKUP (1)
BACKUP COPY (1)
BANDWIDTH (1)
BATTERY-OPERATED EMBEDDED SYSTEM (1)
BPEL (1)
CAN (1)
CAN (CONTROL AREA NETWORK) (1)
CAN COMMUNICATION (1)
CAN FAULT-TOLERANT MECHANISM (1)
CAN SCHEDULING (1)
CASCADING REDUNDANCY APPROACH (1)
CHECKPOINT INTERVALS (1)
CHECKPOINT-RECOVERY SCHEMES (1)
CIRCUIT FAULTS (1)

INFONA - science communication portal
