Correlated failures have recently gained more attention in research on failures in large-scale systems. Recent studies have pointed out the negative effect of ignoring such failures when designing fault-tolerant schemes for large-scale systems. In this paper, we explore the behavior of temporally correlated failures arising from cyclic dependency among task nodes via an abstract model. Using this...
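The excerpt does not show the paper's abstract model. As a hypothetical illustration of what a "cyclic dependency among task nodes" looks like, the sketch below finds one cycle in a task dependency graph with a depth-first search. The graph encoding (a dict from each task to the tasks it depends on) and the function name `find_dependency_cycle` are assumptions made for this example.

```python
from typing import Dict, List, Optional


def find_dependency_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle (first node repeated at the end), or None.

    Hypothetical helper: `deps` maps each task node to the nodes it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the part of the DFS path starting at `dep`.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle is not None:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(deps):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None


# A -> B -> C -> A forms a cycle, so a failure of any of these tasks can,
# in this toy graph, propagate back to itself through its dependencies.
print(find_dependency_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))
```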
Failures are a permanent threat to the availability of Internet services. Over the last decades, numerous fault-tolerant approaches have been proposed for the wide spectrum of Internet services, including stateful firewalls. Most of these solutions adopt reactive approaches that mask failures by replicating state changes between replicas. However, reactive replication is a resource-consuming task...
The use of good random numbers is essential to the integrity of many mission-critical systems. However, when such systems are replicated for Byzantine fault tolerance, a serious issue arises: how do we preserve the integrity of the systems while ensuring strong replica consistency? Despite the fact that there is a large body of work on how to render replicas deterministic under the benign...
Storage applications are in urgent need of multi-erasure codes, but there is no consensus on the best coding technique. Hafner has presented a class of multi-erasure codes named HoVer codes [1]. These codes have a unique data/parity layout that provides a range of implementation options covering a large portion of the performance/efficiency trade-off space. Thus they can be applied to many...
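The HoVer layout itself is not described in the excerpt. For orientation only, the sketch below shows the simplest XOR-parity erasure code (one parity block over several data blocks, tolerating a single erasure). It is a swapped-in illustration of the parity idea that multi-erasure codes build on, not HoVer's construction; the function names are hypothetical.

```python
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


def encode_parity(data_blocks: List[bytes]) -> bytes:
    """Single parity block: XOR of all data blocks."""
    return xor_blocks(data_blocks)


def recover_block(data_blocks: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the one missing data block (marked None) from survivors + parity."""
    missing = [i for i, b in enumerate(data_blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("single-parity XOR tolerates exactly one erasure")
    survivors = [b for b in data_blocks if b is not None]
    return xor_blocks(survivors + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = encode_parity(data)
print(recover_block([data[0], None, data[2]], parity))  # rebuilds data[1]
```

A single parity block can repair only one lost block; tolerating several simultaneous erasures requires additional, independent parity relations, which is the design space that multi-erasure codes such as HoVer address.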
Cluster systems provide an excellent environment for running computation-hungry applications. However, because they are built from commodity components, they are prone to failures. To overcome these failures we propose to use rollback-recovery, which consists of checkpointing and recovery facilities. Checkpointing facilities have been the focus of many previous studies; however, the recovery facilities...
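To make the two halves of rollback-recovery concrete, here is a minimal single-process sketch, assuming application state that can be pickled to a local file. The file-based storage, the checkpoint interval, and the names `save_checkpoint`, `load_checkpoint`, and `run` are illustrative assumptions, not the paper's cluster facilities.

```python
import os
import pickle
import tempfile
from typing import Any, Optional


def save_checkpoint(state: Any, path: str) -> None:
    """Checkpointing: write state to a temp file, then atomically rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # readers see either the old or the new checkpoint


def load_checkpoint(path: str, default: Any) -> Any:
    """Recovery: roll back to the last saved state, or start fresh."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(n_steps: int, path: str, every: int = 10, fail_at: Optional[int] = None) -> int:
    """Toy computation (sum of 0..n_steps-1) that checkpoints every `every` steps."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    return state["total"]
```

If `run(100, path, fail_at=25)` fails, the last checkpoint holds step 20; calling `run(100, path)` again rolls back to that state, recomputes steps 20 to 24, and finishes with 4950, the same result as a failure-free run.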
We present a routing algorithm that finds $n$ disjoint shortest paths from the source node to $n$ target nodes in the $n$-dimensional hypercube in $O(n^3 \log n) = O(\log^3 N \log\log N)$ time, where $N = 2^n$, provided that such disjoint shortest paths exist, which can be checked in $O(n^{5/2})$ time, improving the previous $O(n^4)$ routing algorithm.
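The algorithm itself is not given here, but two standard hypercube facts it relies on are easy to check in code. Nodes are $n$-bit integers, adjacent nodes differ in exactly one bit, and a shortest path between two nodes has length equal to their Hamming distance, obtained by flipping the differing bits in some order. The helpers below (hypothetical names) build such a path for a chosen bit order and verify that a set of paths sharing only the source node are vertex-disjoint. They validate a solution; they do not search for one.

```python
from typing import List, Sequence


def hamming(u: int, v: int) -> int:
    """Number of differing bits, i.e. the shortest-path distance in the hypercube."""
    return bin(u ^ v).count("1")


def shortest_path(src: int, dst: int, order: Sequence[int]) -> List[int]:
    """One shortest src->dst path, flipping the differing bits in `order`."""
    path = [src]
    node = src
    for bit in order:
        if ((node ^ dst) >> bit) & 1:
            node ^= 1 << bit
            path.append(node)
    if node != dst:
        raise ValueError("order must cover every bit in which src and dst differ")
    return path


def disjoint_except_source(paths: List[List[int]]) -> bool:
    """True if the paths share no vertex other than their common first node."""
    seen = set()
    for p in paths:
        for v in p[1:]:
            if v in seen:
                return False
            seen.add(v)
    return True


# Q_3, source 000, targets 011, 101, 110: three disjoint shortest paths.
paths = [
    shortest_path(0b000, 0b011, [0, 1]),  # 000 -> 001 -> 011
    shortest_path(0b000, 0b101, [2, 0]),  # 000 -> 100 -> 101
    shortest_path(0b000, 0b110, [1, 2]),  # 000 -> 010 -> 110
]
print(disjoint_except_source(paths))
```

The hard part the paper addresses is choosing the bit orders for all $n$ targets simultaneously so that the resulting paths never collide; a naive search over orders is exponential, which is why the polynomial bounds above matter.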
Detecting event regions in a monitored environment is a canonical task of wireless sensor networks (WSNs). It is a hard problem because sensor nodes are prone to failures and have scarce energy. In this paper, we seek distributed and localized algorithms for fault-tolerant event region detection. Most existing algorithms only assume that events are spatially correlated, but we argue that events are...
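For context on the "spatially correlated" assumption the abstract mentions, the sketch below shows a simple neighbor majority vote of the kind such spatial-only schemes use: each node replaces its own binary reading with the majority of its neighbors' readings and keeps its own reading on a tie. This is an assumed baseline for illustration, not the paper's algorithm, whose details are cut off in the excerpt.

```python
from typing import Dict, List


def majority_vote(readings: Dict[int, bool], neighbors: Dict[int, List[int]]) -> Dict[int, bool]:
    """Spatial-only fault correction: follow the neighbor majority, keep own reading on ties."""
    decisions: Dict[int, bool] = {}
    for node, own in readings.items():
        votes = [readings[n] for n in neighbors.get(node, []) if n in readings]
        positive = sum(votes)
        if 2 * positive > len(votes):
            decisions[node] = True
        elif 2 * positive < len(votes):
            decisions[node] = False
        else:
            decisions[node] = own  # tie, or no neighbors: trust the local sensor
    return decisions


# Five sensors on a line (0-1-2-3-4); node 2 is faulty and falsely reports an event.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(majority_vote({0: False, 1: False, 2: True, 3: False, 4: False}, line))
```

Because it uses only a single snapshot of neighboring readings, such a vote ignores any correlation over time, which is the kind of limitation the abstract points to.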
In a mobile computing system, mobile hosts may move between cells, which makes locating and retrieving the recovery information needed for fault tolerance considerably costly. To speed up recovery, recovery information is traditionally migrated to follow the location of the mobile host. In this paper, a scheme for efficiently handling the recovery information is proposed....
Efficiently discovering services according to diverse service constraints in a dense MANET is a challenging issue. This paper proposes building a distributed suffix tree on backbone nodes as an index of XML-based services, providing a concise profile of service descriptions. Moreover, a content-addressable P2P overlay and corresponding fault-tolerance mechanisms are introduced to support the distributed...
Fault tolerance is a critical issue in the arena of large-scale computing. The fault-tolerant parallel algorithm (FTPA) is an application-level technique for tolerating hardware failures. FTPA achieves fast failure recovery by making use of parallel recomputing. However, it complicates the coding of application programs. This paper uses compiler technology to automate the design of FTPA, and introduces...
Peer-to-Peer (P2P) networks have been proposed as one promising approach to provide better scalability for Networked Virtual Environment (NVE) systems, but P2P-NVE also increases the probability of cheating by allowing users to manage the states of objects. In this paper, we propose Delaunay State Management (DSM), a P2P-NVE state management scheme that divides the whole virtual world into many triangular...