Cyclic redundancy codes (CRCs) are widely used in network transmission and data storage applications because they provide better error detection than lighter weight checksum techniques. 24- and 32-bit CRC computations are becoming necessary to provide sufficient error detection capability (Hamming distance) for critical embedded network applications. However, the computational cost of such CRCs can...
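To illustrate the computational cost the abstract above refers to, here is a minimal Python sketch of a 32-bit CRC computed bit by bit and then with a 256-entry lookup table. It assumes the common reflected CRC-32 polynomial 0xEDB88320 (the one used by zlib); the truncated abstract does not say which polynomials or speedup techniques the paper evaluates, and the function names are hypothetical.

```python
import zlib

POLY = 0xEDB88320  # assumption: reflected CRC-32 polynomial, as used by zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32: eight shift/XOR steps per input byte."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def make_table() -> list:
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_table()


def crc32_table(data: bytes) -> int:
    """Table-driven CRC-32: one lookup per byte, at the cost of 1 KiB of table."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"123456789"
    # Both variants should agree with the standard library implementation.
    print(hex(crc32_bitwise(msg)), hex(crc32_table(msg)), hex(zlib.crc32(msg)))
```

The table-driven variant trades memory for speed, which is the kind of trade-off that matters on small embedded processors.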
Code and data duplication has been identified as one of the important mechanisms for improving reliability. In a chip multiprocessor-based execution environment, while it is possible to hide the overhead of code duplication through parallelism, hiding the memory space overhead incurred by data duplication is more difficult. This paper presents a compiler-directed memory-conscious data duplication...
Embedded control systems consist of multiple components with different criticality levels interacting with each other. For example, in a passenger jet, the navigation system interacts with the passenger entertainment system in providing passengers the distance-to-destination information. It is imperative that failures in the non-critical subsystem should not compromise critical functionality. This...
Fault-tolerant time-triggered communication relies on the synchronization of local clocks. The startup problem is the problem of reaching a sufficient degree of synchronization after power-on of the system. The complexity of this problem naturally depends on the system assumptions. The system assumptions in this paper were compiled from cooperation with partners in the automotive and aeronautic industry...
In this paper we propose a generic frame for the implementation of a dual-core processor with two modes of operation. One is the safety mode, which runs the two cores in lock step in a classical master/checker fashion. A clock delay of 1.5 clock cycles between master and checker establishes the temporal redundancy to minimize the potential for common mode faults. The second operation mode allows...
This experience report describes the design and implementation of safety-critical software and hardware for respiratory gating of a medical linear accelerator. Respiratory gating refers to a radiotherapy technique for treating cancer in the lung, liver, and abdomen, where tumors move while a patient breathes. A computer software program tracks the position of the tumor within the human body using...
Delays and errors are the frequent consequences of people having difficulty with a user interface. Such delays and errors can result in severe problems, particularly for mission-critical applications in which speed and accuracy are of the essence. User difficulty is often caused by interface-design defects that confuse or mislead users. Current techniques for isolating such defects are time-consuming...
Multithreaded servers with cache-coherent shared memory are the dominant type of machines used to run critical network services and database management systems. To achieve the high availability required for these tasks, it is necessary to incorporate mechanisms for error detection and recovery. Correct operation of the memory system is defined by the memory consistency model. Errors can therefore...
As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that can affect program correctness. Computer architects have typically addressed reliability issues by adding redundant hardware, but these techniques are often too expensive to be used widely. Software-only reliability techniques have shown promise in their ability to protect against soft-errors...
The advent of deep sub-micron technology has exacerbated reliability issues in on-chip interconnects. In particular, single event upsets, such as soft errors, and hard faults are rapidly becoming a force to be reckoned with. This spiraling trend highlights the importance of detailed analysis of these reliability hazards and the incorporation of comprehensive protection measures into all network-on-chip...
This paper presents the first hierarchical Byzantine fault-tolerant replication architecture suitable for systems that span multiple wide area sites. The architecture confines the effects of any malicious replica to its local site, reduces message complexity of wide area communication, and allows read-only queries to be performed locally within a site for the price of additional hardware. A prototype...
We analyze the problem of efficiently storing large amounts of data on a distributed set of servers that may be accessed concurrently from multiple clients by sending messages over an asynchronous network. Up to one third of the servers and an arbitrary number of clients may be faulty and exhibit Byzantine behavior. We provide the first simulation of a multiple-writer multiple-reader atomic read/write...
This paper establishes tight bounds on the best-case time-complexity of distributed atomic read/write storage implementations that tolerate worst-case conditions. We study asynchronous robust implementations where a writer and a set of reader processes (clients) access an atomic storage implemented over a set of 2t+b+1 server processes of which t can fail: b of these can be malicious and the rest...
In the asynchronous distributed system model, consensus is obtained in one communication step if all processes propose the same value. Assuming f<n/3, this is regardless of the failure detector output. A zero-degrading protocol reaches consensus in two communication steps in every stable run, i.e., when the failure detector makes no mistakes and its output does not change. We show that no leader-based...
We study consensus in a message-passing system where only some of the n^2 links exhibit some synchrony. This problem was previously studied for systems with process crashes; we now consider Byzantine failures. We show that consensus can be solved in a system where there is at least one non-faulty process whose links are eventually timely; all other links can be arbitrarily slow. We also show that,...
In previous work, it has been shown how to solve atomic broadcast by reduction to consensus on messages. While this solution is theoretically correct, it has its limitations in practice, since executing consensus on large messages can quickly saturate the system. The problem can be addressed by executing consensus on message identifiers instead of the full messages, in order to decouple the size of...
This paper considers the eventual leader election problem in asynchronous message-passing systems where an arbitrary number t of processes can crash (t<n, where n is the total number of processes). It considers weak assumptions both on the initial knowledge of the processes and on the network behavior. More precisely, initially, a process knows only its identity and the fact that the process identities...
Self-propagating malware like worms and bots can dramatically impact the availability and reliability of the Internet. Techniques for the detection and mitigation of Internet threats using content prevalence and scan detectors are based on assumptions of how threats propagate. Some of these assumptions have recently been called into question by observations of huge discrepancies in the quantity of...
Despite the proliferation of detection and containment techniques in the worm defense literature, simple threshold-based methods remain the most widely deployed and most popular approach among practitioners. This popularity arises out of the simplistic appeal, ease of use, and independence from attack-specific properties such as scanning strategies and signatures. However, such approaches have known...
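As a rough illustration of what a simple threshold-based method can look like, the Python sketch below flags any source that contacts more than a fixed number of distinct destinations within one observation window. This is an assumed, generic form of threshold detection, not the scheme analyzed in the paper; the function name, the event format, and the threshold value are all hypothetical.

```python
from collections import defaultdict


def flag_scanners(events, threshold):
    """Return the sources that contacted more than `threshold` distinct
    destinations. `events` is an iterable of (source, destination) pairs
    observed in a single time window (assumed input format)."""
    contacted = defaultdict(set)
    for src, dst in events:
        contacted[src].add(dst)
    return {src for src, dsts in contacted.items() if len(dsts) > threshold}


if __name__ == "__main__":
    window = [
        ("10.0.0.1", "10.0.1.1"),
        ("10.0.0.1", "10.0.1.2"),
        ("10.0.0.1", "10.0.1.3"),
        ("10.0.0.2", "10.0.1.1"),
        ("10.0.0.2", "10.0.1.1"),  # repeated contact, counted once
    ]
    print(flag_scanners(window, threshold=2))
```

The sketch needs no knowledge of scanning strategy or payload signatures, which matches the independence from attack-specific properties noted above.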