Cloud computing has become increasingly popular by obviating the need for users to own and maintain complex computing infrastructure. However, due to their inherent complexity and large scale, production cloud computing systems are prone to various runtime problems caused by hardware and software failures. Dependability assurance is crucial for building sustainable cloud computing services. Although...
We present a self-repairing circuit for a mesh-connected processor array with faulty processing elements which are directly replaced by spare processing elements on two orthogonal lines at the edges of the array. First, the spare assignment problem is formalized as a matching problem in graph theory. Using the result, we present an algorithm for reconstructing the array in a convenient form for finding...
Despite the numerous prevention and protection techniques that have been developed, the exploitation of memory corruption vulnerabilities still represents a serious threat to the security of software systems and networks. Because of the adoption of the write-or-execute-only policy (W⊕X) and address space layout randomization (ASLR), modern operating systems have been strengthened against code injection...
The evolution of systems during their operational lifetime is becoming ineluctable. Dependable systems, which continuously deliver trustworthy services, must evolve in order to comply with changes having different origins, e.g. new fault tolerance requirements, or changes in available resources. These evolutions must not violate their dependability properties, which leads to the notion of resilient...
The soft error rate (SER) of various radiation-hardened latches is analyzed by simulation. The SER is estimated by using the Monte Carlo method to model the variety of current pulses triggered by particle strikes, such as neutrons from space or alpha particles. Using the proposed method, we show that the SER of various latches is accurately analyzed without conducting irradiation experiments. As for the soft error...
Although the need for the exactly-once request-response interaction pattern is ubiquitous in distributed systems, making it work in practice is anything but simple. Ensuring the at-most-once part of the invocation is relatively easy. Unfortunately, the same is not true for the at-least-once guarantee, which depends on the recovery from crashes of the client, the server and the network. This is what...
This paper develops and validates a methodology to detect small, incipient faults in software systems. Incipient faults such as memory leaks slowly deteriorate the software's performance over time, and if left undetected, the end result is usually a complete system failure. The proposed method combines tools from information theory and statistics: entropy and principal component analysis (PCA). The...
A systematic process for eliciting safety trigger conditions is presented. Starting from a risk analysis of the monitored system, critical transitions to catastrophic system states are identified and handled in order to specify safety margins on them. The conditions for existence of such safety margins are given and an alternative solution is proposed if no safety margin can be defined. The proposed...
Recently, even operating systems are often compromised by attackers. Since a compromised operating system affects all the applications running on top of it, including security software, the integrity of the operating system should be guaranteed. However, it is difficult to monitor the operating system securely. In this paper, we propose SPE Observer, which is a framework for securely monitoring operating...
Large-scale disasters may cause simultaneous failures of many components in information systems. In the design for disaster recovery, operational procedures to recover from simultaneous component failures need to be determined so as to satisfy the time-to-recovery objective within the limited budget. For this purpose, it is beneficial to identify the minimal unacceptable combination of component failures...
This paper proposes a model checking-based approach to verification of asynchronous consensus algorithms, an important class of distributed fault-tolerant algorithms. The proposed approach can be used to verify these algorithms against agreement, which is the key safety property of this class of algorithms. A consensus algorithm typically has runs of unbounded length and unbounded queues or sets of...
We consider the problem of finding an allocation of program modules to computing nodes in a network. The objective of this problem is to maximize the probability of successfully executing these modules. Nodes and links of the network are assumed to be subject to failures. We propose an algorithm for this problem which uses Binary Decision Diagrams (BDDs) extensively. BDDs have been used as a powerful...
This paper presents diagnosis methods for bridging faults between a clock line and a gate signal line. Scan-based simulation methods are applied while assuming that only scan-based flush tests are used. In view of the fact that initial states play an important role, we consider two possible scenarios: 1) all flip-flops are assumed to be resettable, and 2) flip-flops are not resettable. In order...