Deriving optimal checkpoint protocols for distributed shared memory architectures

Lorenzo Alvisi; Keith Marzullo

doi:10.1007/3-540-60042-6_8

Source

Lecture Notes in Computer Science, Theory and Practice in Distributed Systems, pp. 111-120

Abstract

Uncoordinated checkpointing is one technique used to build processes that can recover to a consistent state after crashing. This technique requires each process to periodically record its state in a checkpoint. Furthermore, the threads executing on each process log any non-deterministic action that they take following the latest checkpointed state. When a process crashes, a new process, initialized with the appropriate recorded local state, is created in its place. The new process restarts executing, and whenever one of its threads confronts a non-deterministic choice, the thread references the log in order to reproduce the same action performed before the crash. Thus, uncoordinated checkpointing implements an abstraction of a resilient process in which the crash of a process is translated into intermittent unavailability of that process.

We give a specification of the consistency property “no orphan threads” in the context of multithreaded processes running on a shared memory multiprocessor. We also give a definition of optimality for uncoordinated checkpointing protocols given a memory coherency protocol. We then use this specification to derive an existing uncoordinated checkpoint protocol and show that it is optimal. This protocol assumes that once a process crashes, no further processes crash until the first process completes recovery.
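The checkpoint, log, and replay cycle described in the abstract can be illustrated with a minimal sketch. This is not the protocol derived in the paper: it is a single-threaded toy with no shared memory or coherency protocol, and every name in it (`ResilientProcess`, `step`, `checkpoint`, `recover`) is hypothetical. It assumes that the checkpoint and the log live on storage that survives the crash, and that each step makes exactly one non-deterministic choice.

```python
import copy
import random


class ResilientProcess:
    """Toy process that checkpoints its state and logs non-deterministic choices.

    Illustrative only: single-threaded, and the checkpoint and log are
    assumed to sit on stable storage that survives a crash.
    """

    def __init__(self, seed=None):
        self.state = {"total": 0, "steps": 0}
        self._rng = random.Random(seed)
        self._checkpoint = copy.deepcopy(self.state)  # latest recorded state
        self._log = []     # non-deterministic choices since the latest checkpoint
        self._replay = []  # logged choices still to be reproduced during recovery

    def _choose(self):
        """Resolve a non-deterministic choice, consulting the log when replaying."""
        if self._replay:
            return self._replay.pop(0)
        value = self._rng.randint(1, 100)
        self._log.append(value)
        return value

    def step(self):
        """One unit of work; makes exactly one non-deterministic choice."""
        self.state["total"] += self._choose()
        self.state["steps"] += 1

    def checkpoint(self):
        """Record the current state; earlier log entries are no longer needed."""
        self._checkpoint = copy.deepcopy(self.state)
        self._log.clear()

    def recover(self):
        """Build a replacement process from the checkpoint and replay the log."""
        new = ResilientProcess()
        new.state = copy.deepcopy(self._checkpoint)
        new._checkpoint = copy.deepcopy(self._checkpoint)
        new._log = list(self._log)
        new._replay = list(self._log)
        for _ in range(len(self._log)):
            new.step()  # each step reproduces one logged choice
        return new
```

A possible usage: run a few steps, take a checkpoint, run a few more, then treat the process as crashed and call `recover()`. The replacement process should reach the same state the original had before the crash, because every choice made after the checkpoint is read back from the log rather than drawn again.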

Identifiers

series ISSN: 0302-9743
series e-ISSN: 1611-3349
book ISBN: 978-3-540-60042-8
book e-ISBN: 978-3-540-49409-6
DOI: 10.1007/3-540-60042-6_8

Authors

Lorenzo Alvisi

Cornell University, Department of Computer Science, Ithaca

Keith Marzullo

University of California at San Diego, Department of Computer Science and Engineering, La Jolla

Additional information

Data set: Springer

Publisher

Springer Berlin Heidelberg

chapter
