Wester et al., 2009 - Google Patents

Tolerating Latency in Replicated State Machines Through Client Speculation.

Wester et al., 2009

Document ID: 8926955047835540443
Author: Wester B; Cowling J; Nightingale E; Chen P; Flinn J; Liskov B
Publication year: 2009
Publication venue: NSDI

External Links

Cited by

Snippet

Replicated state machines are an important and widely-studied methodology for tolerating a wide range of faults. Unfortunately, while replicas should be distributed geographically for maximum fault tolerance, current replicated state machine protocols tend to magnify the …

Continue reading at www.usenix.org (HTML) (other versions)

238000000034 method 0 abstract description 21

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant details of failing over
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2097—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1405—Saving, restoring, recovering or retrying at machine instruction level
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1658—Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30575—Replication, distribution or synchronisation of data between databases or within a distributed database; Distributed database system architectures therefor
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/835—Timestamp

Similar Documents

Publication	Publication Date	Title
Wester et al.	2009	Tolerating Latency in Replicated State Machines Through Client Speculation.
Mehdi et al.	2017	I {Can’t} Believe {It’s} Not Causal! Scalable Causal Consistency with No Slowdown Cascades
US10152398B2 (en)	2018-12-11	Pipelined data replication for disaster recovery
Almeida et al.	2013	ChainReaction: a causal+ consistent datastore based on chain replication
Nightingale et al.	2005	Speculative execution in a distributed file system
Marandi et al.	2014	Rethinking state-machine replication for parallelism
US10176210B2 (en)	2019-01-08	System and method for minimizing lock contention
Li et al.	2016	Sarek: Optimistic parallel ordering in byzantine fault tolerance
Brandenburger et al.	2017	Don’t trust the cloud, verify: Integrity and consistency for cloud object stores
Gupta et al.	2024	Just-in-time checkpointing: Low cost error recovery from deep learning training failures
Nightingale et al.	2006	Speculative execution in a distributed file system
Guerraoui et al.	2022	{uKharon}: A Membership Service for Microsecond Applications
Aksoy et al.	2019	Aegean: replication beyond the client-server model
Shi et al.	2016	Cheap and available state machine replication
Wang et al.	2022	A comprehensive study on fault tolerance in stream processing systems
Yang et al.	2022	Natto: Providing distributed transaction prioritization for high-contention workloads
Costa et al.	2013	On the performance of Byzantine fault-tolerant MapReduce
Stewart	2013	Reliable massively parallel symbolic computing: fault tolerance for a distributed Haskell
Padilha et al.	2016	Callinicos: Robust transactional storage for distributed data structures
Bouteiller	2015	Fault-tolerant MPI
Ni	2016	Mitigation of failures in high performance computing via runtime techniques
AT&T
Abusalah et al.	2017	Dependable cloud resources with guardian
Abusalah	2021	Dependable Cloud Resources For Big-Data Batch Processing & Streaming Frameworks
Abusalah et al.	2021	Multi-Framework Reliability Approach