Sebepou et al., 2011 - Google Patents
Cec: Continuous eventual checkpointing for data stream processing operatorsSebepou et al., 2011
View PDF- Document ID
- 2856896798677692507
- Author
- Sebepou Z
- Magoutis K
- Publication year
- Publication venue
- 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN)
External Links
Snippet
The checkpoint roll-backward methodology is the underlying technology of several fault- tolerance solutions for continuous stream processing systems today, implemented either using the memories of replica nodes or a distributed file system. In this scheme the …
- 238000011084 recovery 0 abstract description 53
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant details of failing over
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2097—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1471—Saving, restoring, recovering or retrying involving logging of persistent data for recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1438—Restarting or rejuvenating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1658—Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network-specific arrangements or communication protocols supporting networked applications
- H04L67/10—Network-specific arrangements or communication protocols supporting networked applications in which an application is distributed across nodes in the network
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sebepou et al. | Cec: Continuous eventual checkpointing for data stream processing operators | |
Noghabi et al. | Samza: stateful scalable stream processing at LinkedIn | |
US10474471B2 (en) | Methods and systems for performing a replay execution | |
Mehdi et al. | I {Can’t} Believe {It’s} Not Causal! Scalable Causal Consistency with No Slowdown Cascades | |
CN109101627B (en) | Heterogeneous database synchronization method and device | |
Wang et al. | Lineage stash: fault tolerance off the critical path | |
Basescu et al. | Robust data sharing with key-value stores | |
EP2572279B1 (en) | Method and system for achieving fault tolerant checkpointing across remote virtual machines | |
Goodhope et al. | Building LinkedIn's Real-time Activity Data Pipeline. | |
Junqueira et al. | Zab: High-performance broadcast for primary-backup systems | |
US8898509B2 (en) | Policy-based checkpointing fault tolerance across remote virtual machines | |
US9514208B2 (en) | Method and system of stateless data replication in a distributed database system | |
US8132043B2 (en) | Multistage system recovery framework | |
Brito et al. | Scalable and low-latency data processing with stream mapreduce | |
EP3910476A1 (en) | Event batching, output sequencing, and log based state storage in continuous query processing | |
US12086037B2 (en) | Scalable low-loss disaster recovery for data stores | |
Wester et al. | Tolerating Latency in Replicated State Machines Through Client Speculation. | |
van Dongen et al. | A performance analysis of fault recovery in stream processing frameworks | |
Zaharia et al. | Discretized streams: A fault-tolerant model for scalable stream processing | |
Martin et al. | User-constraint and self-adaptive fault tolerance for event stream processing systems | |
Wang et al. | A comprehensive study on fault tolerance in stream processing systems | |
Cui et al. | {HotRestore}: A Fast Restore System for Virtual Machine Cluster | |
Shen et al. | OPCAM: Optimal algorithms implementing causal memories in shared memory systems | |
CN115221240A (en) | Data processing method and device, electronic equipment and computer storage medium | |
Gomes et al. | Railgun: managing large streaming windows under MAD requirements |