Lam et al., 1993 - Google Patents
Managing Failures in Distributed SystemsLam et al., 1993
- Document ID
- 15004573703044059638
- Author
- Lam K
- Chung S
- Publication year
- Publication venue
- Europäischer Informatik Kongreß Architektur von Rechensystemen Euro-ARCH’93: München, 18.–19. Oktober 1993
External Links
Snippet
This paper discusses the problems we experienced with the use of a general purpose distributed operating system. It describes the problems with our approach to system availability, and identifies the major problem that makes the task of improving the availability …
- 238000000034 method 0 description 88
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant details of failing over
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2097—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2035—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant without idle spare hardware
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1451—Management of the data involved in backup or backup restore by selection of backup contents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogramme communication; Intertask communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network-specific arrangements or communication protocols supporting networked applications
- H04L67/10—Network-specific arrangements or communication protocols supporting networked applications in which an application is distributed across nodes in the network
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4204769B2 (en) | System and method for handling failover | |
US7657580B2 (en) | System and method providing virtual applications architecture | |
US5826021A (en) | Disconnected write authorization in a client/server computing system | |
US8856091B2 (en) | Method and apparatus for sequencing transactions globally in distributed database cluster | |
US20030187927A1 (en) | Clustering infrastructure system and method | |
CA2284376C (en) | Method and apparatus for managing clustered computer systems | |
US5748882A (en) | Apparatus and method for fault-tolerant computing | |
US10261872B2 (en) | Multilevel disaster recovery | |
US7366742B1 (en) | System and method for distributed discovery and management of frozen images in a storage environment | |
US20070061379A1 (en) | Method and apparatus for sequencing transactions globally in a distributed database cluster | |
US11550820B2 (en) | System and method for partition-scoped snapshot creation in a distributed data computing environment | |
WO2003094016A1 (en) | High availability application view deployment | |
Huang et al. | NT-SwiFT: Software implemented fault tolerance on Windows NT | |
JP2000194678A (en) | Highly available asynchronous I / O in a cluster computer system | |
Narasimhan et al. | Strongly consistent replication and recovery of fault-tolerant CORBA applications | |
Narasimhan et al. | Strong replica consistency for fault-tolerant CORBA applications | |
Ren et al. | An adaptive algorithm for tolerating value faults and crash failures | |
CA2619778C (en) | Method and apparatus for sequencing transactions globally in a distributed database cluster with collision monitoring | |
Lam et al. | Managing Failures in Distributed Systems | |
Steggles et al. | The implementation of a distributed framework to support ‘Follow Me’applications | |
CN112069160A (en) | CAP (CAP phase error) data based cleaning synchronization method | |
Welch et al. | A resilient application-level failure detection system for distributed computing environments | |
Tadepalli | GEMS: A fault tolerant grid job management system | |
Lam et al. | Designing a system infrastructure for distributed programs | |
Endler et al. | Supporting distributed application management in Sampa |