WO2002001347A2 - Method and system for automatic re-assignment of software components of a failed host - Google Patents
Method and system for automatic re-assignment of software components of a failed host
- Publication number
- WO2002001347A2 WO2002001347A2 PCT/SE2001/001448 SE0101448W WO0201347A2 WO 2002001347 A2 WO2002001347 A2 WO 2002001347A2 SE 0101448 W SE0101448 W SE 0101448W WO 0201347 A2 WO0201347 A2 WO 0201347A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- host
- hosts
- monitoring
- component
- monitored
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2035—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant without idle spare hardware
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
Definitions
- the present invention relates to networked and co-operating hosts, and particularly to a method and system for re-assigning a failed host's original software components to at least one co-operating host.
- Computers have greatly evolved over the last half century, becoming today a necessity in many areas of technology.
- Various activities are nowadays exclusively performed by computers, which allows greater and more reliable performance of tasks previously performed by humans.
- one dependable manner of proceeding is to assign specific task(s) to one particular computer and to link a number of computers in a computers' network.
- specific software applications may be run on particular computers for performing specific tasks.
- the computers may be networked, so that the computers' applications can communicate with each other, as initially setup by an operator, for achieving the desired final result.
- each computer, or host may be assigned a number of tasks, i.e. it is only that particular host that performs those tasks. Thereafter, the particular host, connected in a hosts' network, may have to share its processed information with other co-operating hosts.
- the networked hosts may be linked in "cascade", i.e. each host performs its tasks on the input information received from another host and then outputs the processed information to the next host in the "cascade" network. Failure of only one host in the cascade network results in the overall failure of the network of computers.
- failure of one host that performs tasks whose output is essential to the other hosts' operation may result in critical faults being caused in the overall network, and/or in the incapacity of the hosts' network to accomplish its global task.
- the typical prior art solution to the problem described hereinbefore is to send a technical operator to manually solve the host failure.
- various means are typically utilized for locating the problematic host, and a technician takes care of replacing any failed devices, if any, and/or of putting the host back in normal running condition.
- this solution usually takes significant time, and creates long periods of unavailability (downtime) of the hosts' network.
- Gerstel et al. are limited to a method and system for solving a link fault and they fail to teach or suggest how to manage a host failure in a network environment. It would be advantageous to have a method and system for allowing automatic re-distribution of the tasks performed on a particular host in case of the failure of this particular host. It would be of even greater advantage to have a method and system allowing both the re-start of a component after a host failure, and, upon recovery of the failed host, automatic re-insertion of the component at its original logical location in a network of co-operating hosts.
- According to the invention, there is provided a group of co-operating hosts wherein at least one monitoring host monitors the activity of at least one monitored host.
- Upon detection of a failure of the monitored host, the monitoring host informs a Central Information Repository (CIR) of the failure of the monitored host.
- The CIR, which may be physically a distributed database but is preferably logically centralized, further informs at least one back-up host, which may be another monitoring host, and the components that failed on the monitored host are re-started on the back-up host.
- the back-up hosts may be the same as the monitoring hosts, and in this particular case the failed components are restarted on the monitoring hosts.
- It is yet another object of the invention to provide a network of co- operating hosts comprising: a monitored host running at least one software component; one or more monitoring hosts for monitoring an activity of said monitored host; one or more back-up hosts, each one of said back-up hosts comprising a Component Manager (CM), and at least one installed component; wherein when a failure occurs in said monitored host, said one or more monitoring hosts detect said failure and start said at least one software component on at least one of said back-up hosts.
- Figure 1.a is a top level block diagram of a network of co-operating hosts according to an exemplary prior art implementation
- Figure 1.b is a top level block diagram of an Event Management System (EMS) according to an exemplary prior art implementation
- Figures 2 (a, b, and c) are high-level block diagrams illustrating an exemplary preferred embodiment of the invention
- Figure 3 is a nodal operation and signal flow diagram illustrating an exemplary preferred embodiment of the invention.
- FIG. 4 is a high-level flowchart of another exemplary preferred embodiment of the invention.
- Figure 5 is a nodal operation and signal flow diagram illustrating yet another exemplary preferred embodiment of the invention.
- Reference is now made to Figure 1.a, wherein there is shown a top level block diagram of a network 10 of co-operating hosts according to an exemplary prior art implementation.
- Hosts 12, 14, 16 and 18 (A, B, C and D) are linked through a network 20. It is understood for the purpose of the present example that all hosts 12-18 comprise the necessary network interfaces (not shown), i.e. network card, network connections, network software applications, etc, that allow each one to be appropriately connected to each other, through the network 20.
- Each host comprises an Operating System (OS), not shown, which supports various software applications, hereinafter called Components.
- In Figure 1.a for example, Host A, in its Enabled (up and running) state, runs three components CA1, CA2, and CA3, Host B runs another three components CB1, CB2, and CB3, Host C runs two components CC1 and CC2, while Host D runs a single component CD1.
- The illustrated components are all part of a distributed application which runs on the different hosts 12-18. Therefore, the activity of the components is interrelated, some components' activity being dependent upon other components' proper output.
- Figure 1.b shows an Event Management System (EMS) 30 in charge of monitoring a network 32.
- the monitored network 32 may be any kind of network, such as for example a Local Area Network (LAN) over Ethernet, an Internet Protocol (IP) network, or a Public Land Mobile Network (PLMN).
- each network may have an associated EMS which is used by the network operator in charge of that network to monitor the network activity.
- the typical tasks of an EMS are to collect the events originating from the monitored network 32, to process the events (conversion, treatment, classification, etc.), to store the events and finally to display the events, in particular formats, onto network administrators' monitoring means.
- the Host 34 runs a component 36 dedicated to collecting (trapping) the events issued by the monitored network 32 via the Gateway 33.
- the component 36 traps the events and further outputs the events flow 38 to host 40 running component 42 which is dedicated to converting the incoming flow of events 38 into a user-friendly formatted events flow, 44.
- the information 44 is then sent to host 46 running a database component 48 for storing the event-related information 44.
- hosts 50, 52, 54, and 56 respectively run individual components (not shown) which are dedicated to the display of the event-related information 48, which may be accessed on a by-request basis.
- Should any one of these co-operating hosts fail, the global task of the EMS 30 is interrupted.
- One partial remedy known in the prior art to the aforementioned problem is to "duplicate" a host with a "mirror" having identical configuration.
- For example, for host 34 running component 36 that is dedicated to collecting the events from the network 32, a "mirror" host 34' running the same component as host 34, namely 36', may be incorporated in the EMS 30 and run in stand-by mode. If a failure is detected in host 34, then the stand-by host 34' takes over and assumes the tasks of the failed host 34.
- this solution implies duplicating each host of the network, thus doubling the costs of hardware equipment, while half of this equipment (the stand-by hosts) is only used in critical situations.
- FIG. 2 shows a high-level block diagram of an exemplary preferred embodiment of the present invention.
- Hosts 80, 82, 84, 86 and 88 are all connected to a network (not shown) and function in co-operation with each other.
- Each one of the hosts 80-88 runs at least one component, such as for example component 90 (C B1 ) for host 82 (B), that is typically a software application responsible for performing one or more particular tasks.
- the components running on the various hosts 80-88 may be in quasi-permanent communication with each other, in a by-request communication, or in any other known type of communication wherein information must be transmitted from one host to another.
- A Monitoring Partnership Program (MPP) is implemented among at least two networked hosts that co-operate for achieving a global task.
- The participating hosts reciprocally monitor each other's activity. Upon detection of a fault, error, malfunction or other unavailability of a monitored host by a co-operating monitoring host, the components that were running on the monitored host before the occurrence of the fault are identified as well and are started on the partner monitoring host(s).
- In Figure 2.a there is shown an exemplary high-level block diagram of a hosts' network in its normal operation, wherein five different hosts (80-88) run various components Cxi that inter-communicate with each other.
- Each host comprises a Component Manager (CM) responsible for managing the components running on it, such as the CMs 102, 104 and 106 of hosts 82 (B), 84 (C) and 86 (D).
- Each monitoring host, such as host 86 (D), also comprises a Library of Components (LC) 101 comprising the components Cxi 103i running on the monitored hosts (such as the components 110 and 112 of the monitored host 84 (C)).
- The LC 101 may be the same for all the network co-operating hosts, in which case it comprises the installed components 103i of all the co-operating hosts running in the network, or may be unique for each monitoring host, in which case it may comprise, besides the components naturally running on the particular host, only components 103i that are running on the host(s) monitored by the monitoring host. It is to be understood that although the LC 101 and the installed components 103i are only represented for the monitoring host 86 (D), all the monitoring hosts, or even all the hosts, may comprise such an LC 101. Furthermore, the LC 101 must not necessarily comprise the full version of the installed components 103i, but may alternatively comprise only a portion thereof that, when activated, can automatically contact a central server for performing the full download and start of a particular component.
- Assume now that host 84 (C) fails. This may be caused by various sorts of problems, such as a power outage, a physical accident, a memory corruption, a crash of the OS, or others.
- According to the MPP, the CMs 102 and 106 of the monitoring hosts 82 (B) and 86 (D) detect the failure of the monitored host 84 (C).
- The CMs 102 and 106 may then inquire about the identity of the components that were running on the host 84 (C) before the failure. This action (request) may be addressed to a Central Information Repository (CIR, not shown), such as an LDAP server that has knowledge of the network topology, and of the particular components assigned and run on each host.
- The request for the identity of the failed components may be performed toward any one of the hosts that may have knowledge of the network-related information, or purely skipped if each host has knowledge of the network-related information.
- the co-operating CMs 102 and 106 of the monitoring hosts may divide the responsibility of starting and running the failed components 110 and 112 according to a pre-defined scheme that will be discussed later in this document.
- the "displaced" components 110' and 112' of Figure 2.b are responsible for reinserting themselves at their original logical location by making the required original logical connections 89', 91', 93', 95', and 97 ' and synchronization. .
- The components 110 and 112 were initially running on host 84 (C), as shown in Figure 2.a, at their original logical location.
- When host 84 (C) fails, as shown in Figure 2.b, the components 110 and 112 are "displaced" on hosts 82 (B) and 86 (D) respectively, i.e. their copies 110' and 112' are started on those back-up hosts.
- Each host participating in the MPP must comprise, or alternatively have access to, copies of the components (software applications 103i) of its co-operating monitored hosts.
- For example, host 82 (B) may comprise among its installed components 103i the component 110', which is started and run in the present example upon detection of the failure of host 84 (C).
- The back-up (monitoring) hosts participating in the MPP may i) contact the CIR 140 upon detection of a failed monitored host, download from the CIR 140 the required components that need to be re-started, and re-start these components, or ii) may comprise a portion of the copy of the failed component(s), that may be activated upon detection of the failure of the original components, and that may further take care of the full download from the CIR 140 of the remaining portion of the component copies, which are then re-started on the back-up (monitoring) hosts.
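- To make the two provisioning options above concrete, the following Python sketch shows how a back-up CM might restart a failed component, assuming either a full local copy under an install directory or a download from the CIR 140; the install path, the download helper and the start command are hypothetical illustrations, not details taken from the description.

```python
import subprocess
from pathlib import Path

INSTALL_DIR = Path("/opt/components")   # assumed location of the installed component copies 103i

def fetch_component_from_cir(component_id: str) -> Path:
    """Placeholder for options i)/ii): obtain the (remaining) component package
    from the CIR 140. The transport (HTTP, LDAP, file copy, ...) is an
    implementation choice that the description leaves open."""
    raise NotImplementedError("depends on how the CIR 140 exposes component packages")

def restart_failed_component(component_id: str) -> subprocess.Popen:
    """Start a local copy of a component that failed on the monitored host."""
    executable = INSTALL_DIR / component_id / "run"
    if not executable.exists():
        # No full local copy installed: obtain it from the CIR 140 first.
        executable = fetch_component_from_cir(component_id)
    # Launch the copy on this back-up host (compare components 110' and 112').
    return subprocess.Popen([str(executable)])
```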
- Reference is now made to Figure 2.c, wherein there is shown the scenario of the recovery of host 84 (C).
- The substitute components 110' and 112' are stopped by the CMs 102 and 106, and the original components 110 and 112 (i.e. their respective original copies) are started on host 84 (C) by the CM 104.
- The newly started components 110 and 112 of Figure 2.c are responsible for re-inserting themselves at their original logical location by making the required original logical connections 89, 91, 93, 95, and 97 and synchronization. This may be achieved by providing each component with information regarding which other components it must communicate with or, alternatively and preferably, the newly started components can get this information from the CIR 140.
- FIG. 3 is a nodal operation and signal flow diagram of an exemplary preferred embodiment of the invention showing a possible actual implementation of an MPP with three hosts, wherein the activity of the (monitored) host 84 (C) is set to be supervised by the co-operating (monitoring) hosts 82 (B) and 86 (D).
- each host 82-86 comprises a Component Manager (CM) 102- 106, responsible for managing the Components running on the respective host.
- The CM 102 of host 82 (B), the CM 104 of host 84 (C), and the CM 106 of host 86 (D) each manage the components running on their respective host: the CM 104 controls the running components 124, 126 and 128, while the CM 106 controls the running components 130 and 132.
- the network also comprises a Central Information Repository (CIR) 140, which is preferably a centralized or distributed LDAP server, that may contain a Component List (CL) 142 and a Component Manager List (CML) 144.
- the CL 142 comprises a plurality of Component records 146i containing information about the hosts' components, each component record having a field Preferred Host Name 146i.PHN for holding the identity of the host naturally running the component, and a field Actual Host Name 146i.AHN for holding the identity of the actual host running the component (in case of unavailability of the preferred host).
- the CML 144 preferably comprises a record 148i for each CM 102-106, and each record 148i further comprises a field Monitored Hosts 148i.MH for holding the identity of the hosts monitored by each CM according to the MPP, and a field Operational State Attribute 148i.OSA for holding the status of the host's CM, such as "Enabled" when one particular host and its CM is up and running, or "Disabled" when the particular host and its CM is down or otherwise not available.
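- As an illustration of the data model just described, the following Python sketch shows one possible in-memory representation of the CL 142 and the CML 144; the class and field names are assumptions standing in for the LDAP records 146i and 148i.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ComponentRecord:          # one record 146i of the CL 142
    component_id: str
    preferred_host_name: str    # field 146i.PHN: host naturally running the component
    actual_host_name: str       # field 146i.AHN: host currently running the component

@dataclass
class CMRecord:                 # one record 148i of the CML 144
    host_name: str
    monitored_hosts: List[str]  # field 148i.MH: hosts monitored by this CM
    operational_state: str = "Enabled"   # field 148i.OSA: "Enabled" or "Disabled"

# Example population matching Figure 3: host C runs components 124, 126 and 128,
# and is monitored by hosts B and D.
cl = [
    ComponentRecord("124", preferred_host_name="C", actual_host_name="C"),
    ComponentRecord("126", preferred_host_name="C", actual_host_name="C"),
    ComponentRecord("128", preferred_host_name="C", actual_host_name="C"),
]
cml = [
    CMRecord("B", monitored_hosts=["C"]),
    CMRecord("C", monitored_hosts=[]),
    CMRecord("D", monitored_hosts=["C"]),
]
```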
- For exemplary purposes, it is assumed that a critical error occurs in host 84 (C) such that the host becomes unavailable; the host 84 (C) can alternatively become unavailable for any other reason.
- the partner hosts 82 (B) and 86 (D) monitor the activity of host 84 (C). This may be achieved for example by regularly sending a heartbeat request signal 152 from the monitoring hosts 82 and 86 to the monitored host 84 (C).
- If the host 84 (C) were to be Enabled (i.e. up and running), it would answer a heartbeat request signal 152, assumed here to be sent by host 82 (B), with a heartbeat response signal. In the present scenario, however, the host 84 (C) is unavailable and the heartbeat response signal is not sent back to host 82 (B). In action 154, host 82 (B) detects the absence of the heartbeat response signal (e.g. upon timer timeout) and deduces that the host 84 (C) is unavailable.
- The unavailability of host 84 (C) can also be detected by other particular signaling implementations. For example, signal 152 may be skipped and host 84 may be set up to regularly signal its Enabled state to its co-operating hosts according to the MPP. Failure to do so would result in the conclusion, for its partner monitoring host(s), that the monitored host is unavailable. Other error messages may be used as well.
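- A minimal sketch of the heartbeat-based detection described above is given below; the port, the message format and the timeout value are assumptions, since the description leaves the actual signaling implementation open.

```python
import socket

HEARTBEAT_PORT = 9000      # assumed port on which monitored hosts answer heartbeats
TIMEOUT_SECONDS = 2.0      # assumed timer after which the host is declared unavailable

def host_is_enabled(host_address: str) -> bool:
    """Send a heartbeat request (signal 152) and wait for the response.

    Returns False when no response arrives before the timer expires, which the
    monitoring host interprets as the monitored host being unavailable (action 154).
    """
    try:
        with socket.create_connection((host_address, HEARTBEAT_PORT),
                                      timeout=TIMEOUT_SECONDS) as conn:
            conn.settimeout(TIMEOUT_SECONDS)
            conn.sendall(b"HEARTBEAT_REQUEST")
            return conn.recv(64) == b"HEARTBEAT_RESPONSE"
    except (socket.timeout, OSError):
        return False

# The monitoring host 82 (B) would call this periodically for host 84 (C)
# and, on a False result, notify the CIR 140 (notification 156).
```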
- A notification of unavailability 156 is then sent from the CM 102 of host 82 (B) to the CIR 140. The notification 156 may comprise the new state "Disabled" of the host 84 (C).
- The CIR 140 modifies the operational state attribute field 148i.OSA of the CM record 148i corresponding to host 84 (C) from "Enabled" to "Disabled", action 158, in order to reflect the unavailable state of host 84 (C). Thereafter, the CIR 140 locates in the CML 144, using the field Monitored Hosts 148i.MH of the records 148i, whether other CMs of other hosts are responsible for the failed host 84 (C).
- Host 86 (D) is detected in action 160 as being also responsible for the failed host 84 (C).
- The CIR 140 further informs the CM 106 of host 86 that host 84 (C) became unavailable, by sending an indication, action 162. It is to be noted that in the particular scenario wherein only one monitoring host is responsible for a monitored host that failed, action 160 returns no other CM's identity and therefore action 162 is skipped.
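- The CIR-side handling of the notification 156 (actions 158-162) can be sketched as follows; the dictionary-based CML records and the notify callback are assumptions standing in for the LDAP records 148i and for the indication of action 162.

```python
from typing import Callable, Dict, List

# Hypothetical CML 144: one record 148i per CM, keyed by host name.
cml: Dict[str, dict] = {
    "B": {"monitored_hosts": ["C"], "operational_state": "Enabled"},
    "C": {"monitored_hosts": [],    "operational_state": "Enabled"},
    "D": {"monitored_hosts": ["C"], "operational_state": "Enabled"},
}

def handle_unavailability(failed_host: str, reporting_host: str,
                          notify: Callable[[str, str], None]) -> List[str]:
    """CIR 140 processing of notification 156."""
    cml[failed_host]["operational_state"] = "Disabled"            # action 158
    other_monitors = [h for h, rec in cml.items()                 # action 160
                      if failed_host in rec["monitored_hosts"] and h != reporting_host]
    for host in other_monitors:
        notify(host, failed_host)                                 # action 162
    return other_monitors

# Example: host B reports that host C failed; host D gets the indication.
handle_unavailability("C", "B",
                      notify=lambda cm, failed: print(f"inform CM of {cm}: {failed} is Disabled"))
```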
- At this point, each one of the monitoring CMs 102 and 106 running on hosts 82 (B) and 86 (D) is aware that host 84 (C) is unavailable, and in actions 164 and 166 each one requests from the CIR 140 the identity of the components that were running on the failed host 84 (C).
- In action 168, the CIR 140 uses the host C identity for extracting from the CL 142 each component identity whose Actual Host Name entry of field 146i.AHN matches the identity of host 84 (C), and returns this information (a list of components, 173) to the CMs 102 and 106 of the monitoring hosts 82 (B) and 86 (D) in actions 171 and 172.
- action 168 can be separately performed for example twice, after individual receipt of messages 164 and 166.
- In action 174, the CMs 102 and 106 select which components (from the components list 173) each one is to take care of, and also take the responsibility of monitoring the hosts previously monitored by host 84 (C), in a manner that is yet to be described.
- CM 102 of the host 82 (B) is assigned the responsibility of starting and running components 124 and 126, while the responsibility of starting and running component 128 is assigned to the CM 106 of host 86 (D).
- The monitoring hosts 82 (B) and 86 (D) are also the ones that inherit, after the failure of the monitored host 84 (C), the responsibility of monitoring the hosts previously monitored by host 84 (C). Therefore, in action 176, the CM 102 starts the installed components 103i corresponding to the failed components 124 and 126, which become the running components 124' and 126' (not shown); these are copies of software applications identical to components 124 and 126 that became unavailable on host 84 (C), with the difference that they are launched on host 82 (B).
- Similarly, the CM 106 starts the installed component 103i that corresponds to the failed component 128, which becomes the component 128'; this is the same software application as component 128, with the difference that it is launched on host 86 (D).
- each newly launched component 124', 126' on host 82 (B) and component 128' on host 86 (D) is activated in actions 180 and 182 respectively, by establishing the logical connections with other co-operating components from within the same host, or from the other networked hosts.
- the started components themselves may have the responsibility and the capacity of establishing the logical connections with their respective cooperating components.
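- One possible way for a restarted component to re-establish its logical connections (actions 180 and 182) is sketched below; the peer port, and the mapping that stands in for a CIR lookup of each peer's Actual Host Name field 146i.AHN, are assumptions.

```python
import socket
from typing import Dict, List

COMPONENT_PORT = 9100   # assumed port on which peer components listen

def activate_component(my_peers: List[str],
                       actual_host_of: Dict[str, str]) -> Dict[str, socket.socket]:
    """Re-establish the logical connections of a newly launched component.

    `actual_host_of` stands in for a CIR lookup of field 146i.AHN of each
    co-operating component, so that connections follow components wherever
    they currently run (original or back-up host).
    """
    connections = {}
    for peer in my_peers:
        host = actual_host_of[peer]
        connections[peer] = socket.create_connection((host, COMPONENT_PORT), timeout=5)
    return connections
```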
- Upon receipt by the CM 102 of the monitoring host 82 (B) of the component list 173 (the same procedure applies to host 86 (D) as well, but for simplification purposes it will only be described in relation to the host 82 (B)), the CM 102 of host 82 (B) takes over the responsibility of monitoring the hosts previously monitored by host 84 (C). This may be achieved by updating the record field 148i.MH of the monitoring host 82 (B) for further including the hosts previously monitored by host 84 (C). Thereafter, the CM 102 of host 82 (B) selects one component from the list 173, such as for example the first component (component 124) from the list, action 200. Alternatively, the selection of the components from the list 173 may be made randomly, or according to other logic as believed appropriate and implemented by the network operator.
- The CIR 140 is then queried and the Actual Host Name entry corresponding to the selected component 124 is obtained, action 202. At the same time, a Lock Record Action is performed for this particular component's record 146i.
- The Actual Host Name entry obtained in action 202 is compared with the identity of Host C (previously obtained in action 154 or action 162), action 204. If the comparison is a perfect match, i.e. the Actual Host Name entry obtained in action 202 is the same as the identity of the failed monitored Host C, it is deduced that in the meantime no other monitoring host (such as the partner monitoring host 86 (D)) has already taken charge of this component, and therefore an update is performed, action 206, from the monitoring host 82 (B) to the CIR 140, for changing the Actual Host Name entry in the field 146i.AHN to the identity of the monitoring host 82 (B).
- Action 206 may comprise an update request being sent from the CM 102 to the CIR 140, the actual update at the CIR, and an update acknowledge being sent back from the CIR to the CM 102. Thereafter, the CM 102 requests a Lock Release for the record 146i, and the CIR releases the Lock, action 208.
- The CM 102 then deletes the selected component from the list of remaining components 173, action 209, and writes or keeps in its memory the selected component identity, action 210, which allows it to continue with the subsequent actions (176 and 180) shown in Figure 3 and described beforehand for this particular component. It is to be noted that the order of actions 209 and 210 can be inverted. With reference being made back to action 204 of Figure 4, if the result of the comparison is not a perfect match, i.e. if the Actual Host Name entry obtained in action 202 differs from the identity of the failed host 84 (C), it is deduced that the partner monitoring host has already taken charge of this component; the lock on the record is then released, the component is deleted from the list 173, and the CM 102 proceeds with the next component of the list.
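- The decisional sequence of Figure 4 (actions 200-210) can be condensed into the following Python sketch; the MiniCIR class and its lock/read/update calls are illustrative stand-ins for the LDAP operations on the records 146i, not the patent's actual interface.

```python
import threading
from typing import Dict, List

class MiniCIR:
    """Hypothetical stand-in for the CIR 140: per-component Actual Host Name
    entries (field 146i.AHN) protected by per-record locks."""
    def __init__(self, actual_host_name: Dict[str, str]):
        self._ahn = dict(actual_host_name)
        self._locks = {c: threading.Lock() for c in actual_host_name}

    def lock_record(self, component: str) -> None:            # Lock Record Action
        self._locks[component].acquire()

    def release_lock(self, component: str) -> None:           # action 208
        self._locks[component].release()

    def read_ahn(self, component: str) -> str:                # action 202
        return self._ahn[component]

    def update_ahn(self, component: str, host: str) -> None:  # action 206
        self._ahn[component] = host

def claim_components(cir: MiniCIR, component_list: List[str],
                     failed_host: str, my_host: str) -> List[str]:
    """One CM working through the list 173 as in Figure 4 (actions 200-210)."""
    claimed = []
    for component in list(component_list):         # action 200: select a component
        cir.lock_record(component)
        try:
            if cir.read_ahn(component) == failed_host:   # action 204: not yet claimed
                cir.update_ahn(component, my_host)        # action 206
                claimed.append(component)                 # action 210: remember to start it
            # else: the partner CM already took charge of this component
        finally:
            cir.release_lock(component)                   # action 208
        component_list.remove(component)                  # action 209
    return claimed

# Example following Figure 3: host D's CM already claimed component 128,
# so host B's CM ends up claiming components 124 and 126.
cir = MiniCIR({"124": "C", "126": "C", "128": "D"})
print(claim_components(cir, ["124", "126", "128"], failed_host="C", my_host="B"))
```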
- the failed components selection of action 174 may be performed according to a pre-determined arrangement wherein particular hosts can automatically be assigned the responsibility of certain failed components. For example, it could be predetermined that in case of failure of host 84 (C), components 124 and 126 would be re-assigned to host 82 (B) while component 128 would be assigned to host 86 (D) without performing the decisional sequence of Figure 4. This pre-determined information may be stored in the CIR 140 and transmitted to the monitoring hosts, or in the monitoring hosts 82 (B) and 86 (D) themselves.
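- Such a pre-determined arrangement could be as simple as a static table, as in the hypothetical sketch below, whether it is stored in the CIR 140 or in the monitoring hosts themselves; the identifiers follow the example of Figure 3.

```python
# Hypothetical pre-determined re-assignment plan: which back-up host restarts
# which components when a given host fails, without running the Figure 4 sequence.
FAILOVER_PLAN = {
    "C": {
        "B": ["124", "126"],   # host B restarts components 124 and 126
        "D": ["128"],          # host D restarts component 128
    },
}

def components_for(backup_host: str, failed_host: str) -> list:
    return FAILOVER_PLAN.get(failed_host, {}).get(backup_host, [])
```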
- FIG. 5 is a nodal operation and signal flow diagram showing the sequence of actions performed upon recovery of the host 84 (C).
- In action 300, host 84 (C) recovers after a period of unavailability, and its CM 104 starts and becomes Enabled and running.
- Once the host 84 (C) is "up and running", its CM 104 becomes aware that there are no components running and, following action 300, it signals the CIR 140 and queries for the identity of the components it should be running, action 302. This may be achieved by sending its host identity along with a request for components.
- The CIR 140 may extract from the CL 142 the identity of the components that would preferably run on host 84 (C), action 303, by consulting for example the Preferred Host Name field 146i.PHN of the component records 146i of list 142. All the components whose entry of field 146i.PHN matches the host 84 (C) identity are returned in a component list 304 of components that would preferably be run on host 84 (C), in action 306.
- the host 84 (C) starts each of the components of the list 304 one at a time, such as for example component 124 in action 308, and for each such component performs the following actions.
- the CM 104 sends a request for update of the list 142 to the CIR 140, by including in the request the component identity (ex. Component 124's identity).
- A Lock Record is performed on the record 146i of the component 124 and the entry of the field 146i.AHN is read, action 314.
- the Actual Host Name entry read in action 314 is returned to the host 84 (C) in action 316.
- In the present example, since it was host 82 (B) that temporarily took charge of component 124, it is host B's identity that is returned in action 316.
- Upon receipt of the message 326 that confirms that host 82 (B) shut down the component 124', the CM 104 of the host 84 (C) sends an Update Actual Host Name request message, action 328, to the CIR 140 for requesting the change of the record 146i, particularly of the field 146i.AHN corresponding to the component 124, from host 82 (B) to host 84 (C), in order to reflect that host 84 (C) took over the responsibility for running the selected component.
- Upon receipt of the Update Actual Host Name request message, the CIR 140 proceeds to the update of field 146i, action 330, releases the lock on the record 146i, action 332, and returns back to the CM 104 of the host 84 (C) an update acknowledgement, action 334.
- In action 336, the component 124 proceeds to its insertion at its natural logical location by communicating with its co-operating components and by establishing the required connections.
- action 336 may also comprise a certain synchronization of the data status of the newly started component 124 with the component 124'. For example, the old data status of component 124' running on host 82 (B) may be read in action 322 before shutting down the component 124', and may be transmitted to the CM 104 of host 84 (C) in action 326 along with the Release acknowledge message.
- the CM 104 of host 84 (C) may use the old data status read from component 124' for synchronizing the newly started component 124, i.e. the old data status read from component 124' would become the new data status of the newly started component 124.
- Actions 320-334 are preferably only performed if the entry in the field 146i.AHN (the actual host name entry returned in action 316) is different from the host 84 (C) identity, i.e. only if another monitoring host did actually temporarily take charge of the component 124 (exceptions may occur, such as for example in the case of a resource overload of the monitoring host).
- Otherwise, when the actual host name entry returned in action 316 already matches the host 84 (C) identity, the actions 320-334 are preferably skipped.
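- The recovery sequence of Figure 5 can be summarized by the following sketch; the callables are assumptions standing in for the CIR requests and inter-CM messages described above, and the per-record locking is omitted for brevity.

```python
from typing import Callable, List, Optional

def recover_host(my_host: str,
                 preferred_components: List[str],          # list 304 from the CIR (actions 302-306)
                 read_ahn: Callable[[str], str],            # CIR lookup of field 146i.AHN (actions 312-316)
                 request_shutdown: Callable[[str, str], Optional[dict]],  # stop the substitute copy (up to message 326)
                 update_ahn: Callable[[str, str], None],    # CIR update of field 146i.AHN (actions 328-334)
                 start_component: Callable[[str, Optional[dict]], None]   # start and re-insert (action 336)
                 ) -> None:
    """Sketch of the recovery of host 84 (C) as described for Figure 5."""
    for component in preferred_components:
        current_host = read_ahn(component)
        old_state = None
        if current_host != my_host:
            # Another host temporarily took charge: stop the substitute copy,
            # optionally retrieving its data status for synchronization,
            # then record that this host runs the component again.
            old_state = request_shutdown(current_host, component)
            update_ahn(component, my_host)
        # Start the original copy and let it re-insert itself at its original
        # logical location, synchronized with the old data status if any.
        start_component(component, old_state)
```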
- At least one of, or both the hosts 82 (B) and 86 (D) may not be monitoring hosts, but rather only assume the function of re-starting the failed components of the monitored host 84 (C).
- The function of monitoring the status of the monitored host 84 (C) may be assigned to one or more host(s) different from the host(s) whose function is to re-start the failed component(s).
- host 80 (A) may be the monitoring host of host 84 (C), and may first detect the failure of the monitored host 84 (C).
- In such a scenario, it is the back-up hosts 82 (B) and 86 (D) that have the responsibility of restarting and running the failed components of the monitored host, as described hereinbefore with reference to Figures 3, 4 and 5, with the exception that the monitoring host that first detects the failure of the monitored host, action 154, may be different from the hosts 82 (B) and 86 (D) that actually re-start the failed components 124', 126', and 128', actions 176-182.
- The back-up hosts may be any type of co-operating host that has installed copies of the software components of its corresponding monitored host, or has access to these copies, such as for example from the CIR 140.
- the CIR 140 can be any type of unified or distributed database application, such as for example a centralized or distributed LDAP server.
- When the CIR 140 is an LDAP server, advantage can be obtained from the particular functionalities of LDAP. For example, some notifications, such as but not limited to actions 160 and 162, can be automated by placing a "notification request upon change" request in the LDAP server regarding the Operational State Attribute of Host C.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Hardware Redundancy (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2001266503A AU2001266503A1 (en) | 2000-06-30 | 2001-06-21 | Method and system for automatic re-assignment of software components of a failed host
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US60911100A | 2000-06-30 | 2000-06-30 | |
US09/609,111 | 2000-06-30 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002001347A2 true WO2002001347A2 (en) | 2002-01-03 |
WO2002001347A3 WO2002001347A3 (en) | 2002-06-20 |
Family
ID=24439380
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SE2001/001448 WO2002001347A2 (en) | 2000-06-30 | 2001-06-21 | Method and system for automatic re-assignment of software components of a failed host |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU2001266503A1 (en) |
WO (1) | WO2002001347A2 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006042775A2 (en) | 2004-10-15 | 2006-04-27 | Siemens Aktiengesellschaft | Method and device for redundancy control of electrical devices |
US7055052B2 (en) | 2002-11-21 | 2006-05-30 | International Business Machines Corporation | Self healing grid architecture for decentralized component-based systems |
WO2005008498A3 (en) * | 2003-07-21 | 2006-08-17 | Symbium Corp | Embedded system administration |
US7200781B2 (en) | 2003-05-14 | 2007-04-03 | Hewlett-Packard Development Company, L.P. | Detecting and diagnosing a malfunctioning host coupled to a communications bus |
US7676621B2 (en) | 2003-09-12 | 2010-03-09 | Hewlett-Packard Development Company, L.P. | Communications bus transceiver |
US8140677B2 (en) | 2002-11-21 | 2012-03-20 | International Business Machines Corporation | Autonomic web services hosting service |
US8302100B2 (en) | 2000-01-18 | 2012-10-30 | Galactic Computing Corporation Bvi/Bc | System for balance distribution of requests across multiple servers using dynamic metrics |
US8316131B2 (en) | 2000-11-10 | 2012-11-20 | Galactic Computing Corporation Bvi/Bc | Method and system for providing dynamic hosted service management across disparate accounts/sites |
WO2013019339A1 (en) * | 2011-08-01 | 2013-02-07 | Alcatel Lucent | Hardware failure mitigation |
US8429049B2 (en) | 2000-07-17 | 2013-04-23 | Galactic Computing Corporation Bvi/Ibc | Method and system for allocating computing resources |
US8489741B2 (en) | 2002-11-21 | 2013-07-16 | International Business Machines Corporation | Policy enabled grid architecture |
US8555238B2 (en) | 2005-04-15 | 2013-10-08 | Embotics Corporation | Programming and development infrastructure for an autonomic element |
CN114020512A (en) * | 2021-11-03 | 2022-02-08 | 中国工商银行股份有限公司 | Background task processing method and device, computer equipment and storage medium |
US12271867B1 (en) | 2020-02-10 | 2025-04-08 | State Farm Mutual Automobile Insurance Company | Predicting resource lifecycles and managing resources in enterprise networks |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000010822A (en) * | 1998-06-25 | 2000-01-14 | Yokogawa Electric Corp | Distributed object down detector |
US6195760B1 (en) * | 1998-07-20 | 2001-02-27 | Lucent Technologies Inc | Method and apparatus for providing failure detection and recovery with predetermined degree of replication for distributed applications in a network |
US6202149B1 (en) * | 1998-09-30 | 2001-03-13 | Ncr Corporation | Automated application fail-over for coordinating applications with DBMS availability |
-
2001
- 2001-06-21 WO PCT/SE2001/001448 patent/WO2002001347A2/en active Application Filing
- 2001-06-21 AU AU2001266503A patent/AU2001266503A1/en not_active Abandoned
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8302100B2 (en) | 2000-01-18 | 2012-10-30 | Galactic Computing Corporation Bvi/Bc | System for balance distribution of requests across multiple servers using dynamic metrics |
US8429049B2 (en) | 2000-07-17 | 2013-04-23 | Galactic Computing Corporation Bvi/Ibc | Method and system for allocating computing resources |
US8538843B2 (en) | 2000-07-17 | 2013-09-17 | Galactic Computing Corporation Bvi/Bc | Method and system for operating an E-commerce service provider |
US8316131B2 (en) | 2000-11-10 | 2012-11-20 | Galactic Computing Corporation Bvi/Bc | Method and system for providing dynamic hosted service management across disparate accounts/sites |
US7055052B2 (en) | 2002-11-21 | 2006-05-30 | International Business Machines Corporation | Self healing grid architecture for decentralized component-based systems |
US8140677B2 (en) | 2002-11-21 | 2012-03-20 | International Business Machines Corporation | Autonomic web services hosting service |
US8489741B2 (en) | 2002-11-21 | 2013-07-16 | International Business Machines Corporation | Policy enabled grid architecture |
US7200781B2 (en) | 2003-05-14 | 2007-04-03 | Hewlett-Packard Development Company, L.P. | Detecting and diagnosing a malfunctioning host coupled to a communications bus |
WO2005008498A3 (en) * | 2003-07-21 | 2006-08-17 | Symbium Corp | Embedded system administration |
US8661548B2 (en) | 2003-07-21 | 2014-02-25 | Embotics Corporation | Embedded system administration and method therefor |
US7725943B2 (en) | 2003-07-21 | 2010-05-25 | Embotics Corporation | Embedded system administration |
EP2317440A2 (en) | 2003-07-21 | 2011-05-04 | Embotics Corporation | Embedded system administration and method therefor |
US7676621B2 (en) | 2003-09-12 | 2010-03-09 | Hewlett-Packard Development Company, L.P. | Communications bus transceiver |
WO2006042775A2 (en) | 2004-10-15 | 2006-04-27 | Siemens Aktiengesellschaft | Method and device for redundancy control of electrical devices |
WO2006042775A3 (en) * | 2004-10-15 | 2007-02-08 | Siemens Ag | Method and device for redundancy control of electrical devices |
US8555238B2 (en) | 2005-04-15 | 2013-10-08 | Embotics Corporation | Programming and development infrastructure for an autonomic element |
WO2013019339A1 (en) * | 2011-08-01 | 2013-02-07 | Alcatel Lucent | Hardware failure mitigation |
CN103718535A (en) * | 2011-08-01 | 2014-04-09 | 阿尔卡特朗讯公司 | Hardware failure mitigation |
JP2014522052A (en) * | 2011-08-01 | 2014-08-28 | アルカテル−ルーセント | Reduce hardware failure |
US8856585B2 (en) | 2011-08-01 | 2014-10-07 | Alcatel Lucent | Hardware failure mitigation |
KR101504882B1 (en) * | 2011-08-01 | 2015-03-20 | 알까뗄 루슨트 | Hardware failure mitigation |
US12271867B1 (en) | 2020-02-10 | 2025-04-08 | State Farm Mutual Automobile Insurance Company | Predicting resource lifecycles and managing resources in enterprise networks |
CN114020512A (en) * | 2021-11-03 | 2022-02-08 | 中国工商银行股份有限公司 | Background task processing method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2002001347A3 (en) | 2002-06-20 |
AU2001266503A1 (en) | 2002-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4307673B2 (en) | Method and apparatus for configuring and managing a multi-cluster computer system | |
WO2017177941A1 (en) | Active/standby database switching method and apparatus | |
JP4204769B2 (en) | System and method for handling failover | |
US9785691B2 (en) | Method and apparatus for sequencing transactions globally in a distributed database cluster | |
US6145089A (en) | Server fail-over system | |
WO2002001347A2 (en) | Method and system for automatic re-assignment of software components of a failed host | |
US20090049054A1 (en) | Method and apparatus for sequencing transactions globally in distributed database cluster | |
CN111427728B (en) | State management method, main/standby switching method and electronic equipment | |
CN102394914A (en) | Cluster brain-split processing method and device | |
WO2007038617A2 (en) | Methods and systems for validating accessibility and currency of replicated data | |
CN112506702A (en) | Data center disaster tolerance method, device, equipment and storage medium | |
CN115145782A (en) | A server switching method, MooseFS system and storage medium | |
JP3887130B2 (en) | High availability computer system and data backup method in the same system | |
CN115277379B (en) | Distributed lock disaster recovery processing method and device, electronic equipment and storage medium | |
CN108173971A (en) | A kind of MooseFS high availability methods and system based on active-standby switch | |
CN108600284B (en) | Ceph-based virtual machine high-availability implementation method and system | |
CN115878361A (en) | Node management method and device for database cluster and electronic equipment | |
CN118018463A (en) | Fault processing method, device, equipment and readable storage medium | |
US20040024807A1 (en) | Asynchronous updates of weakly consistent distributed state information | |
CN113765690A (en) | Cluster switching method, system, device, terminal, server and storage medium | |
CN108900331A (en) | A kind of distributed type assemblies management method and distributed type assemblies | |
CN111752488A (en) | Management method and device of storage cluster, management node and storage medium | |
CN113868022A (en) | Master-slave switching method and device for database | |
CN114036129A (en) | A Database Switching Method to Reduce Data Loss | |
JP2001022627A (en) | Database synchronization method and method between multiple devices |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase |
Ref country code: JP |