CN102394914A

CN102394914A - Cluster brain-split processing method and device

Info

Publication number: CN102394914A
Application number: CN2011102825734A
Authority: CN
Inventors: 王婷; 张书宁
Original assignee: Inspur Beijing Electronic Information Industry Co Ltd
Current assignee: Inspur Beijing Electronic Information Industry Co Ltd
Priority date: 2011-09-22
Filing date: 2011-09-22
Publication date: 2012-03-28

Abstract

The invention provides a cluster split-brain processing method and device in the field of computer technology applications. It addresses the problem that split-brain is handled in only one way, which reduces cluster working efficiency. The method comprises the following steps: each node in a cluster detects the heartbeat lines between itself and the other nodes in the cluster; when a node in the cluster cannot detect any heartbeat line, the node stops the services running on it. The technical solution is suitable for high-availability clusters and provides flexible and effective split-brain processing.

Description

Cluster split-brain processing method and device

Technical field

The present invention relates to the field of computer technology applications, and in particular to a cluster split-brain processing method and device.

Background art

High-availability clustering is widely used in the storage field. For a high-availability cluster to work properly, every node in the cluster must remain active while providing external services, so that the cluster delivers stable service. While the cluster is serving, changes in the environment may cause a node to suffer one fault or another and become disconnected from the cluster, producing a split-brain. When split-brain occurs, the services previously provided by the disconnected node may no longer work correctly, which can prevent the cluster from working properly. Detecting and responding to split-brain quickly and accurately therefore improves cluster performance.

The existing way to respond to split-brain and recover the node is mainly to shut down the disconnected node directly and restart its computer system, restoring the node's initial environment; after the restore completes, the node rejoins the cluster and provides services again, which ensures that the services it provides afterwards are stable. This approach does guarantee the stability of the node's services. In many cases, however, such as an unplugged network cable, restarting the computer system is not really necessary, and after the system boots, information must be reinitialized as required, which is time-consuming and lowers efficiency. In summary, the prior art handles split-brain in only one way, which reduces cluster working efficiency.

Summary of the invention

The invention provides a cluster split-brain processing method and device that solve the problem that split-brain is handled in only one way, which reduces cluster working efficiency.

A cluster split-brain processing method comprises:

Each node in the cluster detects the heartbeat lines between itself and the other nodes in the cluster;

When a node in the cluster cannot detect any heartbeat line, the node suspends the services on itself.

Preferably, after the step in which a node in the cluster that cannot detect any heartbeat line suspends the services on itself, the method further comprises:

After the node detects that the heartbeat lines between it and the other nodes in the cluster have recovered, the services on the node are reopened.

Preferably, the step in which a node in the cluster that cannot detect any heartbeat line suspends the services on itself is specifically:

When a node in the cluster cannot detect any heartbeat line within a preset detection period, the node suspends the services on itself.

Preferably, the cluster split-brain processing method further comprises:

When a node in the cluster can detect the heartbeat lines to only some of the other nodes in the cluster, the heartbeat lines that cannot be detected are judged to have failed.
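The detection-period rule and the partial-failure rule above can be illustrated with a minimal Python sketch. It assumes that the node records, for each peer, the time at which the last heartbeat arrived on that line; the function name `classify_heartbeats` and the state labels are hypothetical and do not come from the patent.

```python
def classify_heartbeats(last_seen, now, period):
    """Classify the local node's heartbeat lines (hypothetical helper).

    last_seen maps each peer to the time its heartbeat was last received.
    A line is undetectable if nothing arrived within the preset `period`.
    """
    failed = sorted(peer for peer, t in last_seen.items() if now - t > period)
    if failed and len(failed) == len(last_seen):
        # no line is detectable at all: split-brain, so suspend local services
        return "split-brain", failed
    if failed:
        # some lines are still detectable: only the missing lines have failed
        return "heartbeat-failure", failed
    return "healthy", []


print(classify_heartbeats({"node-b": 9.0, "node-c": 2.0}, now=10.0, period=3.0))
# ('heartbeat-failure', ['node-c'])
```

Distinguishing the two failure states is what lets a node with only a broken cable to one peer keep serving, while a node cut off from every peer stops its services instead of rebooting.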

The present invention also provides a cluster split-brain processing device, comprising:

A heartbeat management module, configured to detect the heartbeat lines between a node in the cluster and the other nodes in the cluster;

A cluster management module, configured to suspend the services on a node in the cluster when none of the heartbeat lines between that node and the other nodes in the cluster can be detected.

Preferably, the cluster management module is further configured to reopen the services on the node after detecting that the heartbeat lines between the node and the other nodes in the cluster have recovered.

Preferably, the heartbeat management module is further configured to judge that the heartbeat lines that cannot be detected have failed when the heartbeat lines to some of the other nodes in the cluster can still be detected.

The invention provides a cluster split-brain processing method and device in which each node in the cluster detects the heartbeat lines between itself and the other nodes in the cluster, and a node that cannot detect any heartbeat line suspends the services on itself. Suspending services replaces the direct system restart of the prior art, which saves recovery time, improves the accuracy of split-brain handling, and maintains system working efficiency.

Description of drawings

Fig. 1 is a flowchart of the split-brain response in the cluster split-brain processing method provided by Embodiment 1 of the present invention;

Fig. 2 is a flowchart of the response to split-brain recovery in the cluster split-brain processing method provided by Embodiment 1 of the present invention;

Fig. 3 is a flowchart of a cluster split-brain processing method provided by Embodiment 2 of the present invention;

Fig. 4 is a schematic structural diagram of a cluster split-brain processing device provided by Embodiment 2 of the present invention.

Embodiment

In many cases, such as an unplugged network cable, directly restarting the computer system is not really necessary; after the system boots, information must be reinitialized as required, which is time-consuming and lowers efficiency.

To solve this problem, the embodiments of the present invention provide a cluster split-brain processing method and device that quickly detect and respond to split-brain by stopping the shared resources on the affected node and stopping the business services it provides, which keeps the shared resources safe. After the node's heartbeat recovers, its services can be restored directly, quickly, and efficiently. This keeps resources safe and also increases the speed of cluster recovery and the performance of the high-availability system.

The embodiments of the present invention are described in detail below with reference to the accompanying drawings. Note that, provided they do not conflict, the embodiments of this application and the features in the embodiments may be combined with one another arbitrarily.

Embodiment 1 of the present invention is described first, with reference to the accompanying drawings.

This embodiment of the present invention provides a cluster split-brain processing method and device. In a high-availability cluster, after a node finds that its heartbeat is disconnected, it does not shut down its operating system directly; it only stops the shared resources on the node and stops the business services the node provides. After the node's heartbeat recovers, its services can be restored directly, quickly, and efficiently. This method keeps resources safe, speeds up cluster recovery, and improves the performance of the high-availability system. The cluster split-brain processing device provided by this embodiment comprises a heartbeat management module, a cluster management module, and a local resource management module.

Using the cluster split-brain processing device described above, the cluster split-brain processing method of this embodiment handles a node on which split-brain occurs as follows:

1) The heartbeat management module periodically checks the information on every heartbeat line of all nodes in the cluster. If no information is detected on a heartbeat line throughout a period preset in the system, that heartbeat line is judged to have failed. If all heartbeat lines of a node have failed, the node is judged to be disconnected from the other nodes in the cluster.

2) When the cluster management module receives a heartbeat-disconnect command from the heartbeat management module, it evaluates a series of node information and finally determines how the node is to be handled. If the local node is the one that has disconnected from the cluster, it does not shut down its operating system directly; instead it starts the local resource management module (3), which stops the shared resources on the node and stops the business services the node provides. The other, normal nodes in the cluster take over the business of the disconnected node and continue to provide external services.

3) After a heartbeat failure, the heartbeat management module continues to check the information on every heartbeat line of each node. When it detects heartbeat information on a failed heartbeat line again, it sends a heartbeat-recovery command to the cluster management module.

4) On receiving the heartbeat-recovery command, the cluster management module acts according to the current state of the cluster. If the cluster is in a normal state, the node's services can be restored directly, quickly, and efficiently; if the cluster is already in a split-brain state, the services of the whole cluster are restored quickly.
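Steps 1) and 3) describe a heartbeat management module that keeps polling every line and reports both failures and recoveries. The following Python sketch models that behaviour under stated assumptions: heartbeats arrive as `record(peer, t)` calls, `poll(now)` is called periodically, and the command names `NODE_DEAD` and `NODE_RECOVERED` are hypothetical stand-ins for the commands sent to the cluster management module.

```python
class HeartbeatManager:
    """Hypothetical heartbeat management module running on one node."""

    def __init__(self, local, peers, period, start=0.0):
        self.local = local
        self.period = period                       # preset detection period
        self.last_seen = {p: start for p in peers}
        self.up = {p: True for p in peers}         # last known state of each line

    def record(self, peer, t):
        # a heartbeat message arrived on the line to `peer` at time t
        self.last_seen[peer] = t

    def poll(self, now):
        """Return the commands to send to the cluster management module."""
        commands = []
        for peer, t in self.last_seen.items():
            alive = now - t <= self.period
            if self.up[peer] and not alive:
                commands.append(("NODE_DEAD", peer))
            elif not self.up[peer] and alive:
                # step 3): a failed line carries heartbeats again
                commands.append(("NODE_RECOVERED", peer))
            self.up[peer] = alive
        lost_now = any(cmd == "NODE_DEAD" for cmd, _ in commands)
        if self.up and lost_now and not any(self.up.values()):
            # step 1): every line has failed, so the local node is the one cut off
            commands.append(("NODE_DEAD", self.local))
        return commands


hb = HeartbeatManager("node-a", ["node-b", "node-c"], period=3.0)
hb.record("node-b", 5.0)
print(hb.poll(5.0))    # [('NODE_DEAD', 'node-c')]
print(hb.poll(10.0))   # [('NODE_DEAD', 'node-b'), ('NODE_DEAD', 'node-a')]
hb.record("node-c", 11.0)
print(hb.poll(11.0))   # [('NODE_RECOVERED', 'node-c')]
```

Commands are emitted only on state transitions, so the cluster management module is notified once per failure or recovery rather than on every poll.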

After a node disconnects from the cluster, it does not shut down its operating system directly; it only stops the shared resources on the node and stops the business services the node provides, which keeps the shared resources safe. The invention also adds a heartbeat-recovery detection mechanism: after the node's heartbeat recovers, its services can be restored directly, quickly, and efficiently, which speeds up cluster recovery and improves the performance of the high-availability system.

The present invention is described in more detail below with reference to the accompanying drawings:

The master server of cluster management is also a node in the cluster. This node can actively allocate cluster resources and assign the cluster's services to different servers, which then provide them externally. The master server also interacts directly with users: user operations on the cluster are dispatched by this node directly to the designated nodes.

Fig. 1 is a flowchart of the split-brain response process described in the embodiment of the present invention. When the heartbeat management module detects that the heartbeat of a node has been disconnected from the cluster, it sends a node-dead command to the cluster management module. The cluster management module first deletes the node from the cluster node information list and updates the list, determines whether the node is the master node, and then determines whether it is the local node. If the local node is the one that has disconnected from the cluster, the local resource management module stops the shared resources on the local node, stops the business services the local node provides, and waits for the heartbeat to revive.

If the disconnected node is not the local node, the cluster management module counts the started nodes in the cluster and determines whether the cluster is a two-node 1+1 high-availability cluster. In a two-node high-availability cluster, the local node actively pings a third-party IP address to determine whether it has also been disconnected from the network. If it has, the local resource management module stops the shared resources on the local node, stops the business services it provides, and waits for the heartbeat to revive; if not, the local node takes over as master server of cluster management.

In a cluster of more than two nodes, the number of currently existing nodes is compared with half the number of started nodes. If fewer than half of the nodes remain, the local resource management module stops the shared resources on the local node, stops the business services it provides, and waits until the number of nodes whose heartbeat has revived exceeds half. If exactly half remain, it is determined whether the master server is among the remaining nodes. If more than half remain, it is determined whether the disconnected node is the master server: if so, the local node evaluates its own information and decides whether to take over as master server; if not, it is determined whether the local node is the master server, and if it is, the business of the disconnected node is migrated to the other active nodes.
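The branch structure of Fig. 1 can be condensed into a single decision function. This is a minimal Python sketch, not the patented implementation: the `Action` names, the `ping_third_party` callable (standing in for pinging the third-party IP address), and the outcome chosen when exactly half of the nodes remain are assumptions, because the text only says that the presence of the master server is checked in that case.

```python
from enum import Enum, auto


class Action(Enum):
    SUSPEND_AND_WAIT = auto()   # stop shared resources and services, wait for heartbeat
    TAKE_OVER_MASTER = auto()   # become master server of cluster management
    ELECT_MASTER = auto()       # decide whether to take over the lost master's role
    MIGRATE_SERVICES = auto()   # master moves the dead node's services to active nodes
    NO_ACTION = auto()


def on_node_dead(dead, local, nodes, started, master, ping_third_party):
    """Decide how the local node reacts to a node-dead command (after Fig. 1).

    nodes: current node list, still including `dead`; started: number of started
    nodes; master: name of the master server; ping_third_party: hypothetical
    callable returning True if the third-party IP address answers.
    """
    alive = set(nodes) - {dead}              # delete and update the node list
    if dead == local:
        return Action.SUSPEND_AND_WAIT
    if started == 2:                         # 1+1 two-node HA cluster
        if not ping_third_party():           # local node is off the network too
            return Action.SUSPEND_AND_WAIT
        return Action.TAKE_OVER_MASTER
    if 2 * len(alive) < started:             # fewer than half of the nodes remain
        return Action.SUSPEND_AND_WAIT
    if 2 * len(alive) == started and master not in alive:
        # assumption: an exact half without the master behaves like a minority
        return Action.SUSPEND_AND_WAIT
    if dead == master:
        return Action.ELECT_MASTER
    if local == master:
        return Action.MIGRATE_SERVICES
    return Action.NO_ACTION


nodes = ["a", "b", "c", "d", "e"]
print(on_node_dead("e", "a", nodes, 5, "a", lambda: True))  # Action.MIGRATE_SERVICES
```

Comparing `2 * len(alive)` with `started` keeps the majority test in integers, so odd node counts need no rounding.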

Fig. 2 is a flowchart of the heartbeat-recovery response process. When the heartbeat management module detects that a node's heartbeat has recovered, it sends a node-recovery command to the cluster management module, which first sends several join-request messages to all nodes in the cluster. Each node in the cluster, on receiving the join request, adds the node's information to the node list on itself, so that all nodes in the cluster recognize the node's existence; it then determines whether it is the master server node, and if it is, it replies to the recovering node to announce that the master server exists. After sending the join request, the recovering node waits a certain time for the master server's reply. If it receives the reply, it joins the cluster and can start its services in the cluster. If it receives no reply, the master server does not exist; the recovering node then sends a master-server decision command to all nodes in the cluster, each node evaluates the information of all nodes on receiving this command, a new master server is decided in the cluster, and the cluster's services are restarted.
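The rejoin handshake of Fig. 2 is sketched below as an in-process Python simulation. Message passing and the waiting period are abstracted into direct method calls, and `elect_master` uses a placeholder rule (smallest node name wins) because the text does not specify how the nodes decide the new master server; all names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    is_master: bool = False
    node_list: set = field(default_factory=set)

    def on_join_request(self, joiner):
        # every node records the recovering node in its node list
        self.node_list.add(joiner)
        # only the master server answers, announcing that it exists
        return self.name if self.is_master else None


def elect_master(candidates):
    # placeholder decision rule; the patent leaves the criteria open
    return min(candidates)


def rejoin(joiner, peers, attempts=3):
    """Return (master_name, newly_elected) after the join handshake."""
    for _ in range(attempts):                 # the join request is sent several times
        replies = [p.on_join_request(joiner) for p in peers]
        masters = [r for r in replies if r is not None]
        if masters:
            return masters[0], False          # master answered: join and start services
    # no reply: no master exists, so the nodes decide a new one
    new_master = elect_master([p.name for p in peers] + [joiner])
    for p in peers:
        p.is_master = p.name == new_master
    return new_master, True


peers = [Peer("node-b"), Peer("node-c", is_master=True)]
print(rejoin("node-a", peers))  # ('node-c', False)
```

Because every peer adds the joiner to its node list before the master replies, all nodes know about the recovered node even when a new master has to be decided.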

The cluster split-brain processing method and device provided by this embodiment can respond quickly to a heartbeat-disconnect command by stopping local business and shared resources, which keeps resources safe, while the master server assigns the business of the disconnected node to other normal nodes, which keeps the business continuous. When the node's heartbeat recovers, its services can be restored directly, quickly, and efficiently, which speeds up cluster recovery and improves the performance of the high-availability system.

Embodiment 2 of the present invention is described below with reference to the accompanying drawings.

This embodiment of the present invention provides a cluster split-brain processing method. The flow of using this method to handle a split-brain node in the cluster is shown in Fig. 3 and comprises:

Step 301, each node in the cluster detects the heartbeat lines between itself and the other nodes in the cluster;

Step 302, when a node in the cluster cannot detect any heartbeat line, the node suspends the services on itself;

Step 303, after the node detects that the heartbeat lines between it and the other nodes in the cluster have recovered, the services on the node are reopened;

Step 304, when a node in the cluster can detect the heartbeat lines to only some of the other nodes in the cluster, the heartbeat lines that cannot be detected are judged to have failed.

After step 301, if a node in the cluster can detect one or more heartbeat lines but not all of them, split-brain has not occurred on this node, and the heartbeat lines that cannot be detected can be judged to have failed.

This embodiment of the present invention also provides a cluster split-brain processing device, whose structure is shown in Fig. 4, comprising:

A heartbeat management module 401, configured to detect the heartbeat lines between a node in the cluster and the other nodes in the cluster;

A cluster management module 402, configured to suspend the services on a node in the cluster when none of the heartbeat lines between that node and the other nodes in the cluster can be detected.

Preferably, the cluster management module 402 is further configured to reopen the services on the node after detecting that the heartbeat lines between the node and the other nodes in the cluster have recovered.

Preferably, the heartbeat management module 401 is further configured to judge that the heartbeat lines that cannot be detected have failed when the heartbeat lines to some of the other nodes in the cluster can still be detected.

The cluster split-brain processing device described above can be integrated into each node in the cluster to monitor each node and handle split-brain.
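How modules 401 and 402 could cooperate on a single node is sketched below in Python. The class names mirror the modules in the text, but their methods, the timestamps passed in, and the rule that one restored line is enough to reopen services are assumptions for illustration; a real node would also stop and restart its shared resources through the local resource management module.

```python
class HeartbeatModule:
    """Module 401 (hypothetical API): when was a heartbeat last seen per line."""

    def __init__(self, peers, period):
        self.period = period
        self.last_seen = {peer: 0.0 for peer in peers}

    def receive(self, peer, t):
        self.last_seen[peer] = t

    def detectable(self, now):
        # lines whose heartbeat arrived within the detection period
        return {p for p, t in self.last_seen.items() if now - t <= self.period}


class ClusterModule:
    """Module 402 (hypothetical API): suspends or reopens local services."""

    def __init__(self):
        self.running = True
        self.log = []

    def update(self, detectable):
        if not detectable and self.running:
            # no heartbeat line detectable: suspend services instead of rebooting
            self.running = False
            self.log.append("suspend")
        elif detectable and not self.running:
            # assumption: one restored line is enough to reopen services
            self.running = True
            self.log.append("reopen")


class SplitBrainDevice:
    """Hypothetical wiring of modules 401 and 402 on one cluster node."""

    def __init__(self, peers, period):
        self.heartbeat = HeartbeatModule(peers, period)
        self.cluster = ClusterModule()

    def tick(self, now):
        self.cluster.update(self.heartbeat.detectable(now))
        return self.cluster.running


dev = SplitBrainDevice(["node-b", "node-c"], period=3.0)
print(dev.tick(5.0))             # False: both lines silent for longer than 3.0
dev.heartbeat.receive("node-b", 6.0)
print(dev.tick(6.0))             # True: services reopened without a reboot
```

A partial failure leaves `detectable` non-empty, so services keep running, which matches the rule that only the missing lines are judged to have failed.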

The cluster split-brain processing device provided by this embodiment can be combined with the cluster split-brain processing method provided by the embodiments of the present invention. Each node in the cluster detects the heartbeat lines between itself and the other nodes in the cluster, and a node that cannot detect any heartbeat line suspends the services on itself. Suspending services replaces the direct system restart of the prior art, which saves recovery time, improves the accuracy of split-brain handling, and maintains system working efficiency.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by a computer program flow. The computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, apparatus, or device); when executed, it performs one of the steps of the method embodiments or a combination thereof.

Alternatively, all or part of the steps of the above embodiments may be implemented with integrated circuits: the steps may be made into individual integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.

The devices, functional modules, and functional units in the above embodiments may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices.

When the devices, functional modules, and functional units in the above embodiments are implemented as software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.

Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the claims.

Claims

1. A cluster split-brain processing method, characterized by comprising:

Each node in the cluster detects the heartbeat between the node and other nodes in the cluster;

When a node in the cluster fails to detect any heartbeat, the node terminates the business on the node.

2. The cluster split-brain processing method according to claim 1, characterized in that, after the step in which a node in the cluster that fails to detect any heartbeat line suspends the business on the node, the method further comprises:

After the node detects that the heartbeat between the nodes in the cluster is restored, the service on the node is reopened.

3. The cluster split-brain processing method according to claim 1, characterized in that the step in which a node in the cluster that fails to detect any heartbeat line suspends the business on the node is specifically:

When a node in the cluster fails to detect any heartbeat within the preset detection period, the node suspends the business on the node.

4. The cluster split-brain processing method according to claim 1, characterized in that the method further comprises:

When a node in the cluster can detect the heartbeat with only some of the other nodes in the cluster, the heartbeats that cannot be detected are determined to have failed.

5. A cluster split-brain processing device, characterized in that it comprises:

The heartbeat management module is used to detect the heartbeat between a node in the cluster and the other nodes in the cluster;

The cluster management module is used to stop the service on the node in the cluster when no heartbeat between the node in the cluster and other nodes in the cluster is detected.

6. The cluster split-brain processing device according to claim 5, characterized in that,

The cluster management module is further configured to reopen the service on the node in the cluster after detecting that the heartbeat between the node in the cluster and other nodes in the cluster is restored.

7. The cluster split-brain processing device according to claim 5, characterized in that,

The heartbeat management module is further configured to determine that the heartbeat connections that cannot be detected have failed when heartbeat connections with only some of the other nodes in the cluster can be detected.