
CN1302411C - Centralized control method for a large-scale cluster system

Info

Publication number: CN1302411C
Application number: CNB021594813A
Authority: CN
Inventors: 李电森; 许正华; 冯锐; 肖利民
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2002-12-31
Filing date: 2002-12-31
Publication date: 2007-02-28
Anticipated expiration: 2022-12-31
Also published as: CN1512376A

Abstract

The present invention relates to a centralized control method for a large-scale cluster system. In the cluster system, the candidate control nodes that join the cluster first become the master node and the auxiliary master node. The master node collects and maintains the global state information of all nodes in the cluster system; the auxiliary master node backs up the global state information stored in the master node in real time and takes over the work of the master node when the master node fails or is shut down. By means of a redundant design model of the cluster control module based on a centralized control strategy, the invention reduces the complexity of the cluster control algorithm and solves the single-point-of-failure problem that troubles centralized control strategies. It also solves the data inconsistency problem in redundant designs and improves the reliability and runtime performance of the system.

Description

Centralized control method for a large-scale cluster system

Technical field:

The present invention relates to a centralized control method for a large-scale cluster system, and in particular to a redundancy control method for a cluster control module based on a centralized control strategy. It belongs to the field of computer and network technology.

Background technology:

The growth of commercial applications has greatly advanced commercial cluster technology. Compared with scientific computing, commercial applications have several distinguishing features: individual tasks are fine-grained, but the number of service requests is large; the demand on system processing capacity grows continuously; and applications in some key sectors, such as banking and telecommunications, are mission-critical and require the cluster system to provide high-quality, highly reliable service. To meet these requirements a cluster system needs certain properties: it must provide a solid foundation for high availability and load balancing across the whole system, and it must also scale well.

To provide users with highly available services, key problems such as resource management and monitoring, task scheduling, and arbitration of contention must be solved, so a dedicated control module is needed to carry out this work.

A cluster system is a typical distributed computing environment. When designing a control module for it, two different types of control strategy are available: centralized control and distributed control.

In a distributed control strategy, all nodes participate in the global control decision process, and every node can access the decision-support information, namely the cluster state information. To avoid a single point of failure, distributed control algorithms make the state information fully redundant: each node keeps a copy of the state information for local access. Each state update generates N messages, where N is the number of nodes currently in the cluster, and each message drives one node to perform one update operation.

It follows that a potential advantage of the distributed control strategy is its high reliability, but it is relatively complex to design and implement. Moreover, keeping a full copy of the state information on every node wastes host resources. Each state update generates at least N messages (N being the number of nodes currently in the cluster). In particular, when a cluster is first being constructed, the number of events that trigger state updates grows linearly with the node count N, so the peak number of network messages generated by the whole cluster is roughly proportional to N*N, which consumes considerable network resources.
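The message counts described above can be illustrated with a small sketch. The function names are hypothetical, and the figure of two messages per update for a centralized scheme (one to a master node and one to an auxiliary master node) is an assumption made here for comparison, not a number stated in the text.

```python
def distributed_messages_per_update(n_nodes: int) -> int:
    """Every node holds a replica, so one update costs about N messages."""
    return n_nodes


def distributed_peak_messages(n_nodes: int) -> int:
    """During initial construction roughly N update events occur
    (one per joining node), each costing about N messages: N * N."""
    return n_nodes * distributed_messages_per_update(n_nodes)


def centralized_messages_per_update(n_control_nodes: int = 2) -> int:
    """Assumption: a reporting node sends one message to each control node."""
    return n_control_nodes


if __name__ == "__main__":
    for n in (8, 64, 512):
        print(n, distributed_peak_messages(n),
              n * centralized_messages_per_update())
```

Under these assumptions the distributed peak grows quadratically with the node count, while the centralized total grows linearly.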

The main advantages of the centralized control strategy are that message passing between nodes is simple and communication overhead is low, and that at any time as few as one node needs to take part in global control. However, in a distributed system a centralized control strategy easily creates a single point of failure, which is its greatest drawback.

In traditional distributed systems that adopt a centralized control strategy, a primary-backup redundancy mechanism is usually used to avoid a single point of failure. The basic idea of the primary-backup approach is that at any moment one host acts as the primary server and does all the work; if this primary server fails, the backup server takes over its tasks. In the primary-backup approach, clients interact only with the primary server, and to keep data consistent between the primary and backup servers, the primary server is usually responsible for triggering the data update operations on the backup server. The primary server and the backup server execute two completely different algorithm flows.

Fig. 1 gives a simple description of a write operation protocol in the primary-backup approach. A client request causes the primary server to modify its internal state information, and in a large system this modification may be very complex. The primary server then performs the third step, triggering the update operation on the backup server. If the fourth step uses a coarse-grained update strategy, that is, updating large data blocks by information type, network resources are inevitably wasted; if a fine-grained update strategy is used, the complexity of the communication protocol and of the update operations increases greatly. The crux of the problem is that the communication model of the primary-backup mechanism makes the system control module harder to design and implement. In addition, in the primary-backup mechanism, maintaining the high availability of the primary server generally depends on the (primary and backup) servers themselves; clients do not take part in this maintenance, so the system adapts poorly.

Summary of the invention

The main object of the present invention is, in view of the deficiencies of the prior art, to propose a general centralized control method for a large-scale cluster system. The method is based on a redundant design model of the cluster control module under a centralized control strategy; it aims to reduce the complexity of the cluster control algorithm and to solve the single-point-of-failure problem that troubles centralized control strategies.

Another object of the present invention is, in view of the deficiencies of the prior art, to propose a general centralized control method for a large-scale cluster system that aims to solve the data consistency problem in redundant designs and to improve the reliability and runtime performance of the system.

The objects of the present invention are achieved as follows:

A centralized control method for a large-scale cluster system, in which the candidate control nodes that join the cluster first become the master node and the auxiliary master node; the master node collects and maintains the global state information of all nodes in the cluster system; the auxiliary master node backs up the global state information in the master node in real time and takes over the work of the master node when the master node fails or is shut down;

A candidate control node is provided with a control module for maintaining the consistency of the global state information; a node provided with this control module becomes a candidate control node through configuration by the administrator;

The cluster system also contains ordinary nodes, which have no control module; an ordinary node communicates with the master node and drives the master node to maintain the global state information;

The ordinary node also communicates with the auxiliary master node and drives the auxiliary master node to keep a redundant backup of the global state information in the master node;

Periodic or event-driven data synchronization transfers are carried out between the master node and the auxiliary master node to keep a redundant backup of the master node's global state information;

The master node and the auxiliary master node run simultaneously;

The generation of the master node and the auxiliary master node specifically comprises:

Step 100: a node joins the cluster;

Step 101: determine whether a master node or an auxiliary master node has already joined; if so, go to step 106;

Step 102: determine whether this node is a candidate control node; if not, exit;

Step 103: acquire control;

Step 104: determine whether the acquisition succeeded; if not, wait for a period of time and then join the cluster again;

Step 105: start the cluster control component, that is, start the master node or the auxiliary master node;

Step 106: carry out the subsequent workflow.

When only one candidate control node has joined the cluster system, the master node and the auxiliary master node run on the same node. When more than one candidate control node has joined the cluster system, the auxiliary master node is migrated to a candidate control node other than the master node; that is, the control module is started on the new candidate control node and the information recording where the auxiliary master node runs is updated. The specific operations are as follows:

Step 200: a node joins the cluster;

Step 201: determine whether the master node and the auxiliary master node are on the same node; if not, end;

Step 202: determine whether the new node is a candidate control node; if not, end;

Step 203: set the new node as the auxiliary master node;

Step 204: determine whether the setting succeeded; if not, end;

Step 205: the master node cancels its own auxiliary master node setting;

Step 206: end.

Ordinary nodes and control nodes can communicate with each other. An ordinary node first checks the liveness of the master node and the auxiliary master node; if the master node or the auxiliary master node has failed, the corresponding fault handling is carried out.

When the master node fails, the node that discovers the fault notifies the auxiliary master node, and the auxiliary master node takes over from the failed master node and becomes the new master node. The new master node then selects an available candidate control node in the current system and makes it the new auxiliary master node. If no candidate control node is available in the cluster system, the new master node keeps its own auxiliary master node role.

When the auxiliary master node fails, the node that discovers the fault notifies the master node, and the master node selects an available candidate control node in the current system as the new auxiliary master node. If no candidate control node is available in the system, the master node makes itself the auxiliary master node. Until this migration of the auxiliary master node is complete, the master node accepts no new data update tasks.

If the master node and the auxiliary master node fail at the same time, the node that discovers the fault sets itself as both the master node and the auxiliary master node and rebuilds the cluster's global state information.

The above global state information comprises at least: node state information, service state information, and node resource load information. The global state information is modified either when reports from ordinary nodes drive the control nodes to modify it, or autonomously by the master node and the auxiliary master node. An ordinary node sends each message to the master node and to the auxiliary master node separately, simulating multicast communication, and the master node and the auxiliary master node each maintain their own local copy of the global state information.

The communication between an ordinary node and the control nodes comprises at least:

Step 300: connect to the master node; connect to the auxiliary master node;

Step 301: send data to the master node and receive the master node's reply; send data to the auxiliary master node and receive the auxiliary master node's reply;

Step 302: compare the master node's reply with the auxiliary master node's reply; if the reply data sent by the master node and the auxiliary master node are inconsistent, trigger a data synchronization operation between the control nodes; go to step 304;

Step 303: send data to the master node; send data to the auxiliary master node;

Step 304: disconnect from the master node; disconnect from the auxiliary master node.

Step 300 specifically comprises:

Step 3001: the master node sends a reply indicating whether the master node and the auxiliary master node are the same node;

Step 3002: the ordinary node receives this reply;

Step 3003: if the master node and the auxiliary master node are not the same node, go to step 3005;

Step 3004: set a global flag;

Step 3005: connect to the auxiliary master node.

After step 3004, the method further comprises: reading the global flag and, if the master node and the auxiliary master node are found to be the same node, skipping the operation on the auxiliary master node.

Through a redundant design model of the cluster control module based on a centralized control strategy, the present invention reduces the complexity of the cluster control algorithm and solves the single-point-of-failure problem that troubles centralized control strategies. At the same time, it solves the data consistency problem in redundant designs and improves the reliability and runtime performance of the system.

Description of drawings

Fig. 1 is a schematic diagram of a simple primary-backup protocol for a write operation in the prior art;

Fig. 2 is a schematic diagram of the cluster control model of the present invention;

Fig. 3 is a flowchart of producing the master node and the auxiliary master node in the cluster system of the present invention;

Fig. 4 is a flowchart of separating the master node from the auxiliary master node in the cluster system of the present invention;

Fig. 5 is a flowchart of control node fault detection and handling in the present invention;

Fig. 6 is the first external event communication model of the present invention;

Fig. 7 is the second external event communication model of the present invention;

Fig. 8 is the internal event communication model of the present invention;

Fig. 9 is a flowchart of the connection-establishment primitive of the present invention.

Embodiments:

The technical solution is described in detail below with reference to specific embodiments and the accompanying drawings.

In a cluster system, some nodes have the control module installed and are configured by the administrator as candidate control nodes. Other nodes have no control module installed and are called ordinary nodes. Two of the candidate control nodes become the master node and the auxiliary master node and run simultaneously. The master node is the actual control node; the global state information of all nodes in the cluster is collected on it. However, if only the master node held this information, the global information would be lost when it failed and the system could no longer run normally. The auxiliary master node therefore backs up the global information in the master node in real time and can take over the work of the master node when the master node fails. In Fig. 1, data transfer path 1 ensures that the master node collects the global state information. In general there are two ways for the auxiliary master node to back up the master node's global information. One is to have every node also send its data to the auxiliary master node whenever it sends data to the master node, that is, via data transfer path 2 in Fig. 1. The other is to carry out data synchronization transfers between the master node and the auxiliary master node, that is, via data transfer path 3 in Fig. 1; this synchronization may be periodic or event-driven.

In a cluster system, ordinary nodes are not able to maintain global state information. To ensure that the system has control nodes as early as possible, the candidate control nodes that join the cluster first should become the master node and the auxiliary master node, that is, start the control module as soon as possible. If a non-candidate node starts first and requests to join, it must abandon the join process and wait to join again.

The master node and the auxiliary master node should be the nodes that join the cluster first. Because contention may occur when the control module is started, that is, several nodes may simultaneously try to become the master node and the auxiliary master node, the operation of starting the control module must be mutually exclusive. The flow for producing the master node and the auxiliary master node is shown in Fig. 3:

Step 100: a node joins the cluster;

Step 101: test the control node state;

Step 102: determine whether the master node and the auxiliary master node have joined; if so, go to step 107; if not, go to step 103;

Step 103: determine whether the node is a candidate control node; if not, exit; if so, go to step 104;

Step 104: acquire control;

Step 105: determine whether the acquisition succeeded; if not, wait for a period of time and then join the cluster again; if so, go to step 106;

Step 106: start the cluster control component, that is, start the master node or the auxiliary master node;

Step 107: carry out the subsequent workflow.
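The join flow of steps 100 to 107 might be sketched as follows. This is only an illustration under stated assumptions: the `Cluster` class, the `join` function and its string results are hypothetical names, and a non-blocking in-process lock stands in for whatever cluster-wide mutual exclusion mechanism "acquiring control" actually uses, which the text does not specify.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    """Hypothetical shared view of which nodes hold the control roles."""
    master: Optional[str] = None
    auxiliary: Optional[str] = None
    control_lock: threading.Lock = field(default_factory=threading.Lock)

    def has_control_nodes(self) -> bool:
        return self.master is not None and self.auxiliary is not None


def join(cluster: Cluster, node_id: str, is_candidate: bool) -> str:
    # Steps 101-102: if the control nodes already exist, just proceed.
    if cluster.has_control_nodes():
        return "joined"
    # Step 103: a non-candidate node cannot start the control module.
    if not is_candidate:
        return "exit"
    # Steps 104-105: try to acquire control; on failure the caller
    # should wait for a while and call join() again.
    if not cluster.control_lock.acquire(blocking=False):
        return "retry"
    try:
        # Re-check under the lock in case another node won the race.
        if cluster.has_control_nodes():
            return "joined"
        # Step 106: the first candidate initially holds both roles.
        cluster.master = node_id
        cluster.auxiliary = node_id
        return "started"
    finally:
        cluster.control_lock.release()
```

A caller that receives "retry" would sleep for some interval and call `join` again, matching the "wait for a period of time and then join again" branch of step 105.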

When nodes first begin to join, the master node and the auxiliary master node run on the same node. As more candidate control nodes join the cluster, the auxiliary master node needs to be migrated to another candidate control node. Migrating the auxiliary master node means starting the control module on the new candidate control node and updating the information that records where the auxiliary master node runs, as shown in Fig. 4.

Step 200: a node joins the cluster;

Step 201: determine whether the master node and the auxiliary master node are on the same node; if so, go to step 202; if not, end;

Step 203: determine whether the new node is a candidate control node; if so, go to step 204; if not, end;

Step 204: set the new node as the auxiliary master node;

Step 205: determine whether the setting succeeded; if so, go to step 206; if not, end;

Step 206: the master node cancels its own auxiliary master node role;

Step 207: end.
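A minimal sketch of this migration flow is given below. `ControlState`, `on_node_join` and the `start_auxiliary` callback are hypothetical names introduced for illustration; the callback stands in for starting the control module on the new node and reports whether that succeeded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ControlState:
    """Hypothetical record of where the two control roles currently run."""
    master: str
    auxiliary: str


def on_node_join(state: ControlState, new_node: str, is_candidate: bool,
                 start_auxiliary: Callable[[str], bool]) -> bool:
    """Return True if the auxiliary master role migrated to new_node."""
    # Step 201: nothing to do if the roles are already on different nodes.
    if state.master != state.auxiliary:
        return False
    # Step 203: only a candidate control node can host the auxiliary role.
    if not is_candidate:
        return False
    # Steps 204-205: try to start the control module on the new node.
    if not start_auxiliary(new_node):
        return False
    # Step 206: the master gives up its own auxiliary role.
    state.auxiliary = new_node
    return True
```

Updating `state.auxiliary` only after the start succeeds means a failed migration leaves the master holding both roles, as in the "if not, end" branch of step 205.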

To discover control node faults as quickly as possible while the system is running, and to produce a new control node according to defined rules to take over from the failed node, this embodiment adopts an event-driven fault detection mechanism. While the cluster system is running, ordinary nodes and control nodes communicate frequently. Some operations are performed periodically, such as reporting node load information; others are performed sporadically, such as reporting a failed node. When communicating with a control node, the first thing an ordinary node does is check the liveness of the master node and the auxiliary master node. Therefore, if the master node or the auxiliary master node fails, the system discovers it quickly and handles it accordingly.

When the master node fails, the node that discovers the fault notifies the auxiliary master node, which takes over and becomes the new master node. The new master node then selects an available candidate control node in the current system and makes it the new auxiliary master node. If no candidate control node is available in the system, the new master node keeps its own auxiliary master node role. When the auxiliary master node fails, the node that discovers the fault notifies the master node, and the master node selects another node as the auxiliary master node; if no candidate control node is available in the system, the master node makes itself the auxiliary master node. If the master node and the auxiliary master node fail at the same time, the node that discovers the fault should attempt to set itself as both the master node and the auxiliary master node and rebuild the cluster's global state information.

The flow is shown in Fig. 5:

Step 300: check the liveness of the master node and the auxiliary master node;

Step 301: if a fault has occurred, determine the fault type;

Step 302: if the auxiliary master node has failed, notify the master node and go to step 3030; if the master node has failed, notify the auxiliary master node and go to step 3040; if both the master node and the auxiliary master node have failed, go to step 306;

Step 3030: select a candidate control node as the auxiliary master node;

Step 3031: determine whether this succeeded; if so, go to step 306; if not, the master node takes over the auxiliary master node role.

Step 3040: the auxiliary master node sets itself as the master node;

Step 3041: select a candidate control node as the auxiliary master node;

Step 3042: determine whether this succeeded; if so, go to step 3043; if not, go to step 3044;

Step 3043: cancel its own auxiliary master node role;

Step 3044: keep its own auxiliary master node role.

Step 305: the node that discovers the fault sets itself as both the master node and the auxiliary master node; go to step 203;

Step 306: end.
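The three fault cases described above can be sketched as a single decision function. `ClusterView`, `handle_faults` and the returned labels are hypothetical; the sketch follows the prose description (auxiliary takes over, a replacement candidate is chosen if one exists, and a double fault makes the detecting node hold both roles) rather than any particular implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ClusterView:
    """Hypothetical view of role placement, candidates and failed nodes."""
    master: str
    auxiliary: str
    candidates: List[str] = field(default_factory=list)
    failed: Set[str] = field(default_factory=set)

    def pick_candidate(self, exclude: Set[str]) -> Optional[str]:
        for node in self.candidates:
            if node not in self.failed and node not in exclude:
                return node
        return None


def handle_faults(view: ClusterView, detector: str) -> str:
    master_down = view.master in view.failed
    aux_down = view.auxiliary in view.failed
    if not master_down and not aux_down:
        return "ok"
    if master_down and aux_down:
        # Double fault: the detecting node takes both roles; the global
        # state must then be rebuilt from fresh reports (not shown).
        view.master = view.auxiliary = detector
        return "rebuilt"
    if master_down:
        # The auxiliary master takes over as the new master.
        view.master = view.auxiliary
    # In both single-fault cases a replacement auxiliary is needed; if no
    # candidate is available the master also holds the auxiliary role.
    replacement = view.pick_candidate(exclude={view.master})
    view.auxiliary = replacement if replacement is not None else view.master
    return "master_failover" if master_down else "auxiliary_replaced"
```

Choosing the replacement after the takeover lets one code path serve both single-fault branches, since in each the surviving master picks a new auxiliary.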

In the present invention, the global state information is an aggregate of many kinds of information, including node state information, service state information, node resource load information, and so on. Global state information is dynamic. Under a centralized control strategy, every node reports its latest local state information to the control node, and the control node aggregates and organizes this information and uses it as the basis for arbitration decisions. How to maintain the global state information is therefore one of the most critical problems in designing a cluster control system.

Because a redundant structure with multiple control nodes is used and the global state information is redundantly backed up between control nodes, the control module must maintain the consistency of the redundant data. While the system is running, changes to the global state information can come from two sources: reports from ordinary nodes drive the control nodes to modify the global state information (external events), or the control nodes modify the global state information on their own initiative (internal events). For each type of event the system should adopt an appropriate strategy to keep the data consistent.

For external events, ordinary nodes send each message to the master node and to the auxiliary master node separately, simulating multicast communication. As shown in Fig. 6, master node 1 and auxiliary master node 2 each maintain their own local copy of the global state information, which avoids data synchronization operations between them. To make access to multiple master nodes transparent, as shown in Fig. 7, the data communication operations can be encapsulated as a set of communication primitives that provide atomic operations.

Functional description of the communication primitives (taking TCP as the example protocol):

<Connect>: connect to the master node; connect to the auxiliary master node.

<Send and receive data>: send data to the master node and receive its reply; send data to the auxiliary master node and receive its reply; compare the master node's reply with the auxiliary master node's reply. This comparison amounts to detecting data inconsistency. If the replies sent by master node 1 and auxiliary master node 2 are found to be inconsistent, a data synchronization operation between the control nodes should be triggered.

<Send data>: send data to the master node; send data to the auxiliary master node.

<Disconnect>: disconnect from the master node; disconnect from the auxiliary master node.
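The send-and-receive primitive with reply comparison can be sketched as below. To keep the example self-contained it uses in-memory objects in place of TCP connections; `ControlNode`, `DualChannel` and `copy_master_state` are hypothetical names, and replying with a snapshot of the local state is an assumption chosen so that the two replies can be compared.

```python
class ControlNode:
    """In-memory stand-in for a control node reachable over the network."""

    def __init__(self, name):
        self.name = name
        self.state = {}

    def handle(self, key, value):
        self.state[key] = value
        # Assumption: the reply is a snapshot of the local global state.
        return tuple(sorted(self.state.items()))


def copy_master_state(master, auxiliary):
    """Hypothetical synchronization: the auxiliary adopts the master's state."""
    auxiliary.state = dict(master.state)


class DualChannel:
    """Sends every message to both control nodes (simulated multicast)."""

    def __init__(self, master, auxiliary, resync=copy_master_state):
        self.master = master
        self.auxiliary = auxiliary
        self.resync = resync

    def send_and_receive(self, key, value):
        reply_master = self.master.handle(key, value)
        reply_aux = self.auxiliary.handle(key, value)
        # Differing replies indicate inconsistent replicas: resynchronize.
        if reply_master != reply_aux:
            self.resync(self.master, self.auxiliary)
        return reply_master

    def send(self, key, value):
        self.master.handle(key, value)
        self.auxiliary.handle(key, value)
```

Because the comparison happens inside the primitive, the calling node does not need to know that two control nodes exist, which is the access transparency the primitives are meant to provide.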

An ordinary node communicates with the master node and the auxiliary master node separately. If the same control node receives two identical (duplicate) data update requests, as can happen when the master node and the auxiliary master node run on the same node, it performs the same update twice in succession. If the operation is an accumulation, for example adding to some global variable, the data will obviously become wrong; if the operation acquires a mutually exclusive shared resource, it may also deadlock the control node and produce data errors. Therefore, a control node must never receive duplicate data update requests.

For this reason, the communication primitives above need to be revised.

The connection process is shown in Fig. 9:

Step 600: begin;

Step 601: the master node sends a reply indicating whether the master node and the auxiliary master node are the same node;

Step 602: receive the reply;

Step 603: determine whether the master node and the auxiliary master node are the same node; if so, go to step 604; if not, go to step 605;

Step 604: set the global flag;

Step 605: connect to the auxiliary master node;

Step 606: end.

Other primitives: read the global flag; if the master node and the auxiliary master node are found to be the same node, skip the corresponding operation on the auxiliary master node.
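The effect of the global flag on an accumulating update can be sketched as follows. `Counter` and `Session` are hypothetical names, and object identity stands in for the master node's reply about whether it also holds the auxiliary role.

```python
class Counter:
    """Stand-in for a control node whose update is an accumulation."""

    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self.total


class Session:
    def __init__(self, master, auxiliary):
        self.master = master
        self.auxiliary = auxiliary
        self.same_node = False  # the global flag from step 604

    def connect(self):
        # Steps 601-604: learn whether both roles run on the same node.
        # Here identity of the two objects models the master's reply.
        self.same_node = self.master is self.auxiliary

    def send(self, amount):
        self.master.add(amount)
        # Skip the auxiliary operation when it would hit the same node,
        # otherwise the accumulation would be applied twice.
        if not self.same_node:
            self.auxiliary.add(amount)
```

Without the flag check, a node holding both roles would see `add(5)` twice and end up with a total of 10 instead of 5.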

The best way to guarantee data consistency is to make all data update operations at the different sites proceed in lockstep. For external events, lockstep data updates are easy to guarantee, because the update operations at the different sites have the same origin. Internal events, however, are caused by the control module's own control logic (usually by periodic checking tasks), and it is difficult to guarantee that the resulting update operations on different nodes proceed in lockstep, because the control nodes do not share a unified global clock.

In this embodiment, to achieve lockstep data updates for internal events, a synchronization operation should be added between the master node and the auxiliary master node, as shown in Fig. 8. A synchronization operation can be added because, given the same data, if one control node generates a certain internal event, the other control node will necessarily generate the same internal event.

Data inconsistency may also arise while the auxiliary master node is being migrated. Therefore, once the master node (or the auxiliary master node) has decided to migrate the auxiliary master node, whether the operation is carried out immediately or queued, the master node should accept no new data update tasks until the migration is complete.

In addition, the control module should also detect duplicate messages from different nodes. Because ordinary nodes report state information to the control nodes independently of one another, several nodes may report the same fault (for example, a failure of the auxiliary master node) to the control node. Whether the system must check messages for duplication depends on the message processing model it uses. Inside the cluster there is a clear client/server relationship: the reporter of a message is the client, and the processor of the message is the server. In terms of program architecture, the control module is essentially the server program in a client/server structure. To increase message throughput, the control module usually adopts a concurrent server model (multiple processes), in which the handling of individual messages is not coordinated. The system must avoid processing the same message more than once, so it should check the validity of each message (whether it is stale) and discard any message found to be a duplicate.
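One way to discard duplicate fault reports in a concurrent server is sketched below. It uses threads rather than the multiple processes mentioned in the text so that the example stays self-contained, and the `epoch` value used to tell a stale report from a new fault is a hypothetical addition; the text only requires that message validity be checked.

```python
import threading


class FaultReportHandler:
    """Processes each distinct fault once, even with concurrent reporters."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handled = set()
        self.processed = []

    def report(self, reporter, failed_node, epoch):
        """Return True if this report was processed, False if discarded.

        `epoch` is an assumed counter identifying which incarnation of
        the failed role the report refers to.
        """
        key = (failed_node, epoch)
        with self._lock:
            if key in self._handled:
                return False  # duplicate or stale: discard
            self._handled.add(key)
            self.processed.append((reporter, failed_node, epoch))
        return True
```

The check and the insertion happen under one lock, so two workers that receive the same report at the same moment cannot both decide it is new.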

Finally, it should be noted that the above embodiments are intended only to illustrate the present invention, not to limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the invention may be modified or replaced with equivalents without departing from its spirit and scope, and all such modifications and equivalents fall within the scope of the claims of the present invention.

Claims

1. A centralized control method for a large-scale cluster system, characterized in that: in the cluster system, the candidate control nodes that join the cluster first become the master node and the auxiliary master node; the master node collects and maintains the global state information of all nodes in the cluster system; the auxiliary master node backs up the global state information in the master node in real time, and takes over the work of the master node when the master node fails or shuts down;

The candidate control node is provided with a control module for maintaining the consistency of the global state information, and the control module makes a node become the candidate control node through the configuration of the administrator;

The cluster system is also provided with a common node, the common node does not have a control module, and the common node communicates with the master node to drive the master node to maintain the global state information;

The common node also communicates with the auxiliary master node to drive the auxiliary master node to perform redundant backup of the global state information in the master node;

Periodic or event-driven data synchronization transfers are carried out between the master node and the auxiliary master node for redundant backup of the global state information in the master node;

The master node and the auxiliary master node run simultaneously;

The generation of the master node and auxiliary master node specifically includes:

Step 100: the node joins the cluster;

Step 101: Determine whether the master node or the auxiliary master node has joined, if so, go to step 106;

Step 102: Determine whether the node is a candidate control node, if not, exit;

Step 103: Obtain the control right;

Step 104: Determine whether the acquisition is successful; if not, wait for a period of time and rejoin the cluster;

Step 105: Start the cluster control component, that is, start the master node or the auxiliary master node;

Step 106: Carry out the following workflow.

2. The centralized control method of a large cluster system according to claim 1, characterized in that: when only one candidate control node has joined the cluster system, the master node and the auxiliary master node run on the same node; when more than one candidate control node has joined the cluster system, the auxiliary master node is migrated to a candidate control node other than the master node, that is, the control module is started on the new candidate control node, and the corresponding running location information of the auxiliary master node is modified; the specific operations are as follows:

Step 200: the node joins the cluster;

Step 201: Determine whether the master node and the auxiliary master node are on the same node; if not, end;

Step 202: Determine whether the new node is a candidate control node; if not, end;

Step 203: Set the new node as the auxiliary master node;

Step 204: Determine whether the setting is successful; if not, end;

Step 205: The master node cancels its own auxiliary master node setting;

Step 206: end.
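The migration in steps 200 to 206 might look as follows. This is only a sketch under stated assumptions: `ControlState`, `on_node_join` and the `start_control_module` callback are hypothetical names, and starting the control module on the new node is reduced to a callback that returns success or failure.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class ControlState:
    """Hypothetical record of where the two control roles currently run."""
    master: str
    auxiliary: str


def on_node_join(state: ControlState, new_node: str, candidates: Set[str],
                 start_control_module: Callable[[str], bool]) -> bool:
    """Return True if the auxiliary master node was migrated to new_node."""
    # Step 201: migrate only while both roles still share one node.
    if state.master != state.auxiliary:
        return False
    # Step 202: the new node must be a candidate control node.
    if new_node not in candidates:
        return False
    # Steps 203-204: start the control module on the new node and check it.
    if not start_control_module(new_node):
        return False
    # Step 205: the master drops its own auxiliary role; the recorded
    # running location of the auxiliary master node is updated.
    state.auxiliary = new_node
    return True
```

The role is reassigned only after the new node reports success, so a failed start leaves the master node holding both roles.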

3. The centralized control method of a large cluster system according to claim 1, characterized in that: the method further comprises communication between the common node and the control nodes; the common node first detects the liveness state of the master node and the auxiliary master node, and if the master node or the auxiliary master node has failed, performs the corresponding fault handling.

4. The centralized control method of a large cluster system according to claim 3, characterized in that: when the master node fails, the fault discovery node notifies the auxiliary master node, and the auxiliary master node takes over from the failed master node to become the new master node; then, the new master node selects a candidate control node available in the current system and makes it the new auxiliary master node; if there is no candidate control node available in the cluster system, the new master node also retains its role as the auxiliary master node.

5. The centralized control method of a large cluster system according to claim 3, characterized in that: when the auxiliary master node fails, the fault discovery node notifies the master node, and the master node reselects a candidate control node available in the current system as the auxiliary master node; if there is no candidate control node available in the system, the master node makes itself the auxiliary master node; and until the migration of the auxiliary master node is completed, the master node does not accept new data update tasks.

6. The centralized control method of a large cluster system according to claim 3, characterized in that: if the master node and the auxiliary master node fail at the same time, the fault discovery node sets itself as both the master node and the auxiliary master node, and rebuilds the global state information of the cluster.
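The three fault cases of claims 4 to 6 can be summarized in one small Python sketch. All names (`Roles`, `pick_candidate`, `handle_failure`) are hypothetical; the claims do not say how an available candidate is chosen, so picking the first live candidate in sorted order is purely an assumption, and the pause on new data update tasks during migration (claim 5) is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Roles:
    """Hypothetical record of which node holds each control role."""
    master: str
    auxiliary: str


def pick_candidate(candidates: Set[str], alive: Set[str],
                   exclude: str) -> Optional[str]:
    """Return a live candidate other than `exclude`, or None (assumed policy)."""
    for node in sorted(candidates):
        if node in alive and node != exclude:
            return node
    return None


def handle_failure(roles: Roles, alive: Set[str], candidates: Set[str],
                   discoverer: str) -> Roles:
    """Compute the role assignment after the discoverer reports a fault."""
    master_up = roles.master in alive
    aux_up = roles.auxiliary in alive
    if master_up and aux_up:
        return roles  # nothing to handle
    if not master_up and not aux_up:
        # Claim 6: the discoverer takes both roles; the global state
        # information must then be rebuilt (not modelled here).
        return Roles(discoverer, discoverer)
    # Claim 4: the auxiliary takes over; claim 5: the master stays.
    new_master = roles.auxiliary if not master_up else roles.master
    new_aux = pick_candidate(candidates, alive, exclude=new_master)
    # With no candidate available, the master also holds the auxiliary role.
    return Roles(new_master, new_aux if new_aux is not None else new_master)
```

For example, with candidates `{"a", "b", "c"}` and roles `Roles("a", "b")`, a failure of `a` while `b` and `c` are alive yields `Roles("b", "c")` under the assumed selection policy.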

7. The centralized control method of a large cluster system according to claim 1, characterized in that: the global state information at least includes: node state information, service state information, and node resource load information.

8. The centralized control method of a large cluster system according to claim 1, characterized in that: the global state information is modified by the control nodes as driven by reports from the common nodes, or is modified by the master node and the auxiliary master node on their own initiative.

9. The centralized control method of a large cluster system according to claim 8, characterized in that: the common nodes send messages to the master node and to the auxiliary master node separately by means of simulated multicast communication, and the master node and the auxiliary master node each maintain their own local copy of the global state information.

10. The centralized control method of a large cluster system according to claim 9, characterized in that: the communication between common nodes and control nodes at least includes:

Step 300: establish a connection with the master node; establish a connection with the auxiliary master node;

Step 301: Send data to the master node and receive a response from the master node; send data to the auxiliary master node and receive a response from the auxiliary master node;

Step 302: Compare the response of the master node with the response of the auxiliary master node; if the response data sent by the master node and the auxiliary master node are found to be inconsistent, trigger a data synchronization operation between the control nodes; execute step 304;

Step 303: Send data to the master node; send data to the auxiliary master node;

Step 304: Disconnect from the master node, and disconnect from the auxiliary master node.
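The report-and-compare exchange of steps 300 to 302 and 304 can be sketched in Python as below. This is an in-process illustration, not a network implementation: `ControlNode`, `report` and `copy_master_state` are hypothetical names, connections are implicit, the response format (a textual digest of the local state) is an assumption, the further data exchange of step 303 is omitted, and synchronization is assumed to copy the master node's state to the auxiliary master node.

```python
from typing import Callable, Dict


class ControlNode:
    """Hypothetical control node holding a local copy of the global state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Dict[str, str] = {}

    def handle(self, sender: str, data: str) -> str:
        """Record the report and answer with a digest of the local state."""
        self.state[sender] = data
        return repr(sorted(self.state.items()))  # assumed response format


def copy_master_state(master: ControlNode, auxiliary: ControlNode) -> None:
    """Assumed synchronization: the auxiliary adopts the master's state."""
    auxiliary.state = dict(master.state)


def report(sender: str, data: str, master: ControlNode, auxiliary: ControlNode,
           synchronize: Callable[[ControlNode, ControlNode], None]) -> bool:
    """Send one report to both control nodes; return True if they agree."""
    # Step 300: connections are implicit in this in-process sketch.
    # Step 301: simulated multicast, one send per control node.
    from_master = master.handle(sender, data)
    from_auxiliary = auxiliary.handle(sender, data)
    # Step 302: compare the two responses; synchronize on mismatch.
    consistent = from_master == from_auxiliary
    if not consistent:
        synchronize(master, auxiliary)
    # Step 304: disconnecting is a no-op here.
    return consistent
```

Because every report is answered by both control nodes, a divergence between the two copies is detected by the reporting common node itself rather than by a separate audit.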

11. The centralized control method of a large cluster system according to claim 10, characterized in that: Step 300 specifically includes:

Step 3001: the master node sends a response of "whether the master node and the auxiliary master node are the same";

Step 3002: the common node receives the response;

Step 3003: If the master node and the auxiliary master node are not the same node, perform step 3005;

Step 3004: set the global flag;

Step 3005: Establish a connection with the auxiliary master node.

12. The centralized control method of a large cluster system according to claim 11, characterized in that: after step 3004, the method further includes: reading the global flag, and if it is found that the master node is the same as the auxiliary master node, abandoning the operations related to the auxiliary master node.
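Steps 3001 to 3005 together with the flag check of claim 12 can be condensed into the following Python sketch. The function name `establish_connections` and the `connect_auxiliary` callback are hypothetical, and the master node's response is reduced to a boolean for illustration.

```python
from typing import Callable


def establish_connections(master_says_same: bool,
                          connect_auxiliary: Callable[[], None]) -> bool:
    """Return the global flag: True when both roles run on one node."""
    # Steps 3001-3002: the master's response has been received and is
    # passed in as `master_says_same`.
    same_node_flag = False
    if master_says_same:
        # Step 3004: set the global flag.
        same_node_flag = True
    # Claim 12: read the flag and skip auxiliary operations when it is set.
    if not same_node_flag:
        # Steps 3003 and 3005: distinct nodes, so connect to the auxiliary.
        connect_auxiliary()
    return same_node_flag
```

When the flag is set, the common node avoids opening a second connection to what is in fact the same node.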