CN1302411C - Central control method for large machine group system - Google Patents
- Publication number
- CN1302411C (application CNB021594813A, CN02159481A)
- Authority
- CN
- China
- Prior art keywords
- node
- master node
- auxiliary
- host node
- cluster system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Computer And Data Communications (AREA)
- Hardware Redundancy (AREA)
Abstract
The present invention relates to a centralized control method for a large cluster system. In the cluster system, the candidate control nodes that first join the cluster become the master node and the auxiliary master node, respectively. The master node collects and maintains the global state information of all nodes in the cluster system; the auxiliary master node backs up the global information stored in the master node in real time and takes over the work of the master node when the master node fails or is shut down. By means of a redundant design model of the cluster system control module based on a centralized control strategy, the invention reduces the complexity of the cluster control algorithm and solves the single point of failure problem that troubles the centralized control strategy; it also solves the problem of data inconsistency in redundant design and improves the reliability and runtime performance of the system.
Description
Technical field:
The present invention relates to a centralized control method for a large-scale cluster system, and in particular to a redundant control method for a cluster system control module based on a centralized control strategy. It belongs to the field of computer and network technology.
Background art:
The growth of business applications has greatly advanced commercial cluster technology. Compared with scientific computing applications, business applications have several distinguishing features: individual tasks are fine-grained, but the number of service requests is large; the demand for system processing capacity keeps growing; and business applications in some key fields, such as banking and telecommunications, are often mission-critical and require the cluster system to provide high-quality, highly reliable service. To meet these requirements a cluster system needs the following properties: the cluster must provide a solid foundation for high availability and load balancing across the whole system, and it must also scale well.
To provide users with highly available services, key problems such as resource management and monitoring, task scheduling, and arbitration of contention must be solved, so a dedicated control module is needed to perform this work.
A cluster system is a typical distributed computing environment. When designing a control module for it, two different types of control strategy are available: the centralized control strategy and the distributed control strategy.
In the distributed control strategy, all nodes take part in the global control decision process, and every node can access the decision support information, that is, the cluster state information. To avoid a single point of failure, the distributed control algorithm makes the state information fully redundant: each node keeps a copy of the state information so that it can be accessed locally. Each state update produces N messages, where N is the number of nodes currently in the cluster, and each message drives one node to perform one update operation.
Thus a potential advantage of the distributed control strategy is its high reliability, but it is relatively complex to design and implement. Moreover, full redundancy of the state information on every node wastes host resources. Each state update needs at least N messages (N being the number of nodes currently in the cluster). In particular, when a cluster is first being built, the number of events that trigger state updates grows linearly with the node count N, so the peak number of network messages generated by the whole cluster is roughly proportional to N*N, which consumes considerable network resources.
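The message counts above can be made concrete with a small calculation. This is only an illustrative sketch: the function names are hypothetical, and the startup estimate assumes roughly one state-update event per joining node, which simplifies the linear relationship described above.

```python
def messages_per_update(n_nodes: int) -> int:
    """Fully redundant state: one update message per node currently in the cluster."""
    return n_nodes


def startup_peak_messages(n_nodes: int) -> int:
    """Rough peak while the cluster forms, assuming about one update event per node."""
    update_events = n_nodes  # assumption: update events grow linearly with N
    return update_events * messages_per_update(n_nodes)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, messages_per_update(n), startup_peak_messages(n))
```

Under these assumptions the per-update cost grows linearly with the cluster size while the startup peak grows quadratically, which is the network overhead the centralized strategy is meant to avoid.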
The main advantages of the centralized control strategy are that message passing between nodes is simple and communication overhead is low, and that at any time only a minimum of one node needs to take part in global control. However, in a distributed system the centralized control strategy easily creates a single point of failure, which is its greatest weakness.
Traditional distributed systems that adopt a centralized control strategy mostly use the primary-backup redundancy scheme to avoid a single point of failure. The basic idea of the primary-backup method is that at any moment one machine acts as the primary server and does all the work; if the primary server fails, the backup server takes over its tasks. In the primary-backup method the client interacts only with the primary server, and to keep the data on the primary and backup servers consistent, the primary server is usually responsible for triggering the data update operations on the backup server. The primary server and the backup server execute two completely different algorithm flows.
Fig. 1 gives a simple description of a write operation protocol in the primary-backup method. A client request causes the primary server to modify its internal state information, and in a large system this modification may be very complex. The primary server then performs the third step, triggering the update operation on the backup server. If the fourth step uses a coarse-grained update strategy, that is, updates large data blocks by information type, it inevitably wastes network resources; if it uses a fine-grained update strategy, it greatly increases the complexity of the communication protocol and of the update operation. The crux of the problem is that the communication model of the primary-backup mechanism makes the system control module harder to design and implement. In addition, in the primary-backup mechanism the high availability of the primary server is generally maintained by the (primary/backup) servers themselves; clients do not take part in this maintenance, so the system adapts poorly.
Summary of the invention
The main object of the present invention is to address the deficiencies of the prior art by providing a centralized control method for general-purpose large-scale cluster systems. The method is based on a redundant design model of the cluster system control module under a centralized control strategy; it aims to reduce the complexity of the cluster control algorithm and to solve the single point of failure problem that troubles the centralized control strategy.
Another object of the present invention is to address the deficiencies of the prior art by providing a centralized control method for general-purpose large-scale cluster systems that aims to solve the data consistency problem in redundant design and to improve the reliability and runtime performance of the system.
The objects of the present invention are achieved as follows:
A centralized control method for a large-scale cluster system: in the cluster system, the candidate control nodes that join the cluster first become the master node and the auxiliary master node. The master node collects and maintains the global state information of all nodes in the cluster system; the auxiliary master node backs up the global state information held in the master node in real time and takes over the work of the master node when the master node fails or is shut down;
A control module for maintaining the consistency of the global state information is provided in each candidate control node; a node with this control module becomes a candidate control node through administrator configuration;
The cluster system also contains ordinary nodes, which have no control module. An ordinary node communicates with the master node and drives the master node to maintain the global state information;
The ordinary node also communicates with the auxiliary master node and drives the auxiliary master node to keep a redundant backup of the global state information in the master node;
Periodic or event-driven data synchronization transfers are carried out between the master node and the auxiliary master node to keep a redundant backup of the master node's global state information;
The master node and the auxiliary master node run at the same time;
The master node and the auxiliary master node are generated as follows:
Step 100: A node joins the cluster;
Step 101: Determine whether a master node or auxiliary master node has already joined; if so, go to step 106;
Step 102: Determine whether this node is a candidate control node; if not, exit;
Step 103: Acquire control;
Step 104: Determine whether control was acquired successfully; if not, wait for a period of time and then join the cluster again;
Step 105: Start the cluster control module, that is, start the master node or auxiliary master node;
Step 106: Continue with the subsequent workflow.
When only one candidate control node has joined the cluster system, the master node and the auxiliary master node run on the same node. When more than one candidate control node has joined, the auxiliary master node is migrated to a candidate control node other than the master node; that is, the control module is started on the new candidate control node and the recorded location of the auxiliary master node is updated. The specific steps are:
Step 200: A node joins the cluster;
Step 201: Determine whether the master node and the auxiliary master node are on the same node; if they are not on the same node, end;
Step 202: Determine whether the new node is a candidate control node; if not, end;
Step 203: Set the new node as the auxiliary master node;
Step 204: Determine whether the setting succeeded; if not, end;
Step 205: The master node cancels its own auxiliary master node setting;
Step 206: End.
Ordinary nodes and control nodes can communicate. An ordinary node first checks the liveness of the master node and the auxiliary master node; if the master node or the auxiliary master node has failed, the corresponding fault handling is performed.
When the master node fails, the node that discovers the fault notifies the auxiliary master node, which takes over from the failed master node and becomes the new master node. The new master node then selects an available candidate control node in the current system and makes it the new auxiliary master node. If no candidate control node is available in the cluster system, the new master node keeps its own auxiliary master node role.
When the auxiliary master node fails, the node that discovers the fault notifies the master node, and the master node selects an available candidate control node in the current system as the new auxiliary master node. If no candidate control node is available in the system, the master node makes itself the auxiliary master node. Until this auxiliary master node migration is complete, the master node accepts no new data update tasks.
If the master node and the auxiliary master node fail at the same time, the node that discovers the fault sets itself as both master node and auxiliary master node and rebuilds the cluster's global state information.
The global state information includes at least node state information, service state information, and node resource load information. It is modified either when reports from ordinary nodes drive the control nodes to modify it, or autonomously by the master node and the auxiliary master node. An ordinary node sends each message to the master node and to the auxiliary master node separately, simulating multicast communication, and the master node and the auxiliary master node each maintain their own local copy of the global state information.
Communication between an ordinary node and the control nodes includes at least:
Step 300: Establish a connection with the master node; establish a connection with the auxiliary master node;
Step 301: Send data to the master node and receive its reply; send data to the auxiliary master node and receive its reply;
Step 302: Compare the reply of the master node with the reply of the auxiliary master node; if the two replies are inconsistent, trigger a data synchronization operation between the control nodes; go to step 304;
Step 303: Send data to the master node; send data to the auxiliary master node;
Step 304: Disconnect from the master node; disconnect from the auxiliary master node.
Step 300 specifically comprises:
Step 3001: The master node sends a reply indicating whether the master node and the auxiliary master node are the same node;
Step 3002: The ordinary node receives this reply;
Step 3003: If the master node and the auxiliary master node are not the same node, go to step 3005;
Step 3004: Set the global flag;
Step 3005: Establish a connection with the auxiliary master node.
After step 3004 the method further comprises: reading the global flag and, if it shows that the master node and the auxiliary master node are the same node, skipping the operations directed at the auxiliary master node.
Through a redundant design model of the cluster system control module based on a centralized control strategy, the present invention reduces the complexity of the cluster control algorithm and solves the single point of failure problem that troubles the centralized control strategy; it also solves the data consistency problem in redundant design and improves the reliability and runtime performance of the system.
Description of drawings
Fig. 1 is a schematic diagram of a simple primary-backup protocol for a write operation in the prior art;
Fig. 2 is a schematic diagram of the cluster control model of the present invention;
Fig. 3 is a flow chart of generating the master node and the auxiliary master node in the cluster system of the present invention;
Fig. 4 is a flow chart of separating the master node from the auxiliary master node in the cluster system of the present invention;
Fig. 5 is a flow chart of control node fault detection and handling in the present invention;
Fig. 6 is the first external event communication model of the present invention;
Fig. 7 is the second external event communication model of the present invention;
Fig. 8 is the internal event communication model of the present invention;
Fig. 9 is a flow chart of the connection establishment primitive of the present invention.
Embodiment:
The technical solution is described in detail below with reference to specific embodiments and the accompanying drawings.
In the cluster system, some nodes have the control module installed and are configured by the administrator as candidate control nodes; other nodes have no control module installed and are called ordinary nodes. Two of the candidate control nodes become the master node and the auxiliary master node, and both run at the same time. The master node is the actual control node and collects the global state information of all nodes in the cluster. However, if only the master node held this information, the global information would be lost when it failed and the system could no longer run normally. The auxiliary master node therefore backs up the global information in the master node in real time and can take over the work of the master node when the master node fails. In Fig. 1, data transfer path (1) ensures that the master node collects the global state information. In general there are two ways for the auxiliary master node to back up the master node's global information. In the first, every node that sends data to the master node also sends the same data to the auxiliary master node, that is, over data transfer path (2) in Fig. 1. In the second, data synchronization transfers are carried out between the master node and the auxiliary master node, that is, over data transfer path (3) in Fig. 1; this synchronization can be periodic or event-driven.
In the cluster system, ordinary nodes cannot maintain global state information. So that the system has a control node as quickly as possible, the candidate control nodes that join the cluster first become the master node and the auxiliary master node, that is, the control module is started as early as possible. If a non-candidate node starts first and requests to join, it must abandon the join process and wait to join again.
The master node and the auxiliary master node should be the nodes that join the cluster first. Because contention may arise when the control module is started, that is, several nodes may simultaneously consider themselves the master node and auxiliary master node, the operation of starting the control module must be mutually exclusive. The flow for generating the master node and the auxiliary master node is shown in Fig. 3:
Step 100: A node joins the cluster;
Step 101: Check the control node state;
Step 102: Determine whether the master node and the auxiliary master node have already joined; if so, go to step 107; if not, go to step 103;
Step 103: Determine whether this node is a candidate control node; if not, exit; if so, go to step 104;
Step 104: Acquire control;
Step 105: Determine whether control was acquired successfully; if not, wait for a period of time and then join the cluster again; if so, go to step 106;
Step 106: Start the cluster control module, that is, start the master node or auxiliary master node.
Step 107: Continue with the subsequent workflow.
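The join flow of steps 100 to 107 can be sketched as follows. This is a minimal single-process illustration, not the patented implementation: the `Cluster` class, its return strings, and the use of `threading.Lock` as a stand-in for "acquire control" are all assumptions, and a real cluster would need some distributed form of mutual exclusion that the text does not specify.

```python
import threading
from typing import Optional


class Cluster:
    """In-memory stand-in for the cluster's control-node bookkeeping (hypothetical)."""

    def __init__(self) -> None:
        self.master: Optional[str] = None
        self.auxiliary: Optional[str] = None
        self._control = threading.Lock()  # stands in for "acquire control"

    def join(self, node: str, is_candidate: bool) -> str:
        # Steps 101-102: control nodes already exist, so continue normally.
        if self.master is not None and self.auxiliary is not None:
            return "joined"
        # Step 103: a non-candidate node cannot start the control module.
        if not is_candidate:
            return "exit"
        # Steps 104-105: try to acquire control; on failure, rejoin later.
        if not self._control.acquire(blocking=False):
            return "retry-later"
        try:
            # Re-check under the lock in case another candidate won the race.
            if self.master is None:
                # Step 106: start the control module; the first candidate
                # initially holds both roles.
                self.master = self.auxiliary = node
            return "joined"
        finally:
            self._control.release()
```

A caller that receives "exit" or "retry-later" would wait and call `join` again, which mirrors the "wait for a period of time and then join again" branch.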
When nodes first begin to join, the master node and the auxiliary master node run on the same node. As more candidate control nodes join the cluster, the auxiliary master node needs to be migrated to another candidate control node. Migrating the auxiliary master node means starting the control module on the new candidate control node and updating the recorded location of the auxiliary master node, as shown in Fig. 4.
Step 200: A node joins the cluster;
Step 201: Determine whether the master node and the auxiliary master node are on the same node; if so, go to step 202; if not, end;
Step 203: Determine whether the new node is a candidate control node; if so, go to step 204; if not, end;
Step 204: Set the new node as the auxiliary master node;
Step 205: Determine whether the setting succeeded; if so, go to step 206; if not, end;
Step 206: The master node cancels its own auxiliary master node role;
Step 207: End.
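The migration flow of Fig. 4 can be sketched as a small pure function. The names `on_candidate_join`, `roles`, and `start_control_module` are hypothetical; the callback stands in for starting the control module on the new node and is assumed to report success or failure.

```python
from typing import Callable, Dict


def on_candidate_join(
    roles: Dict[str, str],
    new_node: str,
    is_candidate: bool,
    start_control_module: Callable[[str], bool],
) -> Dict[str, str]:
    """Return the role assignment after a node joins (roles is left unmodified)."""
    # Step 201: migrate only while both roles still share one node.
    if roles["master"] != roles["auxiliary"]:
        return roles
    # Step 203: only a candidate control node can host the auxiliary master.
    if not is_candidate:
        return roles
    # Steps 204-205: start the control module on the new node; give up on failure.
    if not start_control_module(new_node):
        return roles
    # Step 206: the master drops its auxiliary role and records the new location.
    return {"master": roles["master"], "auxiliary": new_node}
```

Returning a new dictionary rather than mutating the input keeps the old assignment intact if the migration fails part way, which matches the "if not, end" branches.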
While the system is running, control node faults should be discovered as quickly as possible and a new control node should be produced according to fixed rules to take over from the failed node. This embodiment uses an "event-driven" fault detection mechanism: while the cluster system is running, ordinary nodes and control nodes communicate frequently. Some operations are performed periodically, such as reporting node load information; others are performed sporadically, such as reporting a failed node. Whenever an ordinary node communicates with the control nodes, the first thing it does is check the liveness of the master node and the auxiliary master node. Therefore, if the master node or the auxiliary master node fails, the system discovers the fault quickly and handles it.
When the master node fails, the node that discovers the fault notifies the auxiliary master node, which takes over and becomes the new master node. The new master node then selects an available candidate control node in the current system and makes it the new auxiliary master node. If no candidate control node is available in the system, the new master node keeps its own auxiliary master node role. When the auxiliary master node fails, the node that discovers the fault notifies the master node, and the master node selects another node as the auxiliary master node; if no candidate control node is available in the system, the master node makes itself the auxiliary master node. If the master node and the auxiliary master node fail at the same time, the node that discovers the fault should try to set itself as both master node and auxiliary master node and rebuild the cluster's global state information.
The flow is shown in Fig. 5:
Step 300: Check the liveness of the master node and the auxiliary master node;
Step 301: If a fault has occurred, determine the fault type;
Step 302: If the auxiliary master node has failed, notify the master node and go to step 3030; if the master node has failed, notify the auxiliary master node and go to step 3040; if both the master node and the auxiliary master node have failed, go to step 306;
Step 3030: Select a candidate control node as the auxiliary master node;
Step 3031: Determine whether the selection succeeded; if so, go to step 306; if not, the master node takes over the auxiliary master node role.
Step 3040: The auxiliary master node sets itself as the master node;
Step 3041: Select a candidate control node as the auxiliary master node;
Step 3042: Determine whether the selection succeeded; if so, go to step 3043; if not, go to step 3044;
Step 3043: Cancel its own auxiliary master node role;
Step 3044: Keep the auxiliary master node role.
Step 305: The node that discovered the fault sets itself as master node and auxiliary master node; go to step 203;
Step 306: End.
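The three failure cases handled in Fig. 5 can be sketched as a single decision function. This is a hedged illustration: `handle_control_failure` and the `pick_candidate` callback (which is assumed to return an available candidate control node not in the excluded set, or `None`) are hypothetical names, and rebuilding the global state after a double failure is left out.

```python
from typing import Callable, Optional, Set, Tuple

PickCandidate = Callable[[Set[str]], Optional[str]]


def handle_control_failure(
    master: str,
    auxiliary: str,
    master_alive: bool,
    auxiliary_alive: bool,
    detector: str,
    pick_candidate: PickCandidate,
) -> Tuple[str, str]:
    """Return the (master, auxiliary master) pair after fault handling."""
    if master_alive and auxiliary_alive:
        return master, auxiliary
    if not master_alive and not auxiliary_alive:
        # Both failed: the detecting node takes both roles (state rebuild omitted).
        return detector, detector
    if master_alive:
        # Auxiliary failed: the master picks a replacement, or holds both roles.
        replacement = pick_candidate({master, auxiliary})
        new_auxiliary = replacement if replacement is not None else master
        return master, new_auxiliary
    # Master failed: the auxiliary takes over, then looks for a new auxiliary.
    new_master = auxiliary
    replacement = pick_candidate({master, new_master})
    new_auxiliary = replacement if replacement is not None else new_master
    return new_master, new_auxiliary
```

For example, with candidates c1, c2, and c3, a failure of master c1 promotes c2 and selects c3 as the new auxiliary master node; with no spare candidate, c2 keeps both roles.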
In the present invention, the global state information is an aggregate of many kinds of information, including node state information, service state information, node resource load information, and so on. Global state information is dynamic. Under the centralized control strategy, every node reports its latest local state information to the control node, and the control node aggregates and organizes it and uses it as the basis for arbitration decisions. How to maintain the global state information is therefore one of the most critical problems in designing a cluster control system.
Because a redundant structure with multiple control nodes is used and the global state information is backed up redundantly between control nodes, the control module must maintain the consistency of the redundant data. While the system is running, changes to the global state information come from two sources: reports from ordinary nodes that drive the control nodes to modify the global state information (external events), and autonomous modification of the global state information by the control nodes (internal events). For each type of event the system should adopt a corresponding strategy to keep the data consistent.
For external events, ordinary nodes send each message to the master node and the auxiliary master node separately, simulating multicast communication. As shown in Fig. 6, master node 1 and auxiliary master node 2 each maintain their own local copy of the global state information, which avoids data synchronization operations between them. To make access to the multiple control nodes transparent, as shown in Fig. 7, the data communication operations can be encapsulated as a set of communication primitives that provide atomic operations.
Functional description of the communication primitives (taking the TCP communication protocol as an example):
<Connect>: Establish a connection with the master node; establish a connection with the auxiliary master node.
<Send and receive data>: Send data to the master node and receive its reply; send data to the auxiliary master node and receive its reply; compare the master node's reply with the auxiliary master node's reply. This comparison effectively detects data inconsistency: if the reply data sent by master node 1 and auxiliary master node 2 are inconsistent, a data synchronization operation between the control nodes should be triggered.
<Send data>: Send data to the master node; send data to the auxiliary master node.
<Disconnect>: Disconnect from the master node; disconnect from the auxiliary master node.
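The send-and-receive primitive can be sketched independently of the transport. In this illustration the two `Send` callables are hypothetical stand-ins for requests over already established TCP connections, and `trigger_sync` is an assumed hook for the data synchronization between control nodes; none of these names come from the source.

```python
from typing import Callable

# Hypothetical transport hook: sends a request and returns the reply bytes.
Send = Callable[[bytes], bytes]


def send_and_receive(
    to_master: Send,
    to_auxiliary: Send,
    payload: bytes,
    trigger_sync: Callable[[], None],
) -> bytes:
    """Send the same payload to both control nodes and compare their replies."""
    master_reply = to_master(payload)
    auxiliary_reply = to_auxiliary(payload)
    # Differing replies indicate that the two copies of the state have diverged.
    if master_reply != auxiliary_reply:
        trigger_sync()
    # The master's reply is treated as authoritative (an assumption of this sketch).
    return master_reply
```

Because the comparison lives inside the primitive, an ordinary node that uses it detects inconsistency as a side effect of normal reporting, without a separate checking protocol.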
An ordinary node communicates with the master node and with the auxiliary master node separately. If the same control node receives two identical (duplicate) data update requests, as happens when the master node and the auxiliary master node run on the same node, it performs the same update twice. If the operation is cumulative, for example an increment of some global variable, the data will obviously be wrong; if the operation acquires a mutually exclusive shared resource, it may also deadlock the control node and produce data errors. The same control node must therefore never receive duplicate data update requests.
For this reason, the communication primitives above need to be modified:
The connection establishment process is shown in Fig. 9:
Step 600: Begin;
Step 601: The master node sends a reply indicating whether the master node and the auxiliary master node are the same node;
Step 602: Receive the reply;
Step 603: Determine whether the master node and the auxiliary master node are the same node; if so, go to step 604; if not, go to step 605;
Step 604: Set the global flag;
Step 605: Establish a connection with the auxiliary master node;
Step 606: End.
Other primitives: read the global flag and, if it shows that the master node and the auxiliary master node are the same node, skip the corresponding operation on the auxiliary master node.
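The effect of the flag can be illustrated with an in-memory sketch. `ControlNode` and `Channel` are hypothetical classes; object identity stands in for the master's "same node" reply, and a cumulative counter stands in for the kind of update that a duplicate request would corrupt.

```python
class ControlNode:
    """In-memory stand-in for a control node holding one cumulative value."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.counter = 0

    def apply(self, delta: int) -> int:
        self.counter += delta
        return self.counter


class Channel:
    """Connection pair used by an ordinary node (hypothetical)."""

    def __init__(self, master: ControlNode, auxiliary: ControlNode) -> None:
        self.master = master
        self.auxiliary = auxiliary
        # Connect primitive: record whether both roles live on one node.
        self.same_node = master is auxiliary

    def send(self, delta: int) -> int:
        reply = self.master.apply(delta)
        # Other primitives read the flag and skip the auxiliary when it is
        # the same node, so the update is never applied twice.
        if not self.same_node:
            self.auxiliary.apply(delta)
        return reply
```

Without the flag, a node holding both roles would see the increment twice and end with twice the intended value.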
The best way to guarantee data consistency is to make the data update operations at all sites proceed in lockstep. For external events, lockstep updates are easy to guarantee, because the update operations at the different sites have the same origin. Internal events, however, are triggered by the control logic of the control module itself (usually by periodic checking tasks), and it is hard to guarantee that the update operations on the different nodes proceed in lockstep, because the control nodes share no unified global clock.
In this embodiment, to achieve lockstep data updates for internal events, a synchronization operation is added between the master node and the auxiliary master node, as shown in Fig. 8. Such a synchronization operation is possible because, given the same data, if one control node produces a certain internal event, the other control node will necessarily produce the same internal event.
Migrating the auxiliary master node can also cause data inconsistency. Therefore, once the master node (or auxiliary master node) has decided to migrate the auxiliary master node, whether the migration is carried out immediately or queued, the master node should accept no new data update tasks until the migration is complete.
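The rule that no new update tasks are accepted during a migration can be sketched as a simple gate. The class and method names are hypothetical, and what a rejected caller should do (retry, queue, or report an error) is not specified in the text, so the sketch only reports the rejection.

```python
from typing import Any, Dict


class MasterNode:
    """Minimal model of a master that pauses updates during migration (hypothetical)."""

    def __init__(self) -> None:
        self.global_state: Dict[str, Any] = {}
        self.migrating_auxiliary = False

    def begin_auxiliary_migration(self) -> None:
        # Set as soon as the migration is decided, even if it is only queued.
        self.migrating_auxiliary = True

    def finish_auxiliary_migration(self) -> None:
        self.migrating_auxiliary = False

    def update(self, key: str, value: Any) -> bool:
        """Apply an update unless a migration is pending; return whether it was applied."""
        if self.migrating_auxiliary:
            return False
        self.global_state[key] = value
        return True
```

Freezing updates means the state copied to the new auxiliary master node cannot fall behind the master while the control module is being started there.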
In addition, the control module should also check for duplicate messages from different nodes. Because ordinary nodes report state information to the control node independently of one another, several nodes may report the same fault (for example, a failure of the auxiliary master node) to the control node. Whether the system must check for duplicate messages depends on the message processing model it uses. Inside the cluster there is a clear client/server relationship: the node reporting a message is the client and the node processing it is the server. From the point of view of program architecture, the control module is essentially the server program in a client/server structure. To increase message processing capacity, the control module usually uses a concurrent server model (multiple processes), in which the handling of different messages is not coordinated. The system must avoid processing the same message more than once, so it should check whether each message is still valid (that is, whether it is out of date) and discard any duplicate it receives.
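One way to discard duplicate reports is to key each report by the event it describes and accept only the first one. This is a sketch under stated assumptions: the key format (fault type, node, and a hypothetical epoch number) is invented for illustration, and the thread lock is a single-process stand-in, since a multi-process server would need state shared between its workers.

```python
import threading
from typing import Hashable, Set


class DuplicateFilter:
    """Accepts the first report of an event and rejects later copies (hypothetical)."""

    def __init__(self) -> None:
        self._seen: Set[Hashable] = set()
        self._lock = threading.Lock()  # serializes concurrent handlers

    def accept(self, message_key: Hashable) -> bool:
        """Return True the first time a key is seen, False for every repeat."""
        with self._lock:
            if message_key in self._seen:
                return False
            self._seen.add(message_key)
            return True
```

A handler would call `accept(("auxiliary_failed", "c2", epoch))` before acting, so that reports of the same fault from several ordinary nodes trigger fault handling only once.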
Finally, it should be noted that the above embodiments are intended only to illustrate the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the present invention may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications and substitutions shall fall within the scope of the claims of the present invention.
Claims (12)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB021594813A CN1302411C (en) | 2002-12-31 | 2002-12-31 | Central control method for large machine group system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB021594813A CN1302411C (en) | 2002-12-31 | 2002-12-31 | Central control method for large machine group system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1512376A (en) | 2004-07-14 |
| CN1302411C (en) | 2007-02-28 |
Family
ID=34237493
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB021594813A Expired - Fee Related CN1302411C (en) | 2002-12-31 | 2002-12-31 | Central control method for large machine group system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN1302411C (en) |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7734753B2 (en) * | 2004-10-12 | 2010-06-08 | International Business Machines Corporation | Apparatus, system, and method for facilitating management of logical nodes through a single management module |
| CN100490446C (en) * | 2005-03-22 | 2009-05-20 | 中国科学院计算技术研究所 | Synchronous receiving system of machine group |
| CN100452798C (en) * | 2005-03-24 | 2009-01-14 | 中国科学院计算技术研究所 | High-reliability system of machine group and design method thereof |
| US7827137B2 (en) * | 2007-04-19 | 2010-11-02 | Emc Corporation | Seeding replication |
| CN102904761B (en) * | 2012-10-24 | 2016-08-17 | 浙江宇视科技有限公司 | The method of a kind of NVR stacking and NVR |
| CN103095845B (en) * | 2013-01-31 | 2016-08-03 | 汉柏科技有限公司 | A kind of method and system realizing distributed communication |
| CN104518995B (en) * | 2013-09-26 | 2018-09-21 | 中国电信股份有限公司 | Interchanger virtualization system based on distributed structure/architecture |
| CN103986771A (en) * | 2014-05-22 | 2014-08-13 | 浪潮电子信息产业股份有限公司 | High-availability cluster management method independent of shared storage |
| US9424149B2 (en) * | 2014-07-01 | 2016-08-23 | Sas Institute Inc. | Systems and methods for fault tolerant communications |
| CN105187548A (en) * | 2015-09-25 | 2015-12-23 | 浪潮(北京)电子信息产业有限公司 | Cluster monitoring information collection method and system |
| CN107257298A (en) * | 2017-07-27 | 2017-10-17 | 郑州云海信息技术有限公司 | A kind of fault handling method and device |
| CN113156803A (en) * | 2021-02-03 | 2021-07-23 | 南京华鹞信息科技有限公司 | Task-oriented unmanned aerial vehicle cluster resource management and fault-tolerant control method |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1308278A (en) * | 2001-02-15 | 2001-08-15 | 华中科技大学 | IP fault-tolerant method for colony server |
| WO2001084338A2 (en) * | 2000-05-02 | 2001-11-08 | Sun Microsystems, Inc. | Cluster configuration repository |
| CN1336589A (en) * | 2000-07-28 | 2002-02-20 | 国际商业机器公司 | Method and system for failure recovery for data management and application program |
| WO2002021276A1 (en) * | 2000-09-08 | 2002-03-14 | Goahead Software Inc. | A system and method for managing clusters containing multiple nodes |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2001084338A2 (en) * | 2000-05-02 | 2001-11-08 | Sun Microsystems, Inc. | Cluster configuration repository |
| CN1336589A (en) * | 2000-07-28 | 2002-02-20 | 国际商业机器公司 | Method and system for failure recovery for data management and application program |
| WO2002021276A1 (en) * | 2000-09-08 | 2002-03-14 | Goahead Software Inc. | A system and method for managing clusters containing multiple nodes |
| CN1308278A (en) * | 2001-02-15 | 2001-08-15 | 华中科技大学 | IP fault-tolerant method for colony server |
Also Published As
| Publication number | Publication date |
|---|---|
| CN1512376A (en) | 2004-07-14 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN1302411C (en) | Central control method for large machine group system | |
| CN1224905C (en) | Resource action in clustered computer system incorporating prepare operation | |
| CN1190733C (en) | Method and system for failure recovery for data management and application program | |
| CN104199666B (en) | A kind of application program Dynamic Configuration and device | |
| CN101038591A (en) | Method and system for synchronizing data base | |
| CN101887367B (en) | Multi-level parallel programming method | |
| CN101030154A (en) | System and method for relocating running applications to topologically remotely located computing systems | |
| CN1312922A (en) | Fault tolerant computer system | |
| CN1258728C (en) | Application method and network system of fully distributed protected information processing system real-time database | |
| CN1946058A (en) | Soft exchange device allopatric disaster recovery solution system and its method for software exchange network | |
| CN1910555A (en) | Geographically distributed clusters | |
| CN1595360A (en) | System and method for executing jobs in a distributed computing architecture | |
| CN101034364A (en) | Method, device and system for implementing RAM date backup | |
| CN1198227A (en) | Device and method of controlling intergroup resource utilization | |
| CN1251103C (en) | Method for improving serviceability of business machine group | |
| CN1177152A (en) | Method for coordinating reproduction of series files of one access file | |
| CN111984274B (en) | Method and device for automatically deploying ETCD cluster by one key | |
| CN112559143A (en) | Task scheduling method and system and computing device | |
| CN106850598B (en) | Uniform resource management system and method for whole-ship computing environment | |
| CN1866283A (en) | System and method for implementing regular system triggering | |
| CN1630160A (en) | Method for establishing electric network monitoring disaster back-up system | |
| CN105446805B (en) | Shell script subprocess management method and system | |
| CN1482773A (en) | Implementation of Fault Tolerant Transmission Control Protocol | |
| CN1512329A (en) | Control method for machine group adaptation | |
| CN112667349B (en) | Kubernetes-based distributed election method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20070228 Termination date: 20201231 |