CN105634779B - Operation processing method and device of main and standby equipment - Google Patents
- Publication number
- CN105634779B (application CN201410614104.1A)
- Authority
- CN
- China
- Prior art keywords
- equipment
- host apparatus
- stand
- link
- connectivity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0663—Performing the actions predefined by failover planning, e.g. switching to standby network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0811—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Hardware Redundancy (AREA)
Abstract
The present invention provides an operation processing method and apparatus for active and standby devices. The method includes: after determining that an active device and a first standby device have lost contact, the active device detects its link connectivity with other devices while the first standby device detects its own link connectivity with other devices, where the other devices are the devices in the cluster system containing the active device and the first standby device, other than the active device and the standby device; the active device handles the operation of the active device and/or the first standby device according to its link connectivity detection result, and the first standby device handles the operation of the active device and/or the first standby device according to its link connectivity detection result. This technical solution solves the problem in the related art that system stability is reduced because node failures are not distinguished from link failures, and thereby enhances system stability.
Description
Technical Field
The present invention relates to the field of communications, and in particular to an operation processing method and apparatus for active and standby devices.
Background
In a cluster system, workloads are usually deployed in an active/standby arrangement. The active device performs the relevant functions, while the standby device exists as its backup. When the active device goes down, the standby device is promoted to active and takes over the work of the original active device so that service is not interrupted; when the standby device goes down, the active device selects a new standby device. In this way the active and standby devices together enhance the stability of the system.
In the prior art, the active and standby devices of a cluster system generally keep each other alive with heartbeat messages. When keep-alive between the active and standby nodes fails, it cannot be determined whether a node or a link has failed, so the two devices may evolve along the wrong path: when no heartbeat message is received from the peer for longer than a time threshold, each side assumes the peer is abnormal and starts promoting the standby device to active or selecting a new standby device. However, if the keep-alive link between the active and standby devices is briefly interrupted and then recovers, the system may evolve into "dual-active" (that is, the standby device has switched to active while the original active device is still running). In related technologies currently on the market, a cluster that has entered dual-active is generally recovered by restarting the whole system, which reduces system stability and degrades the user experience.
No effective solution has yet been proposed for the problem in the related art that, after the active and standby devices lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to active, two active devices run in the system, system stability is reduced, and the user experience is poor.
Summary of the Invention
To solve the above technical problem, the present invention provides an operation processing method and apparatus for active and standby devices.
According to one aspect of the present invention, an operation processing method for active and standby devices is provided, including: after determining that an active device and a first standby device have lost contact, the active device detects its link connectivity with other devices while the first standby device detects its own link connectivity with other devices, where the other devices are the devices in the cluster system containing the active device and the first standby device, other than the active device and the first standby device; the active device handles the operation of the active device and/or the first standby device according to the link connectivity detection result, and the first standby device handles the operation of the active device and/or the first standby device according to the link connectivity detection result.
Preferably, the active device handling the operation of the active device and/or the first standby device according to the link connectivity detection result includes: when the active device detects that the link is connected, replacing the first standby device with a second standby device; when the active device detects that the link is not connected, prohibiting the active device from running for a second predetermined period.
Preferably, the first standby device handling the operation of the active device and the first standby device according to the link connectivity detection result includes: when the first standby device detects that the link is connected, determining whether the active device is running; when the active device is not running, making the first standby device the active device.
Preferably, when the active device is running, the first standby device is prohibited from running for a third predetermined period.
Preferably, whether the active device is running is determined in at least one of the following ways: notification by a third party other than the active device and the first standby device; detection by the first standby device of designated information on the forwarding-plane message transmission channel.
Preferably, the first standby device handling the operation of the active device and the first standby device according to the link connectivity detection result includes: when the first standby device detects that the link is not connected, prohibiting the first standby device from running for a first predetermined period.
Preferably, determining that the active device and the first standby device have lost contact includes: when the active device and/or the first standby device does not receive a keep-alive message, determining that the active device and the first standby device have lost contact.
According to another aspect of the present invention, an operation processing system for active and standby devices is also provided, including: an active device, configured to detect, after it is determined that the active device and a first standby device have lost contact, the link connectivity between the active device and other devices, and to handle the operation of the active device and/or the first standby device according to the link connectivity detection result, where the other devices are the devices in the cluster system containing the active device and the first standby device, other than the active device and the first standby device; and the first standby device, configured to detect the link connectivity between the first standby device and the other devices while the active device detects its link connectivity with the other devices, and to handle the operation of the active device and/or the first standby device according to the link connectivity detection result.
Preferably, the active device is further configured to replace the first standby device with a second standby device when the active device detects that the link is connected, and to prohibit the active device from running for a second predetermined period when the active device detects that the link is not connected.
Preferably, the first standby device is further configured to determine whether the active device is running when the first standby device detects that the link is connected, and to make the first standby device the active device when the active device is not running.
With the present invention, after the active and standby devices lose contact, each of them simultaneously detects its own link connectivity with the other devices in the cluster system and then handles the active device and/or the standby device according to the detection result. This solves the problem in the related art that, after the active and standby devices lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to active, two active devices run in the system, system stability is reduced, and the user experience is poor. System stability is thereby enhanced and the user experience improved.
Brief Description of the Drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Figure 1 is a flowchart of the operation processing method for active and standby devices according to an embodiment of the present invention;
Figure 2 is a structural block diagram of a cluster system according to a preferred embodiment of the present invention;
Figure 3 is a schematic diagram of the handling performed after the active and standby devices detect link connectivity, according to a preferred embodiment of the present invention;
Figure 4 is a structural block diagram of the operation processing system for active and standby devices according to an embodiment of the present invention;
Figure 5 is another structural block diagram of the operation processing system for active and standby devices according to an embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in those embodiments may be combined with one another.
In the related art, when keep-alive between the active and standby devices fails, it cannot be determined whether a device or a link has failed, so the active and standby devices may evolve along the wrong path: the standby device switches directly to active and two active devices exist in the system. The following technical solution is provided to address this problem.
To solve the above technical problem, this embodiment provides an operation processing method for active and standby devices. Figure 1 is a flowchart of the method according to an embodiment of the present invention. As shown in Figure 1, the flow includes the following steps:
Step S102: after determining that the active device and the first standby device have lost contact, the active device detects its link connectivity with other devices while the first standby device detects its own link connectivity with other devices, where the other devices are the devices in the cluster system containing the active device and the first standby device, other than the active device and the standby device;
Step S104: the active device handles the operation of the active device and/or the first standby device according to the link connectivity detection result, and the first standby device handles the operation of the active device and/or the first standby device according to the link connectivity detection result.
Through the above steps, after the active and standby devices lose contact, each of them simultaneously detects its own link connectivity with the other devices in the cluster system and then handles the active device and/or the standby device according to the detection result. This solves the problem that, after the active and standby devices lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to active, two active devices run in the system, system stability is reduced, and the user experience is poor. System stability is thereby enhanced and the user experience improved.
In the embodiments of the present invention, the technical solution of step S104 can be realized in the following ways:
(1) The active device handling the operation of the active device and/or the first standby device according to the link connectivity detection result includes: when the active device detects that the link is connected, replacing the first standby device with a second standby device; when the active device detects that the link is not connected, prohibiting the active device from running for a second predetermined period.
(2) The first standby device handling the operation of the active device and the first standby device according to the link connectivity detection result includes: when the first standby device detects that the link is connected, determining whether the active device is running; when the active device is not running, making the first standby device the active device; when the active device is running, prohibiting the first standby device from running for a third predetermined period.
(3) The first standby device handling the operation of the active device and the first standby device according to the link connectivity detection result includes: when the first standby device detects that the link is not connected, prohibiting the first standby device from running for a first predetermined period.
It should be noted that the processes in (1)-(3) above, in which the active device handles the operation of the active device and/or the standby device according to link connectivity and the standby device handles the operation of the active device and/or the standby device according to link connectivity, can be applied together. The decision process of the active device and that of the standby device do not contradict each other, and the two can coexist.
In practice, after the active and standby devices lose contact, both sides detect link connectivity at the same time. When the active device detects that the link is connected, it replaces the first standby device with a second standby device: the link has not failed, yet the active and standby devices are still out of contact, which indicates that the first standby device is faulty and must be replaced by a second standby device. When the active device detects that the link is not connected, the active device is prohibited from running for the second predetermined period.
The standby device can determine whether the active device is running in several ways. In an optional example of the embodiments of the present invention, this is determined in at least one of the following ways: notification by a third party other than the active device and the first standby device; detection by the first standby device of designated information on the forwarding-plane message transmission channel.
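The "at least one of" rule above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the name `is_active_running` and the two stand-in probes are hypothetical, and treating the active device as running whenever any source reports it is an assumption chosen to err against dual-active.

```python
from typing import Callable, Iterable

# Each probe is a hypothetical callable that returns True when it has
# evidence that the active device is still running.
Probe = Callable[[], bool]


def is_active_running(probes: Iterable[Probe]) -> bool:
    """Return True if at least one probe reports the active device as running.

    Assumption: a probe that raises is treated as giving no evidence.
    """
    for probe in probes:
        try:
            if probe():
                return True
        except Exception:
            continue  # an unusable probe gives no evidence either way
    return False


# Stand-ins for the two sources named in the text (assumed, for illustration).
def third_party_reports_running() -> bool:
    return False  # e.g. another cluster member has not seen the active device


def forwarding_plane_marker_seen() -> bool:
    return True  # e.g. the designated information was seen on the channel


print(is_active_running([third_party_reports_running,
                         forwarding_plane_marker_seen]))
```

With these stand-ins the standby device would conclude that the active device is still running and would therefore not promote itself.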
Optionally, in step S102, loss of contact between the active device and the first standby device can be determined as follows: when the active device and/or the first standby device does not receive a keep-alive message, it is determined that the active device and the first standby device have lost contact.
In summary, the active device and the standby device each calculate the connectivity of their own links based on their connection state with the other devices in the system. The connectivity value is either TRUE (T), indicating that the link between the active device and the standby device is connected, or FALSE (F), indicating that the link between the active device and the standby device is not connected.
In addition, keep-alive between the active and standby devices is bidirectional, with detection in both directions, so that both ends of the link perceive a change in the state of the keep-alive link at the same time. If keep-alive fails in either direction, the active and standby devices are judged to have lost contact.
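The either-direction rule can be sketched as a small timer-based monitor. This is a hedged sketch under stated assumptions: the class `KeepAliveMonitor`, the threshold `timeout_s`, and the use of an acknowledgement to observe the outgoing direction are all hypothetical, since the text does not say how each side learns that its own heartbeats arrived.

```python
from dataclasses import dataclass


@dataclass
class KeepAliveMonitor:
    """Tracks when each direction of the keep-alive last succeeded.

    timeout_s is an assumed threshold; the text does not give a value.
    """
    timeout_s: float
    last_rx_from_peer: float = 0.0   # last time a peer heartbeat arrived
    last_ack_from_peer: float = 0.0  # last time the peer confirmed ours

    def on_heartbeat(self, now: float) -> None:
        self.last_rx_from_peer = now

    def on_ack(self, now: float) -> None:
        self.last_ack_from_peer = now

    def lost_contact(self, now: float) -> bool:
        # Failure of either direction counts as loss of contact.
        rx_failed = now - self.last_rx_from_peer > self.timeout_s
        tx_failed = now - self.last_ack_from_peer > self.timeout_s
        return rx_failed or tx_failed


monitor = KeepAliveMonitor(timeout_s=3.0)
monitor.on_heartbeat(2.0)
monitor.on_ack(2.0)
print(monitor.lost_contact(4.0))  # both directions seen within 3 s

monitor.on_heartbeat(6.0)         # incoming direction still works
print(monitor.lost_contact(6.0))  # outgoing direction silent for 4 s
```

Timestamps are passed in explicitly so the logic can be exercised without real clocks; a deployment would presumably feed it a monotonic clock.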
In the technical solution provided by the embodiments of the present invention, after the active and standby devices lose contact, each calculates the connectivity of its own links. The results can fall into the following four cases, as shown in Table 1:
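Table 1
Case  Active device connectivity  Standby device connectivity
TT    TRUE (T)                    TRUE (T)
FT    FALSE (F)                   TRUE (T)
TF    TRUE (T)                    FALSE (F)
FF    FALSE (F)                   FALSE (F)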
The four cases are briefly described below:
a) TT: the connectivity detection of both the active and standby devices is TRUE (T).
The active device selects a new standby device. The standby device probes through a third-party mechanism whether the active device is present: if the active device is running, the standby device is suspended for a preset period; if the active device is not running, the standby device becomes the active device.
b) FT: the connectivity detection of the active device is FALSE (F) and that of the standby device is TRUE (T).
The active device is suspended for a preset period. The standby device probes through a third-party mechanism whether the active device is running: if the active device is running, the standby device is suspended; if the active device is not running, the standby device becomes the active device.
c) TF: the connectivity detection of the active device is TRUE (T) and that of the standby device is FALSE (F).
The active device selects a new standby device; the standby device is suspended.
d) FF: the connectivity detection of both the active and standby devices is FALSE (F).
Both the active device and the standby device are suspended.
The four cases can be summarized as follows. Whichever of the active and standby devices detects connectivity as FALSE (F) suspends operation for a predetermined period. Whichever detects connectivity as TRUE (T) acts according to its role: the active device selects a new standby device, while the standby device probes through a third-party mechanism whether the active device is running, is suspended if the active device is running, and becomes the active device if it is not.
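The summary rule can be written as two small decision functions, one per role. This is an illustrative sketch only; the names `Action`, `active_action`, and `standby_action` are hypothetical, and the result of the third-party probe is passed in as a plain boolean rather than obtained from any real mechanism.

```python
from enum import Enum


class Action(Enum):
    SELECT_NEW_STANDBY = "select a new standby device"
    SUSPEND = "suspend for a predetermined period"
    BECOME_ACTIVE = "become the active device"


def active_action(connectivity: bool) -> Action:
    """Decision taken by the active device from its own connectivity."""
    return Action.SELECT_NEW_STANDBY if connectivity else Action.SUSPEND


def standby_action(connectivity: bool, active_running: bool) -> Action:
    """Decision taken by the standby device.

    active_running is the outcome of the third-party probe; it is only
    consulted when the standby device's own connectivity is TRUE.
    """
    if not connectivity:
        return Action.SUSPEND
    return Action.SUSPEND if active_running else Action.BECOME_ACTIVE


# Walk the four cases (TT, TF, FT, FF), assuming the probe finds the
# active device not running.
for active_conn in (True, False):
    for standby_conn in (True, False):
        case = ("T" if active_conn else "F") + ("T" if standby_conn else "F")
        print(case,
              "| active:", active_action(active_conn).value,
              "| standby:", standby_action(standby_conn, False).value)
```

Because each function reads only its own device's connectivity, the two sides can decide independently, which matches the statement that the two decision processes coexist.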
For a better understanding of the technical solution applied after the active and standby devices lose contact, it is described below with reference to a preferred embodiment, which does not limit the embodiments of the present invention:
First, the cluster system of the preferred embodiment is briefly described. As shown in Figure 2, the cluster system is divided into several devices; for ease of description, Figure 2 shows only three. The active and standby devices keep each other alive bidirectionally, with detection in both directions. The active and standby devices each perform connectivity detection with the other devices in the system.
This connectivity detection uses a message-based detection mechanism, which may be, but is not limited to, communication link detection (for example a TCP link or a TIPC link) or asynchronous message keep-alive. Because these are common technical means in the related art, the embodiments of the present invention do not describe them further.
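As one possible illustration of TCP-based link detection, the sketch below probes each of the other devices and reduces the results to a single connectivity value. The names `tcp_reachable` and `link_connectivity` are hypothetical, and the rule "TRUE if at least one other device is reachable" is an assumption: the text does not state how the individual results are combined.

```python
import socket
from typing import Callable, Iterable, Tuple

Peer = Tuple[str, int]  # (host, port) of another device in the cluster


def tcp_reachable(peer: Peer, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to the peer can be opened."""
    try:
        with socket.create_connection(peer, timeout=timeout_s):
            return True
    except OSError:
        return False


def link_connectivity(peers: Iterable[Peer],
                      probe: Callable[[Peer], bool] = tcp_reachable) -> bool:
    """Assumed rule: connectivity is TRUE if any other device is reachable."""
    return any(probe(peer) for peer in peers)


# Usage with a fake probe, so the example runs without a real cluster.
other_devices = [("device-b", 7000), ("device-c", 7000)]
print(link_connectivity(other_devices, probe=lambda p: p[0] == "device-c"))
```

The probe is injectable so that a TIPC or asynchronous-message check could be substituted for the TCP one without changing the aggregation.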
图3为根据本发明优选实施例的主备设备检测链路连通情况后的处理示意图,如图3所示,图3所示意的技术方案可以总结为:主用设备和备用设备间互发心跳报文,主用设备和备用设备各自接收和检查收到的报文。通过保活报文检测到主备失联后,连通性检测为FASLE(F)者重启自己;连通性检测为TRUE(T)者,如果是主用设备则选择新的备用设备,如果是备用设备则通过第3方机制探测主用设备是否正在运行,探测结果主用设备正在运行则备用设备暂停使用,探测结果未主用设备未正在运行则备用设备转为主用设备。Fig. 3 is a schematic diagram of the processing after the active and standby equipment detects the connection of the link according to a preferred embodiment of the present invention. message, the active device and the standby device receive and check the received message respectively. After detecting the disconnection of the primary and backup devices through the keep-alive message, the device whose connectivity detection is FASLE(F) restarts itself; the device whose connectivity detection is TRUE(T), if it is the primary device, selects a new backup device, and if it is the backup device The device detects whether the active device is running through a third-party mechanism. If the detection result shows that the active device is running, the backup device will be suspended. If the detection result is not that the active device is not running, the backup device will become the active device.
需要说明的是,图3中的“在位”可以理解为是否在运行,“自杀”可以理解为在预定时间段内不使用。It should be noted that "in-position" in FIG. 3 can be understood as whether it is running, and "suicide" can be understood as not being used within a predetermined period of time.
在本发明实施例中,还提供了一种主备设备的运行处理系统,图4为根据本发明实施例的主备设备的运行处理系统的结构框图,如图4所示,包括:In an embodiment of the present invention, an operation processing system of an active/standby device is also provided. FIG. 4 is a structural block diagram of an operation processing system of an active/standby device according to an embodiment of the present invention. As shown in FIG. 4 , it includes:
a primary device 40, configured to, after determining that the primary device 40 and a first standby device 42 have lost contact, detect the link connectivity between the primary device 40 and other devices 44, and to process the operation of the primary device 40 and/or the first standby device 42 according to the detection result of the link connectivity, wherein the other devices 44 are the devices, other than the primary device 40 and the first standby device 42, in the cluster system in which the primary device 40 and the first standby device 42 are located;
the first standby device 42, configured to detect the link connectivity between the first standby device 42 and the other devices 44 while the primary device 40 detects the link connectivity between the primary device 40 and the other devices 44, and to process the operation of the primary device 40 and/or the first standby device 42 according to the detection result of the link connectivity.
Through the combined action of the devices in the above system, after the primary device and the standby device lose contact, both devices simultaneously detect their own link connectivity with the other devices in the cluster system, and the primary device and/or the standby device are then processed according to the detection results. This technical solution solves the following problem: after the primary and standby devices lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to the primary role, two primary devices run in the system at the same time, system stability is reduced, and the user experience suffers. The solution thereby enhances system stability and improves the user experience.
Optionally, as shown in Fig. 5, the primary device 40 is further configured to replace the first standby device 42 with a second standby device 46 when the primary device 40 detects that the link is connected, and to prohibit the primary device from running within a second predetermined time period when the primary device detects that the link is not connected. The first standby device 42 is further configured to determine, when the first standby device 42 detects that the link is connected, whether the primary device 40 is running, and to take over as the primary device when the primary device 40 is not running.
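The primary-side behavior described for Fig. 5 can be sketched as follows. This is an illustration under assumptions: the class `PrimaryDevice`, the list of spare standby devices, the suspension length, and the behavior when no spare exists are all hypothetical and not specified by the embodiment.

```python
import time
from typing import Callable, List, Optional


class PrimaryDevice:
    """Primary-side handling after contact with the standby device is lost."""

    def __init__(self, standby: str, spares: List[str], suspend_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.standby = standby
        self._spares = list(spares)
        self._suspend_s = suspend_s  # stands in for the predetermined period
        self._clock = clock
        self._suspended_until: Optional[float] = None

    def on_standby_lost(self, link_connected: bool) -> None:
        if link_connected:
            # Link to the other devices is up: swap in another standby.
            if self._spares:
                self.standby = self._spares.pop(0)
            # No spare available: pairing is left unchanged (assumption).
        else:
            # Link is down: forbid running for the predetermined period.
            self._suspended_until = self._clock() + self._suspend_s

    def may_run(self) -> bool:
        """True unless the device is still inside its suspension window."""
        return self._suspended_until is None or self._clock() >= self._suspended_until
```

Injecting the clock keeps the suspension window testable without real waiting; a deployment would use the default monotonic clock.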
In summary, the embodiments of the present invention achieve the following technical effects: they solve the "dual primary" problem caused in the related art by the standby device switching over directly; they correctly detect the actual state of the keep-alive link; and they define a unified evolution path for the system so that devices do not evolve independently. This prevents the dual-primary phenomenon described above and improves system stability.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that objects so designated are interchangeable where appropriate, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "include" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion: for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one given here. Alternatively, they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410614104.1A CN105634779B (en) | 2014-11-04 | 2014-11-04 | Operation processing method and device of main and standby equipment |
PCT/CN2015/073275 WO2016070530A1 (en) | 2014-11-04 | 2015-02-25 | Method and system for processing operation of primary and standby device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410614104.1A CN105634779B (en) | 2014-11-04 | 2014-11-04 | Operation processing method and device of main and standby equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105634779A CN105634779A (en) | 2016-06-01 |
CN105634779B CN105634779B (en) | 2019-09-03 |
Family
ID=55908461
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410614104.1A Active CN105634779B (en) | 2014-11-04 | 2014-11-04 | Operation processing method and device of main and standby equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105634779B (en) |
WO (1) | WO2016070530A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019036892A1 (en) * | 2017-08-22 | 2019-02-28 | 深圳瀚飞科技开发有限公司 | Remote communication detection system and detection method for online monitoring platform |
CN107688547B (en) * | 2017-08-23 | 2020-06-16 | 苏州浪潮智能科技有限公司 | A method and system for switching between active and standby controllers |
CN107579860A (en) * | 2017-09-29 | 2018-01-12 | 新华三技术有限公司 | Node electoral machinery and device |
CN109728981A (en) * | 2019-03-19 | 2019-05-07 | 江苏汇智达信息科技有限公司 | A kind of cloud platform fault monitoring method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101207408A (en) * | 2006-12-22 | 2008-06-25 | 中兴通讯股份有限公司 | Apparatus and method of synthesis fault detection for main-spare taking turns |
CN101674199A (en) * | 2009-09-22 | 2010-03-17 | 中兴通讯股份有限公司 | Method for realizing switching during network fault and finders |
CN101729290A (en) * | 2009-11-04 | 2010-06-09 | 中兴通讯股份有限公司 | Method and device for realizing business system protection |
CN102480423A (en) * | 2010-11-30 | 2012-05-30 | 中兴通讯股份有限公司 | Method and system for protecting layer 2 tunneling protocol (L2TP) network |
CN103560955A (en) * | 2013-10-24 | 2014-02-05 | 华为技术有限公司 | Method and device for switching between redundancy devices |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8094569B2 (en) * | 2008-12-05 | 2012-01-10 | Cisco Technology, Inc. | Failover and failback of communication between a router and a network switch |
US8244125B2 (en) * | 2009-01-21 | 2012-08-14 | Calix, Inc. | Passive optical network protection switching |
CN102742222B (en) * | 2011-06-29 | 2015-05-13 | 华为技术有限公司 | Method and apparatus for maintaining connectivity of transmission lines |
US8675479B2 (en) * | 2011-07-12 | 2014-03-18 | Tellabs Operations, Inc. | Methods and apparatus for improving network communication using ethernet switching protection |
CN103931139B (en) * | 2013-03-19 | 2017-02-15 | 华为技术有限公司 | Method and device for redundancy protection, and device and system |
- 2014
  - 2014-11-04 CN CN201410614104.1A patent/CN105634779B/en active Active
- 2015
  - 2015-02-25 WO PCT/CN2015/073275 patent/WO2016070530A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN105634779A (en) | 2016-06-01 |
WO2016070530A1 (en) | 2016-05-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20190717; Address after: 210012 Nanjing, Yuhuatai District, South Street, Bauhinia Road, No. 68; Applicant after: Nanjing Zhongxing Software Co., Ltd.; Address before: 518057 Nanshan District science and technology, Guangdong Province, South Road, No. 55, No.; Applicant before: ZTE Corporation |
GR01 | Patent grant | ||