CN105634779B - Operation processing method and device of main and standby equipment - Google Patents

Operation processing method and device of main and standby equipment

Info

Publication number
CN105634779B
Authority
CN
China
Prior art keywords
equipment
host apparatus
stand
link
connectivity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410614104.1A
Other languages
Chinese (zh)
Other versions
CN105634779A (en
Inventor
杨青海
毕忠良
杨骐
尹旺中
陈宗立
朱田
黄文伟
周海山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing ZTE New Software Co Ltd
Original Assignee
Nanjing ZTE New Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing ZTE New Software Co Ltd
Priority to CN201410614104.1A
Priority to PCT/CN2015/073275 (WO2016070530A1)
Publication of CN105634779A
Application granted
Publication of CN105634779B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Hardware Redundancy (AREA)

Abstract

The present invention provides an operation processing method and device for active and standby devices. The method includes: after it is determined that an active device and a first standby device have lost contact, the active device detects its link connectivity with other devices while the first standby device detects its link connectivity with other devices, where the other devices are the devices, other than the active device and the standby device, in the cluster system in which the active device and the first standby device are located; the active device handles the operation of the active device and/or the first standby device according to the link connectivity detection result, and the first standby device handles the operation of the active device and/or the first standby device according to the link connectivity detection result. This technical solution solves the problem in the related art that system stability is reduced because a node failure is not distinguished from a link failure, and thereby enhances system stability.

Description

Operation processing method and device of main and standby equipment

Technical Field

The present invention relates to the field of communications and, in particular, to an operation processing method and device for active and standby devices.

Background

In a cluster system, the devices that carry services are usually configured as an active device and a standby device. The active device performs the relevant functions, while the standby device exists as a backup of the active device. When the active device goes down, the standby device is promoted to active device and takes over the work of the original active device so that service is not interrupted; when the standby device goes down, the active device selects a new standby device. In this way, the active device and the standby device together enhance the stability of the system.

In the prior art, the active device and the standby device of a cluster system generally use heartbeat messages for keep-alive. When keep-alive between the active and standby nodes fails, it cannot be determined whether a node or a link has failed, so the active device and the standby device may evolve along the wrong path: when no heartbeat message is received from the peer within a certain time threshold, the peer is considered abnormal, and the standby device is promoted to active device or a new standby device is selected. However, when the keep-alive link between the active device and the standby device is briefly interrupted and then recovers, the system may evolve into a "dual-active" state (that is, the standby device has switched to active device while the original active device is still running). In the related technologies currently on the market, a cluster that has entered the dual-active state is generally recovered by restarting the whole system, which reduces system stability and degrades the user experience.

In the related art, after the active device and the standby device lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to active device and two active devices run in the system, which reduces system stability and degrades the user experience. No effective solution to this problem has yet been proposed.

Summary of the Invention

In order to solve the above technical problem, the present invention provides an operation processing method and device for active and standby devices.

According to one aspect of the present invention, an operation processing method for active and standby devices is provided, including: after it is determined that an active device and a first standby device have lost contact, the active device detects its link connectivity with other devices while the first standby device detects its link connectivity with other devices, where the other devices are the devices, other than the active device and the first standby device, in the cluster system in which the active device and the first standby device are located; the active device handles the operation of the active device and/or the first standby device according to the link connectivity detection result, and the first standby device handles the operation of the active device and/or the first standby device according to the link connectivity detection result.

Preferably, the active device handling the operation of the active device and/or the first standby device according to the link connectivity detection result includes: when the active device detects that the link is connected, replacing the first standby device with a second standby device; when the active device detects that the link is not connected, prohibiting the active device from running for a second predetermined time period.

Preferably, the first standby device handling the operation of the active device and the first standby device according to the link connectivity detection result includes: when the first standby device detects that the link is connected, determining whether the active device is running; when the active device is not running, using the first standby device as the active device.

Preferably, when the active device is running, the first standby device is prohibited from running for a third predetermined time period.

Preferably, whether the active device is running is determined in at least one of the following ways: notification from a third party other than the active device and the first standby device; detection by the first standby device of specified information on a forwarding-plane message transmission channel.

Preferably, the first standby device handling the operation of the active device and the first standby device according to the link connectivity detection result includes: when the first standby device detects that the link is not connected, prohibiting the first standby device from running for a first predetermined time period.

Preferably, determining that the active device and the first standby device have lost contact includes: when the active device and/or the first standby device does not receive a keep-alive message, determining that the active device and the first standby device have lost contact.

According to another aspect of the present invention, an operation processing system for active and standby devices is further provided, including: an active device, configured to, after it is determined that the active device and a first standby device have lost contact, detect the link connectivity between the active device and other devices, and handle the operation of the active device and/or the first standby device according to the link connectivity detection result, where the other devices are the devices, other than the active device and the first standby device, in the cluster system in which the active device and the first standby device are located; and the first standby device, configured to detect the link connectivity between the first standby device and the other devices while the active device detects its link connectivity with the other devices, and to handle the operation of the active device and/or the first standby device according to the link connectivity detection result.

Preferably, the active device is further configured to replace the first standby device with a second standby device when the active device detects that the link is connected, and to prohibit the active device from running for a second predetermined time period when the active device detects that the link is not connected.

Preferably, the first standby device is further configured to determine whether the active device is running when the first standby device detects that the link is connected, and to use the first standby device as the active device when the active device is not running.

With the present invention, after the active device and the standby device lose contact, the active device and the standby device simultaneously detect their respective link connectivity with the other devices in the cluster system, and the active device and/or the standby device are then handled according to the detection results. This solves the problem in the related art that, after the active device and the standby device lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to active device and two active devices run in the system, reducing system stability and degrading the user experience. System stability is thereby enhanced and the user experience improved.

Brief Description of the Drawings

The accompanying drawings described here provide a further understanding of the present invention and form a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:

FIG. 1 is a flowchart of an operation processing method for active and standby devices according to an embodiment of the present invention;

FIG. 2 is a structural block diagram of a cluster system according to a preferred embodiment of the present invention;

FIG. 3 is a schematic diagram of the handling performed after the active and standby devices detect link connectivity according to a preferred embodiment of the present invention;

FIG. 4 is a structural block diagram of an operation processing system for active and standby devices according to an embodiment of the present invention;

FIG. 5 is another structural block diagram of the operation processing system for active and standby devices according to an embodiment of the present invention.

Detailed Description

The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with each other.

In the related art, when keep-alive between the active device and the standby device fails, it cannot be determined whether a device or a link has failed, so the active device and the standby device may evolve along the wrong path: the standby device switches directly to active device, and two active devices exist in the system. To address this problem, the following technical solution is provided.

In order to solve the above technical problem, this embodiment provides an operation processing method for active and standby devices. FIG. 1 is a flowchart of the operation processing method for active and standby devices according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:

Step S102: after it is determined that the active device and the first standby device have lost contact, the active device detects its link connectivity with other devices while the first standby device detects its link connectivity with other devices, where the other devices are the devices, other than the active device and the standby device, in the cluster system in which the active device and the first standby device are located;

Step S104: the active device handles the operation of the active device and/or the first standby device according to the link connectivity detection result, and the first standby device handles the operation of the active device and/or the first standby device according to the link connectivity detection result.

Through the above steps, after the active device and the standby device lose contact, the active device and the standby device simultaneously detect their respective link connectivity with the other devices in the cluster system, and the active device and/or the standby device are then handled according to the detection results. This solves the problem that, after the active device and the standby device lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to active device and two active devices run in the system, reducing system stability and degrading the user experience. System stability is thereby enhanced and the user experience improved.

The technical solution embodied in step S104 may be implemented in the following ways in the embodiments of the present invention:

(1) The active device handling the operation of the active device and/or the first standby device according to the link connectivity detection result includes: when the active device detects that the link is connected, replacing the first standby device with a second standby device; when the active device detects that the link is not connected, prohibiting the active device from running for a second predetermined time period.

(2) The first standby device handling the operation of the active device and the first standby device according to the link connectivity detection result includes: when the first standby device detects that the link is connected, determining whether the active device is running; when the active device is not running, using the first standby device as the active device; when the active device is running, prohibiting the first standby device from running for a third predetermined time period.

(3) The first standby device handling the operation of the active device and the first standby device according to the link connectivity detection result includes: when the first standby device detects that the link is not connected, prohibiting the first standby device from running for a first predetermined time period.

It should be noted that items (1) to (3) above, that is, the process in which the active device handles the operation of the active device and/or the standby device according to link connectivity and the process in which the standby device handles the operation of the active device and/or the standby device according to link connectivity, can be applied together. The decision process of the active device and that of the standby device do not contradict each other and can coexist.

In fact, after the active device and the standby device lose contact, the active side and the standby side detect link connectivity at the same time. When the active device detects that the link is connected, it replaces the first standby device with a second standby device: the link has not failed, yet the active device and the standby device are still out of contact, which indicates that the first standby device is faulty and needs to be replaced. When the active device detects that the link is not connected, the active device is prohibited from running for the second predetermined time period.
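As a non-limiting illustration, the decisions in items (1) to (3) can be sketched as two small functions, one per role. The names `Action`, `active_side_action` and `standby_side_action` are hypothetical and do not appear in the embodiments; the sketch only maps the detection results to the handling described above.

```python
from enum import Enum


class Action(Enum):
    REPLACE_STANDBY = "replace the first standby device with a second one"
    SUSPEND = "stop running for a predetermined time period"
    BECOME_ACTIVE = "take over as the active device"


def active_side_action(link_connected: bool) -> Action:
    """Item (1): decision taken by the active device after contact is lost."""
    # Connected to the other devices: the first standby device is treated
    # as faulty and replaced. Otherwise the active device suspends itself.
    return Action.REPLACE_STANDBY if link_connected else Action.SUSPEND


def standby_side_action(link_connected: bool, active_is_running: bool) -> Action:
    """Items (2) and (3): decision taken by the first standby device."""
    if not link_connected:
        return Action.SUSPEND  # item (3)
    # Item (2): take over only when the active device is not running.
    return Action.SUSPEND if active_is_running else Action.BECOME_ACTIVE
```

Because the two functions take no shared state, each side can evaluate its own decision independently, which matches the statement that the two decision processes can coexist.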

The standby device may determine whether the active device is running in several ways. In an optional example of the embodiments of the present invention, whether the active device is running is determined in at least one of the following ways: notification from a third party other than the active device and the first standby device; detection by the first standby device of specified information on a forwarding-plane message transmission channel.
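One possible way to combine the two detection methods is sketched below. The embodiments do not state how multiple results are combined; this sketch assumes that a positive report from any one method is enough to treat the active device as running, which errs on the side of not promoting the standby device. The `Probe` type and the function name are hypothetical.

```python
from typing import Callable, Iterable

# A probe returns True when it has evidence that the active device is running,
# for example a third-party notification or specified information seen on the
# forwarding-plane message channel (both are stand-ins in this sketch).
Probe = Callable[[], bool]


def active_is_running(probes: Iterable[Probe]) -> bool:
    """Return True if any configured probe reports the active device as running."""
    for probe in probes:
        try:
            if probe():
                return True
        except Exception:
            # A probe that fails gives no evidence either way; try the next one.
            continue
    return False
```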

Optionally, in step S102, loss of contact between the active device and the first standby device may be determined as follows: when the active device and/or the first standby device does not receive a keep-alive message, it is determined that the active device and the first standby device have lost contact.

In summary, the active device and the standby device each compute the connectivity of their own links based on their connection status with the other devices in the system. The connectivity takes the value TRUE (T), indicating that the link between the active device and the standby device is connected, or FALSE (F), indicating that the link between the active device and the standby device is not connected.
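The embodiments do not specify how the connection status with several other devices is reduced to a single T/F value. The following minimal sketch assumes, purely for illustration, that a device counts as connected when at least one other device in the cluster is reachable; the function name and the device labels are hypothetical.

```python
from typing import Mapping


def link_connectivity(peer_reachable: Mapping[str, bool]) -> bool:
    """Collapse per-device reachability into the single T/F connectivity value.

    Assumption: reaching at least one other device counts as connected.
    A stricter rule (for example, a majority) could be substituted.
    """
    return any(peer_reachable.values())
```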

In addition, two-way keep-alive and two-way detection are used between the active device and the standby device, so that both ends of the link sense a change in the state of the keep-alive link at the same time. If keep-alive fails in either direction, the active device and the standby device are determined to have lost contact.
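A minimal sketch of two-way detection from one end's point of view is given below. It assumes that each heartbeat carries a flag telling the receiver whether the sender has recently heard from it; this flag (`peer_heard_us`), the class name and the timeout handling are assumptions made for illustration and are not described in the embodiments.

```python
from dataclasses import dataclass


@dataclass
class KeepAliveMonitor:
    """Tracks both directions of the keep-alive link as seen by one device."""

    timeout: float              # seconds of silence before a direction fails
    last_rx: float = 0.0        # when a heartbeat from the peer last arrived
    last_peer_rx: float = 0.0   # when the peer last confirmed hearing from us

    def on_heartbeat(self, now: float, peer_heard_us: bool) -> None:
        """Record a received heartbeat and the peer's view of our heartbeats."""
        self.last_rx = now
        if peer_heard_us:
            self.last_peer_rx = now

    def contact_lost(self, now: float) -> bool:
        """Contact is lost if either direction has been silent past the timeout."""
        inbound_failed = now - self.last_rx > self.timeout
        outbound_failed = now - self.last_peer_rx > self.timeout
        return inbound_failed or outbound_failed
```

With such a flag, a failure in only one direction is still visible at both ends within one timeout, which is the property the two-way scheme is meant to provide.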

In the above technical solution provided by the embodiments of the present invention, after the active device and the standby device lose contact, each computes the connectivity of its own links. The results fall into the following four cases, as shown in Table 1:

Table 1

              Active (T)   Active (F)
Standby (T)   TT           FT
Standby (F)   TF           FF

The four cases are briefly described below:

a) TT: the connectivity detection results of both the active device and the standby device are TRUE (T);

The active device selects a new standby device. The standby device probes, through a third-party mechanism, whether the active device is in place: if the probe shows that the active device is running, the standby device is suspended for a preset time period; if the probe shows that the active device is not running, the standby device becomes the active device.

b) FT: the connectivity detection result of the active device is FALSE (F), and that of the standby device is TRUE (T);

The active device is suspended for a preset time. The standby device probes, through a third-party mechanism, whether the active device is running: if the probe shows that the active device is running, the standby device is suspended; if the probe shows that the active device is not running, the standby device becomes the active device.

c) TF: the connectivity detection result of the active device is TRUE (T), and that of the standby device is FALSE (F);

The active device selects a new standby device; the standby device is suspended.

d) FF: the connectivity detection results of both the active device and the standby device are FALSE (F);

Both the active device and the standby device are suspended.
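The four cases above can be enumerated with a short script, shown here as an illustrative sketch only. It assumes that the third-party probe is accurate and that a suspended active device is reported as not running; the function name `outcome` and the outcome strings are hypothetical. The first letter of each case code refers to the active device and the second to the standby device, as in Table 1.

```python
from itertools import product
from typing import Tuple


def outcome(active_conn: bool, standby_conn: bool) -> Tuple[str, str]:
    """Return (active device outcome, first standby device outcome)."""
    active = "selects new standby" if active_conn else "suspended"
    if not standby_conn:
        standby = "suspended"
    else:
        # Assumption: the probe sees the active device as running exactly
        # when the active device has not suspended itself.
        active_running = active_conn
        standby = "suspended" if active_running else "becomes active"
    return active, standby


for a, s in product([True, False], repeat=2):
    code = ("T" if a else "F") + ("T" if s else "F")
    print(code, outcome(a, s))
```

Under these assumptions, no case ends with both devices acting as the active device, which is the dual-active state the solution is intended to avoid.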

The four cases can be summarized as follows: whichever of the active device and the standby device detects a connectivity of FALSE (F) suspends operation for a predetermined time period. When connectivity is detected as TRUE (T), the active device selects a new standby device, while the standby device probes through a third-party mechanism whether the active device is running: if the active device is running, the standby device is suspended; if the active device is not running, the standby device becomes the active device.

To aid understanding of the above technical solution for handling loss of contact between the active device and the standby device, it is described below in conjunction with a preferred embodiment, which does not limit the embodiments of the present invention:

First, the cluster system of the preferred embodiment of the present invention is briefly described. As shown in FIG. 2, the cluster system is divided into several devices; for ease of description, only three devices are shown in FIG. 2. The active device and the standby device use two-way keep-alive and two-way detection. The active device and the standby device each perform connectivity detection with the other devices in the system.

The connectivity detection that the active device and the standby device perform with the other devices in the system uses a message-based detection mechanism. The following schemes, among others, may be used: communication link detection (for example, over a TCP link or a TIPC link) and asynchronous message keep-alive. Since these are technical means commonly used in the related art, they are not described further in the embodiments of the present invention.
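As one hedged example of communication link detection over TCP, the sketch below reports a link as connected when a TCP connection to another device can be opened within a timeout. The function name, the idea of probing a listening port on the other device, and the default timeout are assumptions for illustration; the embodiments leave the mechanism to the related art.

```python
import socket


def tcp_link_connected(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

A device could call this for each of the other devices in the cluster and feed the results into its overall connectivity value.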

FIG. 3 is a schematic diagram of the handling performed after the active and standby devices detect link connectivity according to a preferred embodiment of the present invention. As shown in FIG. 3, the technical solution can be summarized as follows: the active device and the standby device send heartbeat messages to each other, and each receives and checks the messages it receives. After loss of contact between the active and standby devices is detected through the keep-alive messages, a device whose connectivity detection result is FALSE (F) restarts itself. For a device whose connectivity detection result is TRUE (T): if it is the active device, it selects a new standby device; if it is the standby device, it probes through a third-party mechanism whether the active device is running. If the probe shows that the active device is running, the standby device is suspended; if the probe shows that the active device is not running, the standby device becomes the active device.

It should be noted that "in place" in FIG. 3 means running, and "suicide" means not being used for a predetermined time period.

在本发明实施例中,还提供了一种主备设备的运行处理系统,图4为根据本发明实施例的主备设备的运行处理系统的结构框图,如图4所示,包括:In an embodiment of the present invention, an operation processing system of an active/standby device is also provided. FIG. 4 is a structural block diagram of an operation processing system of an active/standby device according to an embodiment of the present invention. As shown in FIG. 4 , it includes:

主用设备40,用于在确定主用设备40和第一备用设备42失联后,检测该主用设备40与其他设备44的链路连通性,以及根据所述链路连通性的检测结果对主用设备40和/或第一备用设备42的运行进行处理,其中,其他设备44为主用设备40和第一备用设备42所在的集群系统中,除主用设备40和第一备用设备42之外的设备;The master device 40 is configured to detect the link connectivity between the master device 40 and other devices 44 after determining that the master device 40 and the first backup device 42 are out of contact, and according to the detection result of the link connectivity Processing the operation of the main device 40 and/or the first backup device 42, wherein, other devices 44 are in the cluster system where the main device 40 and the first backup device 42 are located, except the main device 40 and the first backup device Equipment other than 42;

a first standby device 42, configured to detect the link connectivity between the first standby device 42 and the other devices 44 while the active device 40 detects the link connectivity between the active device 40 and the other devices 44, and to process the operation of the active device 40 and/or the first standby device 42 according to the detection result of the link connectivity.

Through the combined action of the devices in the above system, the following technical solution is adopted: after the active device and the standby device lose contact, each of them simultaneously detects its own link connectivity with the other devices in the cluster system, and the active device and/or the standby device are then handled according to the detection results. This addresses a problem in the related art: after the active and standby devices lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to the active role, two active devices run in the system at once, system stability drops, and the user experience suffers. The solution thereby improves system stability and the user experience.
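The connectivity check that each device performs against the other devices in the cluster might look like the sketch below. The function name `link_connected`, the injected `probe` callable, the device names, and in particular the `min_reachable` threshold are assumptions made for illustration; the text does not say how many other devices must answer for the link to count as connected.

```python
from typing import Callable, Iterable


def link_connected(other_devices: Iterable[str],
                   probe: Callable[[str], bool],
                   min_reachable: int = 1) -> bool:
    """Report the link as connected once enough other devices answer.

    min_reachable (expected to be >= 1) is an assumed policy knob,
    not something specified in the source.
    """
    reachable = 0
    for device in other_devices:
        if probe(device):
            reachable += 1
            if reachable >= min_reachable:
                return True
    return False


# Illustrative run with hypothetical device names. In a real deployment each
# device would run this check on its own, at the same time as its peer.
others = ["node-a", "node-b", "node-c"]
active_view = {"node-a", "node-b"}   # devices the active device can reach
standby_view = set()                 # the standby device reaches nobody

active_link_ok = link_connected(others, lambda d: d in active_view)
standby_link_ok = link_connected(others, lambda d: d in standby_view)
```

Because the probe is passed in as a callable, the same function can be used by the active device and by the standby device, each with its own view of the cluster.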

Optionally, as shown in Fig. 5, the active device 40 is further configured to replace the first standby device 42 with a second standby device 46 when the active device 40 detects that the link is connected, and to prohibit the active device from running for a second predetermined time period when the active device detects that the link is not connected. The first standby device 42 is further configured to judge whether the active device 40 is running when the first standby device 42 detects that the link is connected, and to take over as the active device when the active device 40 is not running.
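The per-device handling described above can be condensed into a small decision function. This is a minimal sketch under stated assumptions: the names `Role`, `Action`, and `decide` are hypothetical, the predetermined time periods are represented only by a single `SUSPEND_SELF` action, and the result of the third-party check is passed in as `active_running` rather than obtained by any particular mechanism.

```python
from enum import Enum, auto
from typing import Optional


class Role(Enum):
    ACTIVE = auto()
    STANDBY = auto()


class Action(Enum):
    REPLACE_STANDBY = auto()   # active device picks a second standby device
    SUSPEND_SELF = auto()      # device is barred from running for a preset period
    BECOME_ACTIVE = auto()     # standby device takes over the active role


def decide(role: Role, link_ok: bool,
           active_running: Optional[bool] = None) -> Action:
    """Choose what a device does after lost contact has been detected."""
    if not link_ok:
        # A device that cannot reach the rest of the cluster steps aside.
        return Action.SUSPEND_SELF
    if role is Role.ACTIVE:
        return Action.REPLACE_STANDBY
    # Standby device with a working link: the outcome depends on whether
    # the active device is still running.
    if active_running is None:
        raise ValueError("standby device needs to know whether the active device is running")
    return Action.SUSPEND_SELF if active_running else Action.BECOME_ACTIVE
```

For example, `decide(Role.STANDBY, True, active_running=False)` yields `Action.BECOME_ACTIVE`, while the same call with `active_running=True` yields `Action.SUSPEND_SELF`, so at most one device ends up in the active role.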

In summary, the embodiments of the present invention achieve the following technical effects. They solve the "dual-active" problem that arises in the related art when the standby device switches over directly. They correctly detect the actual state of the keep-alive link and define a unified evolution path for the system, so that devices do not evolve independently of one another. This prevents the dual-active condition described above and improves system stability.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform. They can also be implemented in hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprising" and "having", and any variations of them, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases the steps shown or described can be performed in an order different from the one given here. Alternatively, the modules or steps can each be made into a separate integrated circuit module, or several of them can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may be modified and varied in many ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for processing the operation of active and standby devices, characterized by comprising:
after it is determined that an active device and a first standby device have lost contact, the active device detects the link connectivity between the active device and other devices while the first standby device detects the link connectivity between the first standby device and the other devices, wherein the other devices are the devices in the cluster system in which the active device and the first standby device are located, other than the active device and the first standby device;
the active device processes the operation of the active device and/or the first standby device according to the detection result of the link connectivity, and the first standby device processes the operation of the active device and/or the first standby device according to the detection result of the link connectivity.
2. The method according to claim 1, characterized in that the active device processing the operation of the active device and/or the first standby device according to the detection result of the link connectivity comprises:
when the active device detects that the link is connected, replacing the first standby device with a second standby device;
when the active device detects that the link is not connected, prohibiting the active device from running for a second predetermined time period.
3. The method according to claim 1, characterized in that the first standby device processing the operation of the active device and the first standby device according to the detection result of the link connectivity comprises:
when the first standby device detects that the link is connected, judging whether the active device is running;
when the active device is not running, using the first standby device as the active device.
4. The method according to claim 3, characterized in that
when the active device is running, the first standby device is prohibited from running for a third predetermined time period.
5. The method according to claim 3 or 4, characterized in that whether the active device is running is judged in at least one of the following ways:
being informed by a third party other than the active device and the first standby device;
detection, by the first standby device, of specified information in a forwarding-plane message transmission channel.
6. The method according to claim 1, characterized in that the first standby device processing the operation of the active device and the first standby device according to the detection result of the link connectivity comprises:
when the first standby device detects that the link is not connected, prohibiting the first standby device from running for a first predetermined time period.
7. The method according to any one of claims 1 to 4, characterized in that determining that the active device and the first standby device have lost contact comprises:
when the active device and/or the first standby device does not receive a keep-alive message, determining that the active device and the first standby device have lost contact.
8. A system for processing the operation of active and standby devices, characterized by comprising:
an active device, configured to detect, after determining that the active device and a first standby device have lost contact, the link connectivity between the active device and other devices, and to process the operation of the active device and/or the first standby device according to the detection result of the link connectivity, wherein the other devices are the devices in the cluster system in which the active device and the first standby device are located, other than the active device and the first standby device;
a first standby device, configured to detect the link connectivity between the first standby device and the other devices while the active device detects the link connectivity between the active device and the other devices, and to process the operation of the active device and/or the first standby device according to the detection result of the link connectivity.
9. The system according to claim 8, characterized in that the active device is further configured to replace the first standby device with a second standby device when the active device detects that the link is connected, and to prohibit the active device from running for a second predetermined time period when the active device detects that the link is not connected.
10. The system according to claim 8, characterized in that the first standby device is further configured to judge whether the active device is running when the first standby device detects that the link is connected, and to use the first standby device as the active device when the active device is not running.
CN201410614104.1A 2014-11-04 2014-11-04 Operation processing method and device of main and standby equipment Active CN105634779B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410614104.1A CN105634779B (en) 2014-11-04 2014-11-04 Operation processing method and device of main and standby equipment
PCT/CN2015/073275 WO2016070530A1 (en) 2014-11-04 2015-02-25 Method and system for processing operation of primary and standby device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410614104.1A CN105634779B (en) 2014-11-04 2014-11-04 Operation processing method and device of main and standby equipment

Publications (2)

Publication Number Publication Date
CN105634779A CN105634779A (en) 2016-06-01
CN105634779B true CN105634779B (en) 2019-09-03

Family

ID=55908461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410614104.1A Active CN105634779B (en) 2014-11-04 2014-11-04 Operation processing method and device of main and standby equipment

Country Status (2)

Country Link
CN (1) CN105634779B (en)
WO (1) WO2016070530A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019036892A1 (en) * 2017-08-22 2019-02-28 深圳瀚飞科技开发有限公司 Remote communication detection system and detection method for online monitoring platform
CN107688547B (en) * 2017-08-23 2020-06-16 苏州浪潮智能科技有限公司 A method and system for switching between active and standby controllers
CN107579860A (en) * 2017-09-29 2018-01-12 新华三技术有限公司 Node electoral machinery and device
CN109728981A (en) * 2019-03-19 2019-05-07 江苏汇智达信息科技有限公司 A kind of cloud platform fault monitoring method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101207408A (en) * 2006-12-22 2008-06-25 中兴通讯股份有限公司 Apparatus and method of synthesis fault detection for main-spare taking turns
CN101674199A (en) * 2009-09-22 2010-03-17 中兴通讯股份有限公司 Method for realizing switching during network fault and finders
CN101729290A (en) * 2009-11-04 2010-06-09 中兴通讯股份有限公司 Method and device for realizing business system protection
CN102480423A (en) * 2010-11-30 2012-05-30 中兴通讯股份有限公司 Method and system for protecting layer 2 tunneling protocol (L2TP) network
CN103560955A (en) * 2013-10-24 2014-02-05 华为技术有限公司 Method and device for switching between redundancy devices

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8094569B2 (en) * 2008-12-05 2012-01-10 Cisco Technology, Inc. Failover and failback of communication between a router and a network switch
US8244125B2 (en) * 2009-01-21 2012-08-14 Calix, Inc. Passive optical network protection switching
CN102742222B (en) * 2011-06-29 2015-05-13 华为技术有限公司 Method and apparatus for maintaining connectivity of transmission lines
US8675479B2 (en) * 2011-07-12 2014-03-18 Tellabs Operations, Inc. Methods and apparatus for improving network communication using ethernet switching protection
CN103931139B (en) * 2013-03-19 2017-02-15 华为技术有限公司 Method and device for redundancy protection, and device and system

Also Published As

Publication number Publication date
CN105634779A (en) 2016-06-01
WO2016070530A1 (en) 2016-05-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190717

Address after: 210012 Nanjing, Yuhuatai District, South Street, Bauhinia Road, No. 68

Applicant after: Nanjing Zhongxing Software Co., Ltd.

Address before: 518057 Nanshan District science and technology, Guangdong Province, South Road, No. 55, No.

Applicant before: ZTE Corporation

GR01 Patent grant