CN115145782A - A server switching method, MooseFS system and storage medium

Info

Publication number: CN115145782A
Application number: CN202110341290.6A
Authority: CN
Inventors: 奚诚
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2022-10-04

Abstract

Embodiments of the present application disclose a server switching method, a MooseFS system, and a storage medium. The MooseFS system includes a primary server, a backup server, and a cluster monitoring module. The server switching method includes: when the backup server detects that the primary server is running abnormally, the backup server sends a query request to the cluster monitoring module, where the query request carries identification information of the primary server; the cluster monitoring module determines, according to the identification information, first service information corresponding to the primary server, where the first service information represents the actual running state of the primary server; the cluster monitoring module sends a query response carrying the first service information to the backup server; and the primary server and the backup server determine, according to the first service information, whether to perform a server switch. The proposed method prevents the primary and standby nodes in the MooseFS system from competing for resources and thereby achieves high availability of the MooseFS system.

Description

A server switching method, MooseFS system and storage medium

Technical Field

The present invention relates to the technical field of distributed storage, and in particular to a server switching method, a MooseFS system, and a storage medium.

Background

Distributed storage refers to a storage cluster formed by connecting multiple standalone machines, which pools the storage capacity and read/write capability of all of them. MooseFS is a distributed file system. By storing files across a cluster, it removes the storage capacity limit of a single machine; by distributing access over the network, it overcomes the throughput limit of a single machine; its multi-replica mechanism protects data; and the system is easy to scale out.

Ucarp, which supports dynamic IP takeover, is a Linux implementation of the Common Address Redundancy Protocol (CARP). Ucarp allows a primary server and other servers to share a virtual Internet Protocol (IP) address, and when the primary server fails, another server automatically takes over from it to provide service to the cluster.

In practice, however, the primary server is often misjudged as having failed. The primary server is in fact still running normally and serving the cluster, but Ucarp decides that another server should take over. That server then competes with the primary server for resources, which leaves the MooseFS system in an inconsistent state and corrupts data.

Summary of the Invention

Embodiments of the present application provide a server switching method, a MooseFS system, and a storage medium that prevent the primary and standby nodes in a MooseFS system from competing for resources, which would otherwise leave the system in an inconsistent state and corrupt data, and thereby effectively achieve high availability of the MooseFS system.

The technical solutions of the embodiments of the present application are implemented as follows:

In a first aspect, an embodiment of the present application provides a server switching method applied to a MooseFS system, where the MooseFS system includes a primary server, a backup server, and a cluster monitoring module, and the method includes:

when the backup server detects that the primary server is running abnormally, sending, by the backup server, a query request to the cluster monitoring module, where the query request carries identification information of the primary server;

determining, by the cluster monitoring module, first service information corresponding to the primary server according to the identification information, where the first service information represents the actual running state of the primary server;

sending, by the cluster monitoring module, a query response to the backup server, where the query response carries the first service information; and

determining, by the primary server and the backup server, whether to perform a server switch according to the first service information.

In a second aspect, an embodiment of the present application provides a MooseFS system that includes a primary server, a backup server, and a cluster monitoring module, where:

the primary server is configured to manage the MooseFS system and to exchange data with the data nodes;

the backup server is configured to monitor the running state of the primary server and to store the metadata held by the primary server; and

the cluster monitoring module is configured to monitor the actual running state of the primary server and the data nodes.

In a third aspect, an embodiment of the present application provides a MooseFS system that includes a primary server, a backup server, and a cluster monitoring module. The MooseFS system further includes a processor and a memory storing instructions executable by the processor; when the instructions are executed by the processor, the server switching method described above is implemented.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a program for use in a MooseFS system that includes a primary server, a backup server, and a cluster monitoring module. When the program is executed by a processor, the server switching method described above is implemented.

Embodiments of the present application provide a server switching method, a MooseFS system, and a storage medium. The MooseFS system includes a primary server, a backup server, and a cluster monitoring module. When the backup server detects that the primary server is running abnormally, it sends a query request carrying identification information of the primary server to the cluster monitoring module; the cluster monitoring module determines, according to the identification information, first service information corresponding to the primary server, where the first service information represents the actual running state of the primary server; the cluster monitoring module sends a query response carrying the first service information to the backup server; and the primary server and the backup server determine, according to the first service information, whether to perform a server switch. In other words, when the backup server in the MooseFS system determines that the current state of the primary server is abnormal, it does not perform the server switch immediately. It first obtains the first service information of the primary server from the cluster monitoring module, uses that information to establish the actual running state of the primary server, and only then decides whether to switch. Consequently, if an abnormal condition such as a communication interruption between the primary and standby nodes occurs while the primary node is managing the MooseFS system normally, the cluster monitoring module's supervision of both nodes prevents the standby node from misjudging the state of the primary node and competing with it for resources and services. This avoids an inconsistent system state and data corruption, and ensures high availability of the MooseFS system.

Brief Description of the Drawings

FIG. 1 is a first schematic structural diagram of the MooseFS system provided by an embodiment of the present application;

FIG. 2 is a first schematic flowchart of the server switching method provided by an embodiment of the present application;

FIG. 3 is a second schematic flowchart of the server switching method provided by an embodiment of the present application;

FIG. 4 is a second schematic structural diagram of the MooseFS system provided by an embodiment of the present application;

FIG. 5 is a third schematic flowchart of the server switching method provided by an embodiment of the present application;

FIG. 6 is a fourth schematic flowchart of the server switching method provided by an embodiment of the present application;

FIG. 7 is a third schematic structural diagram of the MooseFS system provided by an embodiment of the present application;

FIG. 8 is a fifth schematic flowchart of the server switching method provided by an embodiment of the present application;

FIG. 9 is a sixth schematic flowchart of the server switching method provided by an embodiment of the present application;

FIG. 10 is a seventh schematic flowchart of the server switching method provided by an embodiment of the present application;

FIG. 11 is an eighth schematic flowchart of the server switching method provided by an embodiment of the present application;

FIG. 12 is a fourth schematic structural diagram of the MooseFS system provided by an embodiment of the present application;

FIG. 13 is a fifth schematic structural diagram of the MooseFS system provided by an embodiment of the present application;

FIG. 14 is a sixth schematic structural diagram of the MooseFS system provided by an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the application and do not limit it. For ease of description, the drawings show only the parts relevant to the application.

With the development of Internet technology and the growing reach of information systems, new data sources keep emerging, the volume of business data keeps increasing, and the demand for unstructured file storage has grown significantly. Traditional centralized storage usually relies on high-end systems such as a Storage Area Network (SAN) or Network Attached Storage (NAS) to cope with this growth. The space and capacity of such storage cannot be expanded at will, upgrading the equipment is expensive, and these systems suffer from low storage efficiency, limited horizontal scalability, poor load balancing, and low concurrent access performance. Distributed storage addresses these shortcomings of traditional SAN and NAS storage.

MooseFS is a distributed file system that distributes and stores files through its own client instead of writing directly to local disks. However, the overall failure rate of a MooseFS system is relatively high. Abnormal conditions such as a loss of communication between the primary and standby nodes can cause a split-brain problem, which leaves the MooseFS system in an inconsistent state and corrupts data. Ensuring that the MooseFS system loses no important data under such conditions, continues to serve users, and improves its overall availability is therefore a technical problem in urgent need of a solution.

To solve these problems of the existing MooseFS system, embodiments of the present application provide a server switching method, a MooseFS system, and a storage medium, with script adjustments made on top of the MooseFS 3.0 mechanism. Specifically, the MooseFS system includes a primary server, a backup server, and a cluster monitoring module. When the backup server detects that the primary server is running abnormally, it sends a query request carrying identification information of the primary server to the cluster monitoring module; the cluster monitoring module determines, according to the identification information, first service information corresponding to the primary server, where the first service information represents the actual running state of the primary server; the cluster monitoring module sends a query response carrying the first service information to the backup server; and the primary server and the backup server determine, according to the first service information, whether to perform a server switch. This prevents the standby node from misjudging the state of the primary node when network communication is abnormal and then competing for resources and services, which would cause a split-brain problem. It thereby prevents an inconsistent system state and data corruption, and ensures high availability of the MooseFS system.

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.

Embodiment 1

An embodiment of the present application provides a server switching method applied to a MooseFS system. FIG. 1 is a first schematic structural diagram of the MooseFS system. As shown in FIG. 1, the MooseFS system 10 may include a primary server 11, a backup server 12, and a cluster monitoring module 13.

In the embodiments of the present application, the primary server 11 maintains the namespace of the entire MooseFS system 10 and exposes it to users, exchanges data with multiple data nodes, and manages the entire MooseFS system 10.

The backup server 12 is connected to the primary server 11. When the primary server fails, the backup server 12 can perform a primary/standby switch and take over the work of the primary server 11. The backup server 12 can also synchronize the metadata of the primary server 11 in real time. It uses an Rsync and Sersync architecture that synchronizes only the files or directories that have changed, which gives a high transfer rate and achieves real-time synchronization and backup of the metadata.
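As a rough illustration of the metadata synchronization described above, the following Python sketch builds the kind of rsync invocation that a change watcher such as Sersync could trigger when the metadata directory changes. The patent does not give the actual command or configuration; the directory `/var/lib/mfs`, the host name `mfs-backup`, and the helper `build_rsync_command` are assumptions made for illustration only.

```python
from typing import List


def build_rsync_command(src_dir: str, backup_host: str, dest_dir: str) -> List[str]:
    """Build an rsync command that mirrors src_dir onto the backup server.

    The paths and host name passed in are illustrative assumptions.
    """
    # A trailing slash tells rsync to copy the directory's contents,
    # not the directory itself.
    src = src_dir.rstrip("/") + "/"
    return [
        "rsync",
        "-az",       # -a: archive mode (recursive, keeps permissions and times); -z: compress in transit
        "--delete",  # remove destination files that no longer exist at the source
        src,
        f"{backup_host}:{dest_dir}",
    ]


cmd = build_rsync_command("/var/lib/mfs", "mfs-backup", "/var/lib/mfs")
print(" ".join(cmd))
```

The command is returned as an argument list so that it could be handed to `subprocess.run(cmd, check=True)` without shell quoting issues; the sketch only prints it.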

The cluster monitoring module 13 monitors the primary server 11 and the backup server 12 in real time. When an abnormal condition such as a communication failure occurs, this monitoring prevents the backup server from misjudging the state of the primary server, wrongly concluding that the primary server has failed, and then competing for resources and services in a way that causes a split-brain problem. It thereby prevents the MooseFS system 10 from entering an inconsistent state and corrupting data.

An embodiment of the present application provides a server switching method. FIG. 2 is a first schematic flowchart of the method. As shown in FIG. 2, the server switching method may include the following steps:

Step 101: When the backup server detects that the primary server is running abnormally, it sends a query request to the cluster monitoring module, where the query request carries identification information of the primary server.

In the embodiments of the present application, the backup server in the MooseFS system first monitors the primary server in real time using a heartbeat mechanism. When the backup server determines from the heartbeat mechanism that the current state of the primary server is abnormal, it asks the cluster monitoring module for the current state of the primary server by sending a query request that carries the identification information of the primary server.

Specifically, the backup server periodically sends heartbeat packets to the primary server and waits for feedback. If no feedback arrives for a heartbeat packet within a preset time, the backup server judges the current state of the primary server to be abnormal and sends the query request carrying the identification information of the primary server to the cluster monitoring module.
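The heartbeat check can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class `HeartbeatMonitor`, the injected `send_heartbeat` callable, and the default timeout of 3 seconds are all hypothetical. The transport is injected so that the timeout rule itself stays visible.

```python
from typing import Callable


class HeartbeatMonitor:
    """Backup-side check: no feedback within the preset time means 'abnormal'."""

    def __init__(self, send_heartbeat: Callable[[float], bool], timeout_s: float = 3.0):
        # send_heartbeat(timeout_s) is assumed to return True if the primary
        # replied within timeout_s seconds and False otherwise.
        self.send_heartbeat = send_heartbeat
        self.timeout_s = timeout_s

    def primary_looks_abnormal(self) -> bool:
        try:
            replied = self.send_heartbeat(self.timeout_s)
        except OSError:
            # A network error is treated the same as a missing reply.
            replied = False
        return not replied


# Simulated transports standing in for a real network call.
healthy = HeartbeatMonitor(lambda timeout: True)
silent = HeartbeatMonitor(lambda timeout: False)
print(healthy.primary_looks_abnormal(), silent.primary_looks_abnormal())
```

A result of `True` here is only a suspicion. In the method described, it triggers a query to the cluster monitoring module and does not trigger a switch by itself.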

In the embodiments of the present application, the query request is used to obtain the service status of the primary server. Specifically, when the backup server receives no feedback within the preset time and judges the current state of the primary server to be abnormal, it asks the cluster monitoring module to send the first service information that the module has observed for the primary server. From the received first service information, the backup server can establish the service status of the primary server.

In the embodiments of the present application, the identification information identifies the primary server. The cluster monitoring module uses the identification information sent by the backup server to find, among the service information it has collected and stored for all nodes, the first service information corresponding to the primary server. The present application does not restrict how the identification information is set.

Because the backup server's heartbeat-based monitoring of the primary server can produce a misjudgment when communication between the two is interrupted, the backup server, on judging that the primary server may have failed, must send a query request to the cluster monitoring module to establish the true state of the primary server, instead of performing the primary/standby switch directly on the basis of its own fault judgment.

Step 102: The cluster monitoring module determines, according to the identification information, the first service information corresponding to the primary server, where the first service information represents the actual running state of the primary server.

In the embodiments of the present application, when the backup server determines from the heartbeat mechanism that the current state of the primary server is abnormal, it sends a query request to the cluster monitoring module. After receiving the query request carrying the identification information, the cluster monitoring module in the MooseFS system determines the first service information of the primary server according to that identification information.

In the embodiments of the present application, the first service information of the primary server is obtained by the cluster monitoring module through its own monitoring of the primary server and is generated from all the services running on the primary server. It is a second source of status information, independent of the current state that the backup server obtains by monitoring the primary server, and is collected by the cluster monitoring module, which sits one layer above the backup server. The first service information indicates the actual running state of the primary server.

For example, the cluster monitoring module may monitor the service status of the primary server continuously and store the first service information keyed by the identification information. On receiving a query request from the backup server, it then uses the identification information carried in the request to locate the corresponding stored first service information. Alternatively, the cluster monitoring module may not store the first service information in advance and may instead query the current first service information of the primary server directly according to the identification information.
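The two lookup strategies described above (answering from stored information, or probing the primary server at query time) can be sketched as below. This is an illustrative sketch only: the class `ClusterMonitor`, the `probe` callable, the identifier `mfsmaster-1`, and the string values `"normal"` and `"abnormal"` are assumptions, since the patent does not specify data formats.

```python
from typing import Callable, Dict

NORMAL = "normal"
ABNORMAL = "abnormal"


class ClusterMonitor:
    """Keeps service information per server identifier (hypothetical sketch)."""

    def __init__(self, probe: Callable[[str], bool]):
        # probe(server_id) is assumed to return True when the services
        # running on that server are healthy.
        self._probe = probe
        self._stored: Dict[str, str] = {}

    def refresh(self, server_id: str) -> str:
        """Probe the server now and store the result under its identifier."""
        info = NORMAL if self._probe(server_id) else ABNORMAL
        self._stored[server_id] = info
        return info

    def first_service_info(self, server_id: str, use_stored: bool = True) -> str:
        """Strategy 1: answer from stored data. Strategy 2: probe on demand."""
        if use_stored and server_id in self._stored:
            return self._stored[server_id]
        return self.refresh(server_id)


health = {"mfsmaster-1": True}
monitor = ClusterMonitor(lambda server_id: health[server_id])
monitor.refresh("mfsmaster-1")            # background monitoring stores "normal"
health["mfsmaster-1"] = False             # the primary's services then fail
print(monitor.first_service_info("mfsmaster-1"))                    # stored value
print(monitor.first_service_info("mfsmaster-1", use_stored=False))  # live probe
```

The example shows the trade-off between the two strategies: stored information answers immediately but can be stale, while a live probe reflects the state at query time.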

For example, when determining the first service information corresponding to the primary server according to the identification information, the cluster monitoring module may monitor the primary server to obtain the first service information, and the service status of the primary server is then determined from it. In particular, when the backup server has judged the current service status of the primary server to be faulty, the first service information sent by the cluster monitoring module provides a more objective basis for determining the service status of the primary server.

Step 103: The cluster monitoring module sends a query response to the backup server, where the query response carries the first service information.

In the embodiments of the present application, after determining the first service information corresponding to the primary server according to the identification information, the cluster monitoring module answers the backup server's query request by sending it a query response carrying the first service information. The backup server thus obtains the first service information of the primary server as observed by the cluster monitoring module.
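The request and response exchanged in steps 101 to 103 can be pictured as two small messages. The patent does not define a wire format, so the JSON encoding, the field names, and the helper functions below are assumptions chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class QueryRequest:
    server_id: str            # identification information of the primary server


@dataclass
class QueryResponse:
    server_id: str
    first_service_info: str   # assumed values: "normal" or "abnormal"


def encode(message) -> bytes:
    """Serialize a request or response as UTF-8 JSON (assumed format)."""
    return json.dumps(asdict(message)).encode("utf-8")


def decode_response(raw: bytes) -> QueryResponse:
    return QueryResponse(**json.loads(raw.decode("utf-8")))


def handle_query(raw_request: bytes, info_by_id: Dict[str, str]) -> bytes:
    """Cluster-monitor side: look up the service information by identifier."""
    request = QueryRequest(**json.loads(raw_request.decode("utf-8")))
    # An unknown identifier raises KeyError rather than guessing a state.
    info = info_by_id[request.server_id]
    return encode(QueryResponse(request.server_id, info))


raw = handle_query(encode(QueryRequest("mfsmaster-1")), {"mfsmaster-1": "normal"})
print(decode_response(raw))
```

Echoing the identifier back in the response lets the backup server confirm that the answer refers to the server it asked about.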

步骤104、主服务器和备份服务器根据第一服务信息，确定是否进行服务器切换。Step 104: The primary server and the backup server determine whether to perform server switching according to the first service information.

在本申请的实施例中，集群监控模块向备份服务器发送查询响应之后，主服务器和备份服务器根据第一服务信息，确定是否进行服务器切换。In the embodiment of the present application, after the cluster monitoring module sends a query response to the backup server, the primary server and the backup server determine whether to perform server switching according to the first service information.

在本申请的实施例中，服务器切换是指对备份服务器和主服务器进行切换，利用备份服务器接替主服务器的工作，进而为系统提供服务。In the embodiment of the present application, server switching refers to switching between the backup server and the primary server, and using the backup server to take over the work of the primary server, thereby providing services for the system.

需要说明的是，由于第一服务信息表征主服务器的实际运行状态，因此主服务器和备服务器可以基于第一服务信息确定服务器是否需要进行服务器切换，若根据第一服务信息判定需要进行服务器切换，则执行服务器切换，由备份服务器接替主服务器的工作，若根据第一服务信息判定无需进行服务器切换，则不执行服务器切换。It should be noted that, since the first service information represents the actual operating state of the primary server, the primary server and the backup server can determine whether the server needs to perform server switching based on the first service information. Then, server switching is performed, and the backup server takes over the work of the main server. If it is determined according to the first service information that server switching is not necessary, server switching is not performed.

图3为本申请实施例提出的服务器切换方法的实现流程示意图二，如图3所示，在本申请的实施例中，主服务器和备份服务器根据第一服务信息，确定是否进行服务器切换，即步骤104可以包括以下步骤：FIG. 3 is a second implementation flowchart of the server switching method proposed by the embodiment of the present application. As shown in FIG. 3 , in the embodiment of the present application, the primary server and the backup server determine whether to perform server switching according to the first service information, that is, Step 104 may include the following steps:

步骤104a、若备份服务器接收到的第一服务信息为正常，则主服务器和备份服务器不执行服务器切换处理。Step 104a: If the first service information received by the backup server is normal, the primary server and the backup server do not perform server switching processing.

在本申请的实施例中，主服务器和备份服务器根据第一服务信息，确定是否进行服务器切换，如果备份服务器确定此时主服务器的第一服务信息为正常，那么主服务器和备份服务器不执行服务器切换处理。In the embodiment of the present application, the primary server and the backup server determine whether to perform server switching according to the first service information. If the backup server determines that the first service information of the primary server is normal at this time, the primary server and the backup server do not execute the server switch. Switch processing.

在本申请的实施例中，若备份服务器获取到的第一服务信息为正常，则表示备份服务器可能对主服务器当前服务状态存在误判，即主服务器并没有出现故障，可能是由于主服务器和备份服务器之间通讯不通，备份服务器没有接收到主服务器反馈回来的信息，而导致备份服务器误认为主服务器状态异常。由此，依据获得的第一服务信息，备份服务器可以确定主服务器是正常运行状态，从而主服务器和备份服务器不执行服务器切换处理，避免了在通讯不通的情况下引发脑裂问题造成系统混乱，数据丢失的问题，保障了MooseFS系统的高可用。In the embodiment of the present application, if the first service information obtained by the backup server is normal, it means that the backup server may have misjudged the current service status of the primary server, that is, the primary server is not faulty. The communication between the backup servers is blocked, and the backup server does not receive the information returned by the primary server, which causes the backup server to mistakenly think that the status of the primary server is abnormal. Therefore, according to the obtained first service information, the backup server can determine that the primary server is in a normal operating state, so that the primary server and the backup server do not perform server switching processing, which avoids system confusion due to a split-brain problem caused by communication failure. The problem of data loss ensures the high availability of the MooseFS system.

In the embodiments of the present application, when the primary server is temporarily unresponsive because of excessive load, that is, when the primary service appears to hang, the backup server may likewise judge that the primary server is down. This misjudgment may also lead to a split-brain problem. The present application does not limit the specific fault information.

Step 104b: If the first service information received by the backup server is abnormal, the primary server and the backup server perform server switching.

In the embodiments of the present application, the cluster monitoring module sends a query response to the backup server, wherein the query response carries the first service information. If the query response received by the backup server shows that the first service information is abnormal, the current state of the primary server is indeed abnormal. The primary server and the backup server then perform server switching, and the backup server takes over the work of the primary server, which reduces the fault handling time of the MooseFS system and ensures its high availability.
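Steps 104a and 104b can be illustrated with a minimal sketch. The names `ServiceInfo`, `should_switch`, `on_primary_suspected` and the `query_monitor` callback are hypothetical and do not appear in the application; `query_monitor` merely stands in for the query request and query response exchanged with the cluster monitoring module.

```python
from enum import Enum
from typing import Callable


class ServiceInfo(Enum):
    """First service information as reported by the cluster monitoring module."""
    NORMAL = "normal"
    ABNORMAL = "abnormal"


def should_switch(first_service_info: ServiceInfo) -> bool:
    # Switch only when the cluster monitoring module confirms the fault.
    return first_service_info is ServiceInfo.ABNORMAL


def on_primary_suspected(primary_id: str,
                         query_monitor: Callable[[str], ServiceInfo]) -> str:
    # Hypothetical stand-in for the query request (carrying the primary's
    # identification information) and the query response.
    first_service_info = query_monitor(primary_id)
    if should_switch(first_service_info):
        return "switch"      # step 104b: the backup server takes over
    return "no_switch"       # step 104a: the suspicion was a misjudgment
```

Under these assumptions, a backup server that loses contact with a healthy primary server receives `ServiceInfo.NORMAL` and does not take over, which is the split-brain safeguard described above.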

Embodiment 2

Based on Embodiment 1 above, in another embodiment of the present application, FIG. 4 is a second schematic structural diagram of the MooseFS system. As shown in FIG. 4, the cluster monitoring module 13 in the MooseFS system 10 may include a service monitoring sub-module 131, a threshold monitoring sub-module 132, an information storage sub-module 133, and a global monitoring sub-module 134.

FIG. 5 is a third schematic flowchart of the server switching method proposed in the embodiments of the present application. As shown in FIG. 5, the method by which the cluster monitoring module determines the first service information corresponding to the primary server according to the identification information includes the following steps:

Step 102a: The service monitoring sub-module monitors the service state of the primary server based on a heartbeat monitoring mechanism and obtains at least one piece of service information corresponding to at least one service, wherein each service corresponds to one piece of service information.

It should be noted that, in the embodiments of the present application, the at least one service of the primary server may include at least one of a heartbeat information storage service, a MooseFS system operation log storage service, a MooseFS system data node service, and other services, which is not specifically limited in the present application.

Step 102b: If every piece of the at least one service information is normal, determine that the first service information is normal.

Step 102c: If any piece of the at least one service information is abnormal, determine that the first service information is abnormal.

For example, in the present application, the first service information of the primary server is considered normal only if the heartbeat information storage service, the MooseFS system operation log storage service, and the MooseFS system data node service of the primary server, as monitored by the service monitoring sub-module, are all normal.
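The aggregation rule of steps 102b and 102c can be sketched as follows. The function name `first_service_info` and the service keys in the usage example are illustrative assumptions; the application only requires that all monitored services be normal for the first service information to be normal.

```python
from typing import Mapping

NORMAL = "normal"
ABNORMAL = "abnormal"


def first_service_info(service_infos: Mapping[str, str]) -> str:
    """Combine per-service information into the first service information."""
    if not service_infos:
        # The method monitors "at least one" service, so an empty input is invalid.
        raise ValueError("at least one service must be monitored")
    # Step 102b: normal only if every service is normal.
    # Step 102c: abnormal as soon as any single service is abnormal.
    if all(info == NORMAL for info in service_infos.values()):
        return NORMAL
    return ABNORMAL
```

For instance, `first_service_info({"heartbeat_store": "normal", "log_store": "abnormal"})` yields `"abnormal"`, because a single abnormal service is enough to mark the primary server as abnormal.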

Further, in the embodiments of the present application, the information storage sub-module in the cluster monitoring module may receive the heartbeat monitoring file corresponding to the primary server sent by the service monitoring sub-module, and store the heartbeat monitoring file.

It should be noted that, in the embodiments of the present application, the information storage sub-module stores the heartbeat monitoring file, that is, the service heartbeat information, in a pre-established shared directory, and the information storage sub-module serves the service monitoring sub-module.
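As a rough illustration of storing heartbeat monitoring files in a shared directory, the sketch below writes one JSON file per primary server. The file naming scheme, the JSON layout, and the function names are assumptions made for illustration; the application does not specify the file format.

```python
import json
import tempfile
from pathlib import Path


def heartbeat_path(shared_dir: Path, primary_id: str) -> Path:
    # Hypothetical naming scheme: one file per monitored primary server.
    return shared_dir / f"{primary_id}.heartbeat.json"


def store_heartbeat(shared_dir: Path, primary_id: str, record: dict) -> Path:
    """Information storage sub-module: persist a heartbeat monitoring record."""
    path = heartbeat_path(shared_dir, primary_id)
    path.write_text(json.dumps(record), encoding="utf-8")
    return path


def load_heartbeat(shared_dir: Path, primary_id: str) -> dict:
    """Cluster monitoring module: read the stored record back."""
    return json.loads(heartbeat_path(shared_dir, primary_id).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # A temporary directory stands in for the pre-established shared directory.
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)
        store_heartbeat(shared, "primary-1", {"status": "normal"})
        print(load_heartbeat(shared, "primary-1"))
```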

FIG. 6 is a fourth schematic flowchart of the server switching method proposed in the embodiments of the present application. As shown in FIG. 6, the method by which the cluster monitoring module determines the first service information corresponding to the primary server according to the identification information further includes the following steps:

Step 102d: The cluster monitoring module obtains the heartbeat monitoring file corresponding to the primary server that is stored by the information storage sub-module.

In the embodiments of the present application, the information storage sub-module receives the heartbeat monitoring file corresponding to the primary server sent by the service monitoring sub-module, and stores it. When the cluster monitoring module needs the heartbeat monitoring file, it can obtain the file from the information storage sub-module.

Step 102e: The cluster monitoring module performs verification according to the heartbeat maintenance file and the heartbeat monitoring file to obtain a verification result.

In the embodiments of the present application, after the cluster monitoring module obtains the heartbeat monitoring file, it can perform verification according to the heartbeat maintenance file and the heartbeat monitoring file and obtain a verification result.

It should be noted that, in the embodiments of the present application, the heartbeat maintenance file is the heartbeat information of the primary server obtained by the backup server based on the heartbeat monitoring mechanism.

It should be noted that, in the embodiments of the present application, the heartbeat monitoring file is the heartbeat monitoring information corresponding to the primary server obtained by the service monitoring sub-module in the cluster monitoring module.

Step 102f: If the verification result is verification success, determine that the first service information corresponding to the primary server is normal.

Step 102g: If the verification result is verification failure, determine that the first service information corresponding to the primary server is abnormal.

In the embodiments of the present application, the cluster monitoring module performs verification according to the heartbeat maintenance file and the heartbeat monitoring file. If the verification result is verification success, the first service information of the primary server is determined to be normal; if the verification result is verification failure, the first service information corresponding to the primary server is determined to be abnormal.

In the embodiments of the present application, verification success means that, after the heartbeat maintenance file and the heartbeat monitoring file are compared or verified, the heartbeat monitoring file is found to show that the first service information of the primary server is normal. The abnormal state of the primary server indicated by the heartbeat maintenance file can then be regarded as a possible misjudgment, and the primary server should be in a normal service state at this time, so the result is verification success. Verification failure means that, after the comparison or verification, the heartbeat monitoring file still shows that the first service information of the primary server is abnormal. It can then be determined that the primary server is indeed abnormal at this time, so the result is verification failure.
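The verification of steps 102e to 102g can be sketched under strong assumptions. The application does not specify the contents of either file, so the sketch models each as a dictionary with hypothetical `primary_id` and `status` fields and takes the heartbeat monitoring file as the deciding view, as the paragraph above describes.

```python
def verify(maintenance: dict, monitoring: dict) -> bool:
    """Return True for verification success, False for verification failure.

    maintenance: the backup server's view of the primary (heartbeat maintenance file)
    monitoring:  the service monitoring sub-module's view (heartbeat monitoring file)
    Both layouts are hypothetical.
    """
    if maintenance["primary_id"] != monitoring["primary_id"]:
        raise ValueError("the two files describe different primary servers")
    # Success: the monitoring file shows the primary as normal, so the
    # abnormal state seen by the backup server is treated as a misjudgment.
    return monitoring["status"] == "normal"


def first_service_info_from_files(maintenance: dict, monitoring: dict) -> str:
    # Step 102f / step 102g.
    return "normal" if verify(maintenance, monitoring) else "abnormal"
```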

Embodiment 3

Based on Embodiments 1 and 2 above, in another embodiment of the present application, FIG. 7 is a third schematic structural diagram of the MooseFS system proposed in the embodiments of the present application. As shown in FIG. 7, the MooseFS system 10 further includes a data node 14.

It should be noted that, in the embodiments of the present application, the MooseFS system may further include a plurality of data nodes 14, which provide the storage service for the actual file data.

FIG. 8 is a fifth schematic flowchart of the server switching method proposed in the embodiments of the present application. As shown in FIG. 8, in the embodiments of the present application, the server switching method may include the following steps:

Step 201: The service monitoring sub-module monitors the data node based on the heartbeat monitoring mechanism and obtains second service information corresponding to the data node.

In the embodiments of the present application, the MooseFS system further includes a plurality of data nodes. The service monitoring sub-module also monitors these data nodes at the same time and obtains their second service information, which represents the service state of each data node.

It should be noted that, in the embodiments of the present application, a data node may use dual network card bonding, which not only increases the network transmission speed but also ensures that the node keeps working normally and efficiently when one of the network cards fails. For example, when one network card of a data node fails, the other immediately takes over the entire load so that the service is not interrupted while the node waits for maintenance personnel to carry out the repair, thereby keeping the system as a whole in normal use.

Step 202: If the second service information is stopped, restart the service (service pull-up processing) and monitor the data node again to obtain updated second service information.

Step 203: If the updated second service information is still stopped, perform alarm processing.

In the embodiments of the present application, after the service monitoring sub-module monitors the data node based on the heartbeat monitoring mechanism and obtains the second service information corresponding to the data node, it restarts the service if the service state is found to be stopped; if the service cannot be started again, alarm processing is performed.

In the embodiments of the present application, if the second service information is stopped, the service is restarted and the data node is monitored again to obtain the updated second service information; if the updated second service information is still stopped, alarm processing is performed.
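Steps 201 to 203 amount to a probe, restart, re-probe, alarm sequence, sketched below. The `probe`, `restart`, and `alarm` callbacks and the `"stopped"` status string are hypothetical placeholders for whatever mechanism a real deployment uses.

```python
from typing import Callable

STOPPED = "stopped"


def handle_data_node(node_id: str,
                     probe: Callable[[str], str],
                     restart: Callable[[str], None],
                     alarm: Callable[[str], None]) -> str:
    """Return the latest second service information for the data node."""
    status = probe(node_id)            # step 201: heartbeat-based monitoring
    if status != STOPPED:
        return status
    restart(node_id)                   # step 202: pull the service up again
    updated = probe(node_id)           # step 202: monitor the node again
    if updated == STOPPED:
        alarm(node_id)                 # step 203: restart failed, raise an alarm
    return updated
```

The sketch performs a single restart attempt before alarming, which matches the steps as written; the application does not say whether multiple attempts are made.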

Further, in the embodiments of the present application, the information storage sub-module in the cluster monitoring module may receive the second service information corresponding to the data node sent by the service monitoring sub-module, and store the second service information.

In the embodiments of the present application, after the service monitoring sub-module monitors the data node based on the heartbeat monitoring mechanism and obtains and processes the corresponding second service information, the information storage sub-module is further used to store the second service information of the data node.

Embodiment 4

Based on Embodiments 1 to 3 above, in another embodiment of the present application, FIG. 9 is a sixth schematic flowchart of the server switching method proposed in the embodiments of the present application. As shown in FIG. 9, in the embodiments of the present application, the server switching method may further include the following steps:

Step 301: The threshold monitoring sub-module obtains a first state parameter corresponding to the primary server.

In the embodiments of the present application, the threshold monitoring sub-module may obtain the first state parameter corresponding to the primary server.

It should be noted that, in the embodiments of the present application, the first state parameter is a running state parameter of the primary server. It may include the central processing unit (CPU) usage, memory usage, input/output (I/O) access rate, disk storage usage, and other parameters that characterize the running state of the primary server, which is not specifically limited in the present application.

Step 302: If the first state parameter is greater than a first preset state threshold, perform alarm processing, wherein the first state parameter is used to monitor the running state of the primary server.

In the embodiments of the present application, after obtaining the first state parameter corresponding to the primary server, the threshold monitoring sub-module can compare the first state parameter with the first preset state threshold and carry out further processing according to the comparison result.

Specifically, in the present application, after obtaining the first state parameter corresponding to the primary server, the threshold monitoring sub-module compares the first state parameter with the first preset state threshold. If the first state parameter is greater than the first preset state threshold, alarm processing is performed, wherein the first state parameter is used to monitor the running state of the primary server.

In the embodiments of the present application, the first preset state threshold is a value preset according to the running parameters of the primary server and is used to monitor the running state of the primary server. When the first state parameter is greater than the first preset state threshold, the primary server may be overloaded, so an alarm is raised to notify the cluster administrators to carry out timely maintenance. This prevents a primary server failure from causing confusion in the MooseFS system and data loss, and thus achieves high availability of the MooseFS system.

FIG. 10 is a seventh schematic flowchart of the server switching method proposed in the embodiments of the present application. As shown in FIG. 10, in the embodiments of the present application, the server switching method may further include the following steps:

Step 303: The threshold monitoring sub-module obtains a second state parameter corresponding to the data node.

In the embodiments of the present application, the second state parameter is a running state parameter of each data node. It may include the CPU usage, memory usage, I/O access rate, disk storage usage, and other parameters that characterize the running state of the data node, which is not specifically limited in the present application.

Step 304: If the second state parameter is greater than a second preset state threshold, perform alarm processing, wherein the second state parameter is used to monitor the running state of the data node.

In the embodiments of the present application, after the threshold monitoring sub-module obtains the second state parameter corresponding to the data node, alarm processing is performed if the second state parameter is greater than the second preset state threshold, wherein the second state parameter is used to monitor the running state of the data node.

In the embodiments of the present application, the second preset state threshold is a value preset according to the running parameters of the data node and is used to monitor the running state of the data node. When the second state parameter is greater than the second preset state threshold, the data node may be overloaded, so an alarm is raised to notify the cluster administrators to carry out timely maintenance. This prevents a data node failure from increasing the load on the remaining data nodes, which could in turn bring those nodes down, and thus achieves high availability of the data nodes.

It should be noted that the first state parameter and the second state parameter are the running state parameters of the primary server and the data node, respectively, and represent their respective running states. The first preset state threshold and the second preset state threshold are values preset for the running states of the primary server and the data node, respectively, and are used to give early warning about those running states.
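The threshold comparison of steps 302 and 304 can be sketched with one function that serves both the primary server and the data nodes. The parameter names (`cpu`, `memory`, and so on) and the threshold values in the example are illustrative assumptions; the application leaves the concrete parameters and thresholds open.

```python
from typing import List, Mapping


def exceeded_parameters(state_params: Mapping[str, float],
                        preset_thresholds: Mapping[str, float]) -> List[str]:
    """Return the names of state parameters strictly greater than their threshold."""
    return sorted(
        name
        for name, value in state_params.items()
        if name in preset_thresholds and value > preset_thresholds[name]
    )


def needs_alarm(state_params: Mapping[str, float],
                preset_thresholds: Mapping[str, float]) -> bool:
    # Alarm processing is triggered when any monitored parameter is over its threshold.
    return bool(exceeded_parameters(state_params, preset_thresholds))
```

With hypothetical thresholds `{"cpu": 0.90, "memory": 0.85}`, a reading of `{"cpu": 0.95, "memory": 0.40}` reports only `cpu` as exceeded. A value exactly equal to its threshold does not trigger an alarm, following the "greater than" wording of the steps.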

FIG. 11 is an eighth schematic flowchart of the server switching method proposed in the embodiments of the present application. As shown in FIG. 11, in the embodiments of the present application, the server switching method may further include the following steps:

Step 305: The global monitoring sub-module monitors the service monitoring sub-module, the threshold monitoring sub-module, and the information storage sub-module, and obtains a first service process corresponding to the service monitoring sub-module, a second service process corresponding to the threshold monitoring sub-module, and a third service process corresponding to the information storage sub-module.

In the embodiments of the present application, the global monitoring sub-module is a sub-module of the cluster monitoring module. It monitors the service processes of the other three sub-modules of the cluster monitoring module, namely the service monitoring sub-module, the threshold monitoring sub-module, and the information storage sub-module, to ensure that their service processes stay in a normal state and thus to achieve high availability of the cluster monitoring module.

Step 306: If any one of the first service process, the second service process, and the third service process is abnormal, perform alarm processing to ensure the high availability of the cluster monitoring module.

In the embodiments of the present application, after the global monitoring sub-module monitors the service monitoring sub-module, the threshold monitoring sub-module, and the information storage sub-module and obtains the first, second, and third service processes corresponding to them, alarm processing is performed if any one of the first service process, the second service process, and the third service process is abnormal.

In the embodiments of the present application, the global monitoring sub-module monitors the first service process, the second service process, and the third service process. When a process becomes abnormal, an alarm is raised for the abnormal condition and the cluster administrators are notified to carry out maintenance, which ensures the high availability of the cluster monitoring module.
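Steps 305 and 306 can be sketched as a check over the three sub-module processes. How process health is actually determined is not described in the application, so the sketch takes a precomputed mapping from hypothetical sub-module names to a boolean health flag and an `alarm` callback.

```python
from typing import Callable, List, Mapping


def check_submodules(process_ok: Mapping[str, bool],
                     alarm: Callable[[str], None]) -> List[str]:
    """Alarm on every abnormal sub-module process and return their names."""
    abnormal = [name for name, ok in process_ok.items() if not ok]
    for name in abnormal:
        # Step 306: any single abnormal process triggers alarm processing.
        alarm(name)
    return abnormal
```

For example, passing `{"service_monitor": True, "threshold_monitor": False, "info_store": True}` alarms once, for `threshold_monitor`.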

FIG. 12 is a fourth schematic structural diagram of the MooseFS system proposed in the embodiments of the present application. As shown in FIG. 12, in the embodiments of the present application, the MooseFS system may include a primary server, a backup server, data nodes, switches, a cluster monitoring module, and client applications.

In the embodiments of the present application, the primary server and the backup server form an active/standby high-availability module. High availability between the primary and backup servers is implemented mainly with Ucarp and virtual IP technology. The virtual IP provides a floating access point between the applications and the data nodes, so that an active/standby switchover does not affect the connection and interaction between the applications and the data nodes.

In the embodiments of the present application, the backup server can synchronize the metadata of the primary server in real time using an Rsync and Sersync architecture. Sersync records the name of any file or directory that changes under the monitored directory, and Rsync transfers the files in real time. With Rsync and Sersync combined, the changed files or directories can be synchronized. This not only gives a high transfer rate but also achieves real-time synchronization and backup of the metadata, effectively preventing metadata loss on the backup server caused by, for example, a fault in the metadata synchronization mechanism between the primary and backup servers.

In the embodiments of the present application, the switches are used to connect the servers and to assist the servers in receiving and forwarding data. The switches in the present application use dual network cards, so that the failure of any one switch does not affect the normal use of the system, which improves the availability of the switches.

In the embodiments of the present application, the data nodes use dual network card bonding, which not only increases the network transmission speed but also ensures that a node keeps working normally and efficiently when one of its network cards fails. For example, when one network card of a data node fails, the other immediately takes over the entire load so that the service is not interrupted while the node waits for maintenance personnel to carry out the repair, thereby keeping the system as a whole in normal use.

In the embodiments of the present application, the cluster monitoring module includes a service monitoring sub-module, a threshold monitoring sub-module, an information storage sub-module, and a global monitoring sub-module. The cluster monitoring module is used to monitor the primary server, the backup server, and the data nodes, so as to achieve overall high availability of the MooseFS system.

Embodiment 5

Based on the server switching method provided in Embodiments 1 to 4 above, and as shown in FIG. 1, an embodiment of the present application provides a MooseFS system 10, including a primary server 11, a backup server 12, and a cluster monitoring module 13, wherein:

FIG. 13 is a fifth schematic structural diagram of the MooseFS system proposed in the embodiments of the present application. As shown in FIG. 13, the MooseFS system 10 proposed in the embodiments of the present application includes a sending unit 15, a determining unit 16, and an executing unit 17.

The sending unit 15 is configured to send a query request to the cluster monitoring module when the backup server detects that the primary server is running abnormally, wherein the query request carries the identification information of the primary server.

The determining unit 16 is configured for the cluster monitoring module to determine, according to the identification information, the first service information corresponding to the primary server, wherein the first service information represents the actual running state of the primary server.

Further, the sending unit 15 is also configured for the cluster monitoring module to send a query response to the backup server, wherein the query response carries the first service information.

The determining unit 16 is also configured for the primary server and the backup server to determine, according to the first service information, whether to perform server switching.

The executing unit 17 is configured such that, if the first service information is normal, the primary server and the backup server do not perform server switching.

Further, the executing unit 17 is also configured such that, if the first service information is abnormal, the primary server and the backup server perform server switching.

Further, in the embodiments of the present application, as shown in FIG. 13, the MooseFS system proposed in the embodiments of the present application may also include an obtaining unit 18.

The obtaining unit 18 is configured for the node service monitoring sub-module to monitor the service state of the primary server based on the heartbeat monitoring mechanism and to obtain at least one service state corresponding to at least one service.

Further, in the embodiments of the present application, the determining unit 16 is specifically configured to determine that the current service state is normal if every one of the at least one service state is normal, and to determine that the current service state is abnormal if any one of the at least one service state is abnormal.

Further, in the embodiments of the present application, as shown in FIG. 13, the MooseFS system proposed in the embodiments of the present application may also include an alarm unit 19.

The alarm unit 19 is configured to perform alarm processing if the first state parameter is greater than the first preset state threshold, wherein the first state parameter is used to monitor the running state of the primary server.

Further, in the embodiments of the present application, the obtaining unit 18 is also configured for the threshold monitoring sub-module to obtain the first state parameter corresponding to the primary server.

Further, in the embodiments of the present application, as shown in FIG. 13, the MooseFS system proposed in the embodiments of the present application may also include a receiving unit 110 and a storage unit 111.

The receiving unit 110 is configured for the information storage sub-module to receive the first service information corresponding to the primary server sent by the service monitoring sub-module.

The storage unit 111 is configured to store the first service information.

Further, in the embodiments of the present application, the obtaining unit 18 is further configured for the cluster monitoring module to obtain the first service information of the main server stored by the information storage sub-module, and for the cluster monitoring module to perform verification processing according to the heartbeat maintenance file and the first service information to obtain a verification result.

Further, in the embodiments of the present application, the determining unit 16 is further configured to determine that the current service state of the main server is normal if the verification succeeds, and to determine that the first service information corresponding to the main server is abnormal if the verification fails.

Further, in the embodiments of the present application, as shown in FIG. 13, the MooseFS system proposed in the embodiments of the present application may further include a monitoring unit 112.

The monitoring unit 112 is configured for the global monitoring sub-module to monitor the service monitoring sub-module, the threshold monitoring sub-module, and the information storage sub-module, and to obtain a first service process corresponding to the service monitoring sub-module, a second service process corresponding to the threshold monitoring sub-module, and a third service process corresponding to the information storage sub-module.

Further, in the embodiments of the present application, the alarm unit 19 is further configured to perform alarm processing if any one of the first service process, the second service process, and the third service process is abnormal.
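As a non-limiting illustration, the check made by the global monitoring sub-module over the three sub-module processes can be sketched as follows. The function name `check_submodule_processes` and the representation of each process as a boolean liveness flag are assumptions made for this sketch; how process health is actually probed is not specified in this section.

```python
from typing import Callable, List, Mapping


def check_submodule_processes(
    process_alive: Mapping[str, bool],
    alarm: Callable[[str], None],
) -> bool:
    """Alarm for every sub-module process that is reported as not alive.

    Returns True if at least one alarm was raised.
    """
    abnormal = [name for name, alive in process_alive.items() if not alive]
    for name in abnormal:
        alarm(f"sub-module process abnormal: {name}")
    return bool(abnormal)


if __name__ == "__main__":
    alarms: List[str] = []
    # Liveness flags are hypothetical inputs for illustration only.
    check_submodule_processes(
        {
            "service-monitoring": True,
            "threshold-monitoring": False,
            "information-storage": True,
        },
        alarms.append,
    )
    print(alarms)
```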

Further, in the embodiments of the present application, the obtaining unit 18 is further configured for the service monitoring sub-module to monitor the data node based on the heartbeat monitoring mechanism and to obtain second service information corresponding to the data node.

Further, in the embodiments of the present application, the execution unit 17 is further configured to restart (pull up) the second service if the second service state is suspended, and to perform alarm processing if the second service state indicates that the service cannot be started again.
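As a non-limiting illustration, the restart-then-alarm handling of a data node service can be sketched as follows. The names `NodeServiceState` and `handle_data_node`, and the three callables `probe`, `restart`, and `alarm`, are hypothetical. The sketch assumes that "cannot be started again" is detected by probing the service once more after a single restart attempt; the number of attempts is not specified in this section.

```python
from enum import Enum
from typing import Callable


class NodeServiceState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"


def handle_data_node(
    probe: Callable[[], NodeServiceState],
    restart: Callable[[], None],
    alarm: Callable[[str], None],
) -> NodeServiceState:
    """Restart a suspended data node service and alarm if it stays down.

    probe   -- returns the current state of the second service
    restart -- attempts to pull the service up again
    alarm   -- reports a service that could not be restarted
    """
    state = probe()
    if state is NodeServiceState.SUSPENDED:
        restart()
        # Monitor the data node again to obtain the updated state.
        state = probe()
        if state is NodeServiceState.SUSPENDED:
            alarm("data node service could not be restarted")
    return state


if __name__ == "__main__":
    # Simulated probe results: suspended first, running after the restart.
    results = iter([NodeServiceState.SUSPENDED, NodeServiceState.RUNNING])
    final = handle_data_node(lambda: next(results), lambda: None, print)
    print(final)
```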

Further, in the embodiments of the present application, the obtaining unit 18 is further configured for the threshold monitoring sub-module to obtain the second state parameter of the data node.

Further, in the embodiments of the present application, the execution unit 17 is further configured to perform alarm processing if the second state parameter is greater than the second preset state threshold.

Further, in the embodiments of the present application, the receiving unit 110 is further configured for the information storage sub-module to receive the second service information corresponding to the data node sent by the service monitoring sub-module.

Further, in the embodiments of the present application, the storage unit 111 is further configured to store the second service information.

FIG. 14 is a sixth schematic diagram of the composition and structure of the MooseFS system proposed in the embodiments of the present application. As shown in FIG. 14, the MooseFS system proposed in the present application further includes a processor 113, a memory 114 storing instructions executable by the processor 113, a communication interface 115, and a bus 116 for connecting the processor 113, the memory 114, and the communication interface 115.

In the embodiments of the present application, the processor 113 may be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a central processing unit (CPU), a controller, a microcontroller, and a microprocessor. It can be understood that, for different devices, other electronic components may be used to implement the above processor functions, which is not specifically limited in the embodiments of the present application. The MooseFS system may further include the memory 114, which may be connected to the processor 113. The memory 114 is used to store executable program code that includes computer operation instructions. The memory 114 may include high-speed RAM and may also include non-volatile memory, for example, at least two disk memories.

In the embodiments of the present application, the bus 116 is used to connect the communication interface 115, the processor 113, and the memory 114, and to carry the communication among these components.

In the embodiments of the present application, the memory 114 is used to store instructions and data.

Further, in the embodiments of the present application, the processor 113 is configured to perform the following: when the backup server detects that the main server is running abnormally, the backup server sends a query request to the cluster monitoring module, where the query request carries identification information of the main server; the cluster monitoring module determines, according to the identification information, the first service information corresponding to the main server, where the first service information represents the actual running state of the main server; the cluster monitoring module sends a query response to the backup server, where the query response carries the first service information; and the main server and the backup server determine, according to the first service information, whether to perform server switching.
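As a non-limiting illustration, the query and switching decision described above can be sketched as follows. All class, function, and field names (`QueryRequest`, `QueryResponse`, `ClusterMonitor`, `should_switch`, `backup_on_suspected_failure`, and the server identifier `master-01`) are hypothetical. The sketch models the cluster monitoring module as an in-memory lookup and omits network transport, the heartbeat mechanism, and the verification step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class ServiceInfo(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass(frozen=True)
class QueryRequest:
    server_id: str  # identification information of the main server


@dataclass(frozen=True)
class QueryResponse:
    server_id: str
    first_service_info: ServiceInfo  # actual running state of the main server


class ClusterMonitor:
    """Stand-in for the cluster monitoring module (in-memory only)."""

    def __init__(self, service_info_by_id: Dict[str, ServiceInfo]) -> None:
        self._info = dict(service_info_by_id)

    def handle_query(self, request: QueryRequest) -> QueryResponse:
        # Look up the first service information by the identifier carried
        # in the request. An unknown identifier raises KeyError.
        return QueryResponse(request.server_id, self._info[request.server_id])


def should_switch(response: QueryResponse) -> bool:
    """Switch only when the monitor confirms the main server is abnormal."""
    return response.first_service_info is ServiceInfo.ABNORMAL


def backup_on_suspected_failure(monitor: ClusterMonitor, main_id: str) -> bool:
    """Called by the backup server after it suspects a main server fault."""
    response = monitor.handle_query(QueryRequest(main_id))
    return should_switch(response)


if __name__ == "__main__":
    monitor = ClusterMonitor({"master-01": ServiceInfo.NORMAL})
    # The backup's own suspicion is not enough: the monitor reports the
    # main server as normal, so no switching takes place.
    print(backup_on_suspected_failure(monitor, "master-01"))
```

The design point illustrated here is that the backup server does not switch based on its own observation alone. It defers to the state reported by the cluster monitoring module, which is how the scheme avoids both servers competing for resources.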

In practical applications, the memory 114 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memory. The memory 114 provides instructions and data to the processor 113.

In addition, the functional modules in this embodiment may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module.

If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disc.

The embodiments of the present application provide a first computer-readable storage medium on which a program is stored. When the program is executed by a first processor, the methods of Embodiments 1 to 4 are implemented.

Specifically, the program instructions corresponding to the server switching method in this embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive. When the program instructions corresponding to the server switching method in the storage medium are read or executed by an electronic device, the following steps are included:

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.

The present application is described with reference to schematic flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each process and/or block in the schematic flowcharts and/or block diagrams, and any combination of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus. The instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

The above descriptions are merely preferred embodiments of the present application and are not intended to limit the protection scope of the present application.

Claims

1. A server switching method, characterized in that the server switching method is applied to a distributed file system MooseFS system, the MooseFS system comprising a main server, a backup server, and a cluster monitoring module, and the method comprising:

When the backup server detects that the main server is running abnormally, the backup server sends a query request to the cluster monitoring module; wherein the query request carries the identification information of the main server;

The cluster monitoring module determines the first service information corresponding to the main server according to the identification information; wherein, the first service information represents the actual running state of the main server;

The cluster monitoring module sends a query response to the backup server; wherein, the query response carries the first service information;

The main server and the backup server determine whether to perform server switching according to the first service information.

2. The method according to claim 1, wherein the main server and the backup server determining whether to perform server switching according to the first service information comprises:

If the first service information received by the backup server is normal, the main server and the backup server do not perform server switching processing;

If the first service information received by the backup server is abnormal, the main server and the backup server perform server switching processing.

3. The method according to claim 1, wherein the cluster monitoring module comprises a service monitoring sub-module, and the cluster monitoring module determining the first service information corresponding to the main server according to the identification information comprises:

The service monitoring submodule performs service status monitoring on the main server based on the heartbeat monitoring mechanism, and obtains at least one service information corresponding to at least one service; wherein, one service corresponds to one service information;

If every one of the at least one service information is normal, it is determined that the first service information is normal;

If any one of the at least one service information is abnormal, it is determined that the first service information is abnormal.

4. The method according to claim 3, wherein the cluster monitoring module further comprises: a threshold monitoring sub-module, and the method further comprises:

The threshold monitoring sub-module acquires the first state parameter corresponding to the main server;

If the first state parameter is greater than the first preset state threshold, alarm processing is performed; wherein, the first state parameter is used to monitor the running state of the main server.

5. The method according to claim 4, wherein the cluster monitoring module further comprises: an information storage sub-module, and the method further comprises:

The information storage submodule receives the heartbeat monitoring file corresponding to the main server sent by the service monitoring submodule, and stores the heartbeat monitoring file.

6. The method according to claim 5, wherein the query request also carries a heartbeat maintenance file of the main server, and the cluster monitoring module determining the first service information corresponding to the main server according to the identification information comprises:

The cluster monitoring module obtains the heartbeat monitoring file corresponding to the main server stored by the information storage sub-module;

The cluster monitoring module performs verification processing according to the heartbeat maintenance file and the heartbeat monitoring file to obtain a verification result;

If the verification succeeds, it is determined that the first service information corresponding to the main server is normal;

If the verification fails, it is determined that the first service information corresponding to the main server is abnormal.

7. The method according to claim 6, wherein the cluster monitoring module further comprises: a global monitoring sub-module, and the method further comprises:

The global monitoring sub-module monitors the service monitoring sub-module, the threshold monitoring sub-module, and the information storage sub-module, and obtains a first service process corresponding to the service monitoring sub-module, a second service process corresponding to the threshold monitoring sub-module, and a third service process corresponding to the information storage sub-module;

If any one of the first service process, the second service process, and the third service process is abnormal, alarm processing is performed to ensure high availability of the cluster monitoring module.

8. The method according to claim 3, wherein the MooseFS system further comprises: a data node, and the method further comprises:

The service monitoring sub-module monitors the data node based on a heartbeat monitoring mechanism, and obtains second service information corresponding to the data node;

If the second service information indicates that the service is suspended, a service restart (pull-up) process is executed, and the data node is monitored again to obtain updated second service information;

If the updated second service information still indicates that the service is suspended, alarm processing is performed.

9. A MooseFS system, characterized in that the MooseFS system comprises: a main server, a backup server and a cluster monitoring module, wherein,

The main server is used to manage the MooseFS system and perform data transmission with the data nodes;

The backup server is used to monitor the running state of the main server and to save the metadata of the main server;

The cluster monitoring module is used for monitoring the actual running status of the main server and the data node.

10. A MooseFS system, characterized in that the MooseFS system comprises a main server, a backup server, and a cluster monitoring module; the MooseFS system further comprises a processor and a memory storing instructions executable by the processor; and when the instructions are executed by the processor, the method according to any one of claims 1-8 is implemented.

11. A computer-readable storage medium on which a program is stored, applied in a MooseFS system, the MooseFS system comprising a main server, a backup server, and a cluster monitoring module, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-8 is implemented.