CN112751916B - Data publishing-subscribing method and system for micro-service governance

Info

Publication number: CN112751916B
Application number: CN202011578199.8A
Authority: CN
Inventors: 黄涛; 唐震; 王伟; 魏峻; 李慧; 张舒扬; 宋傲
Original assignee: Institute of Software of CAS
Current assignee: Institute of Software of CAS
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2022-03-25
Anticipated expiration: 2040-12-28
Also published as: CN112751916A

Abstract

The present invention relates to the technical field of software, and in particular to a data publish-subscribe method and system for micro-service governance. The main steps are as follows: when a client publishes data to a topic, the data to be published is encapsulated into a data publishing request and sent to any node of the data publishing server cluster; that node parses the data to be published from the request and forwards it to the data publishing server node responsible for the topic; the data publishing server node sends the received data, together with a client forwarding address list, to a client that subscribes to the topic to which the data belongs; after receiving the data and the client forwarding address list, the client stores the received data in its local storage system and starts assisting the data publishing server node in forwarding the message, the forwarding targets being all addresses in the client forwarding address list. The invention can solve the performance problem of data publishing and subscription in large-scale micro-service governance and ensure the reliability of data publishing and subscription.

Description

Data publishing-subscribing method and system for micro-service governance

Technical Field

The invention relates to the technical field of software, in particular to a data publishing-subscribing method and system for micro-service governance.

Background

Cloud computing technology is widely used to support the deployment and management of large-scale micro-service systems. With cloud technologies such as virtualization and containers, the number of micro-service instances in the data center clusters of large cloud service providers is growing rapidly: a single data center cluster already hosts on the order of ten thousand instances, and an extreme scale of more than one hundred thousand, or even millions, of instances is expected in the future. At this extreme scale, the key requirements of micro-service governance, such as service registration and discovery, configuration management, and health checking, face many new problems and pose many technical challenges.

The publish-subscribe model is a common way to meet the above key requirements and is widely used in current production systems. For example: for micro-service configuration management, the configuration data subscribed to by micro-service instances is published so that configuration can be changed dynamically while the service is running; when a service needs to scale out or in, the micro-service provider publishes the latest service list to the micro-service callers (subscribers), so that service call relationships change dynamically at run time; and when a multi-stage grayscale test is performed on a subset of the service instances, tag data must be published to control the test scope.

These typical micro-service governance scenarios all depend on an efficient, reliable, and scalable data publish-subscribe model and mechanism. Solving the performance, reliability, and scalability problems of a publish-subscribe system serving extreme-scale service instances has therefore become a key issue in managing such instances effectively.

Taking a typical extreme-scale service instance environment as an example, a typical data publish-subscribe scenario is shown in FIG. 1: a large number of data update requests first reach the data publisher cluster (left side of FIG. 1), and the cluster publishes the updated data to the instances that subscribe to it; on the service instance side (right side of FIG. 1), all or most instances must obtain correct and complete data. In this scenario, the data publishing mechanism faces complex Service-Level Agreement (SLA) challenges such as publishing delay (performance), publishing success rate (reliability), and the maximum load the data publisher cluster can bear (scalability), specifically as follows:

1) Data publish-subscribe at extreme scale depends on a distributed data publishing system, in which scalability and publishing delay are in tension. Communication and data synchronization overhead among data publishing servers rises sharply as the cluster grows, so the average publishing delay is hard to guarantee. Unstable publishing delay means that data publishing requirements cannot be reliably met, which can lead to system errors, overload, and even crashes. Moreover, because the load differs too widely between servers and tasks are unevenly distributed, the performance problem is hard to solve simply by enlarging the cluster. A load balancing mechanism within the data publishing server cluster is therefore urgently needed, so that the publishing load is balanced across data publishing servers and the tension between the scalability of the data publisher cluster and the publishing delay is avoided.

2) At extreme scale, failures of micro-service instances and of data publishing servers are more common, and failed service instances and failed data publishing server instances must be identified and isolated promptly while scalability is still guaranteed. First, low-probability events such as node failure or network unreachability are more likely to occur when the number of instances is large; second, cluster deployments spanning multiple data centers can suffer network partitions caused by link failures. If failed instances are not effectively identified and isolated during data publishing, publishing efficiency suffers directly, and the failed instances may use inconsistent or even wrong data, which affects the correctness of the whole system. For example, if the IP address of a public micro-service to which the instances of a particular service subscribe changes, the update must be published quickly to the reachable nodes in the cluster. If the cluster contains an undetected failed node, publishing performance may degrade because of repeated retries to the unreachable node, the failed node will have difficulty obtaining correct data after it recovers, and user requests may be routed to the wrong IP address, resulting in timeouts or even wrong results.

In summary, an efficient (low-latency, scalable) and reliable publish-subscribe system is a basic requirement of extreme-scale micro-service governance tasks such as configuration management and service discovery, and a targeted method and system are urgently needed to solve the new problems of publish-subscribe for large-scale micro-service instances.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides an efficient and reliable data publish-subscribe method and system for micro-service governance, which solve the performance problem of data publishing and subscription in large-scale micro-service governance, ensure the reliability of data publishing and subscription, and handle problems common in large-scale clusters such as node failure and network unreachability. The main content of the invention is as follows:

An efficient and reliable data publish-subscribe method mainly involves two types of participants: (1) the server cluster that takes part in data publishing; and (2) the target nodes that receive data (hereinafter referred to as clients). The method mainly comprises the following steps:

1) The server cluster is initialized, the data publish-subscribe service is started, and the network communication connections inside the cluster are established.

2) The client cluster calculates, from the subscribed topic (Key), the address of the data publishing server node to be requested, and sends a data subscription request to that node.

3) When a client needs to publish data to a topic (Key), it encapsulates the data to be published into a data publishing request and sends the request to any node of the data publishing server cluster. After receiving the request, that node parses the data to be published from the request, calculates the data publishing server node responsible for the publishing task using the same calculation as in step 2), and forwards the data to that node over the intra-cluster network communication connection established in step 1).

4) After receiving the data, the data publishing server node responsible for the publishing task sends the data, together with the client forwarding address list, to a client that subscribes to the target topic (Key) of the data.

5) After receiving the data and the client forwarding address list, the client stores the data in its local storage system and starts assisting the data publishing server node in forwarding the message; the forwarding targets are all addresses in the client forwarding address list sent to the client by the data publishing server in step 4), as shown in FIG. 2.
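As an illustration only, steps 4) and 5) can be sketched as a small in-memory simulation in Python. The names `Client`, `publish`, and `network` are hypothetical and do not come from the patent; real network calls are replaced by direct method calls, and the sketch assumes that the first subscriber is left out of its own forwarding list and that relayed copies are not relayed again.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """A subscriber that stores data locally and can relay it to peers."""
    address: str
    store: dict = field(default_factory=dict)  # stand-in for the local storage system

    def receive(self, topic, data, forward_list, network):
        # Step 5: persist the data, then relay it to every listed address.
        self.store[topic] = data
        for peer_address in forward_list:
            # Assumption: relayed copies carry an empty list, so peers do not relay again.
            network[peer_address].receive(topic, data, [], network)


def publish(topic, data, subscribers, network):
    """Step 4: the server sends the data plus the forwarding list to one subscriber."""
    if not subscribers:
        return
    first, rest = subscribers[0], subscribers[1:]
    network[first].receive(topic, data, rest, network)


# Three subscribers; the server contacts only the first one.
network = {addr: Client(addr) for addr in ("10.0.0.1", "10.0.0.2", "10.0.0.3")}
publish("config/db", b"v2", list(network), network)
```

In this sketch the server performs a single send regardless of the number of subscribers, which is the load reduction the method aims for; the fan-out cost moves to the receiving client.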

Furthermore, the method is a topic (Key)-oriented publish-subscribe method: all data to be published must correspond to a globally unique topic, and every publish or subscribe operation acts on a topic and the data under that topic.

Further, the network communication connection inside the cluster in step 1) is a long-lived persistent connection, which is established only when the cluster is initialized and is closed when the cluster stops service.

Further, the data publishing server node mapping in steps 2) and 3) above (that is, the calculation of the data publishing server node responsible for the publishing task) is based on a hash value.
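The key property of this mapping is that a subscribing client and an arbitrary receiving server node must arrive at the same responsible node for a given topic. A minimal Python sketch of one way to achieve this is given below; the function `node_for_topic`, the use of SHA-256, and the modulo rule are illustrative assumptions, since the patent does not specify the hash function.

```python
import hashlib


def node_for_topic(topic: str, server_nodes: list[str]) -> str:
    """Map a topic key to one server address, deterministically."""
    digest = hashlib.sha256(topic.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(server_nodes)
    # Sort so that callers holding the list in a different order still agree.
    return sorted(server_nodes)[index]


servers = ["10.1.0.1:9000", "10.1.0.2:9000", "10.1.0.3:9000"]
# A subscriber (step 2) and a receiving server node (step 3) compute independently.
subscriber_view = node_for_topic("service/orders", servers)
publisher_view = node_for_topic("service/orders", list(reversed(servers)))
```

Because both sides use the same calculation over the same node list, a publish request that lands on any node is forwarded to exactly the node that holds the subscriptions for that topic.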

Further, to reduce the computational load on the data publishing server cluster when the number of subscribing service instances is large, the method has clients assist the data publishing server in publishing messages, using the client forwarding address list sent in step 4); this is step 5). The client forwarding address list consists of the addresses of all clients that subscribe to the topic on the node responsible for the publishing task.

Further, to ensure the reliability of data publishing, the method additionally takes the following steps (as shown in FIG. 3):

(1) A client node that receives the message directly from the server reports its status to the server after successfully receiving the message, that is, it sends response information (ACK information) to the server.

(2) A client node that receives the message through client-assisted forwarding reports its status, after successfully receiving the message, to the source node of the forwarding operation (the source client node); after summarizing the results of the forwarding operation, the source node reports the list of node addresses for which forwarding failed to the server.

(3) When the server fails to receive the status report of step (1), it attempts to retransmit the message; when the number of retransmissions reaches the maximum limit, it stops retransmitting, records a log entry, and notifies system operation and maintenance personnel to troubleshoot the problem.

(4) When the server successfully receives the status report of step (1) and also receives the forwarding-failure node address list reported in step (2), it retransmits the message directly to those nodes, without a forwarding operation.
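The four reliability rules above can be sketched in Python as follows. This is a simplified model, not the patented implementation: `send` is a hypothetical transport callable that returns True when an ACK arrives, `MAX_RETRIES` is an arbitrary value, and the fallback used when the first client never acknowledges is an assumption that the source does not describe.

```python
import logging

MAX_RETRIES = 3  # hypothetical value; the source only speaks of a "maximum limit"


def send_with_retry(send, address, message, max_retries=MAX_RETRIES):
    """Rule (3): resend until an ACK arrives or the retry limit is reached."""
    for _ in range(1 + max_retries):
        if send(address, message):  # True means the ACK of rule (1) was received
            return True
    logging.error("giving up on %s after %d retries", address, max_retries)
    return False


def relay(send, forward_list, message):
    """Rule (2): the source client relays and collects the addresses that failed."""
    return [addr for addr in forward_list if not send(addr, message)]


def publish_reliably(send, subscribers, message):
    """Return the addresses that never acknowledged the message."""
    if not subscribers:
        return []
    first, rest = subscribers[0], subscribers[1:]
    if send_with_retry(send, first, message):
        undelivered, failed = [], relay(send, rest, message)
    else:
        # Assumption: with no relay available, the server sends to the rest directly.
        undelivered, failed = [first], rest
    # Rule (4): direct retransmission from the server, with no further forwarding.
    undelivered += [a for a in failed if not send_with_retry(send, a, message)]
    return undelivered


attempts = {}


def flaky_send(address, message):
    """Simulated transport: 'c' fails on its first attempt, 'd' is always down."""
    attempts[address] = attempts.get(address, 0) + 1
    if address == "d":
        return False
    return not (address == "c" and attempts[address] == 1)


undelivered = publish_reliably(flaky_send, ["a", "b", "c", "d"], b"v2")
```

In the simulated run, client `c` misses the relayed copy but receives the server's direct retransmission, while `d` exhausts the retry limit and is reported as undelivered for operators to investigate.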

An efficient and reliable data publish-subscribe system adopting the above method mainly comprises the following modules:

the data publish-subscribe topic management module, which manages the publish-subscribe states of the server and the clients, records the mapping between each topic and the address list of clients subscribing to it, and generates the client forwarding address list;

the network communication and connection management module, which sends data over the network within the data publishing server cluster and to client nodes, and also handles message forwarding between client nodes;

and the hash value calculation module, which calculates a globally unique hash value according to the current node state of the cluster, the hash value being the basis for mapping clients' subscription and publishing requests to specific server nodes.

Furthermore, the system also comprises a fault-tolerance and reliability guarantee module, which guarantees the reliability of data publish-subscribe, for example through ACK information collection and message retransmission mechanisms.

Furthermore, the system also comprises a performance experiment and performance analysis module, which assists in verifying the correctness of the whole system and collects performance indicators of the whole system through log analysis, in order to support subsequent iterations on other functional and non-functional requirements of the system.

Compared with the prior art, the invention has the advantages that:

(1) The publish-subscribe performance problem in extreme-scale micro-service governance is solved by means such as client-assisted forwarding.

(2) The reliability of data publishing-subscribing under abnormal conditions such as network partition, node downtime and the like is ensured.

(3) The expandability of the data distribution server cluster is maintained through the method of hash value calculation and server mapping.

Drawings

FIG. 1 is a schematic diagram of a data publish-subscribe scenario in a microservice environment;

FIG. 2 is a schematic diagram of a client-assisted forwarding operation;

FIG. 3 is a schematic diagram of the reliability guarantee principle of the publish-subscribe method;

FIG. 4 is a general flow diagram of a publish-subscribe method.

FIG. 5 is a schematic diagram of a system server-side implementation;

FIG. 6 is a schematic diagram of a system client implementation principle;

FIG. 7 is a schematic diagram of the system performance testing and analyzing module implementation principle.

Detailed Description

In order to make the technical solutions in the embodiments of the present invention better understood and make the objects, features, and advantages of the present invention more comprehensible, the technical core of the present invention is described in further detail below with reference to the accompanying drawings of the specification.

The invention provides an efficient and reliable data publish-subscribe method and system for micro-service governance; the general flow is shown in FIG. 4, and the main module division is shown in FIGS. 5 and 6.

On the data publishing server side, as shown in FIG. 5, the system mainly comprises a data publish-subscribe topic management module, a network communication and connection management module, a hash value calculation module, a forwarding address list and ACK information collection module, and a cluster address management module. The details of each module are as follows:

1. The data publish-subscribe topic management module: this module is mainly responsible for efficiently managing the topics currently available for publishing and the subscribers under each topic, and for persisting this information. It is implemented with a K-V (key-value) data structure in which the key is the topic's identifying information and the value is the subscriber address list. When the data publishing server receives a client's publishing request for a topic, the system looks up the K-V data structure by the key corresponding to the topic, obtains the address list of the clients to publish to, and hands the list to the network communication and connection management module for subsequent operations. Similarly, when the data publishing server receives a client's subscription request for a topic, the system looks up the K-V data structure by the key corresponding to the topic, obtains the subscriber address list, and adds the client's address to it.
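A minimal in-memory sketch of such a K-V structure in Python is shown below. The class name `TopicRegistry` and its methods are hypothetical; persistence is omitted, and ignoring duplicate subscriptions is an assumption rather than something the source states.

```python
from collections import defaultdict


class TopicRegistry:
    """In-memory key-value map: topic key -> ordered list of subscriber addresses."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, address: str) -> None:
        # Look up the list for this topic and append the address once.
        addresses = self._subscribers[topic]
        if address not in addresses:
            addresses.append(address)

    def forwarding_list(self, topic: str) -> list[str]:
        # Return a copy so callers cannot mutate the registry's state.
        return list(self._subscribers.get(topic, []))


registry = TopicRegistry()
registry.subscribe("config/db", "10.0.0.1:7000")
registry.subscribe("config/db", "10.0.0.2:7000")
registry.subscribe("config/db", "10.0.0.1:7000")  # duplicate is ignored
```

On a publish request, the list returned by `forwarding_list` would be handed to the network layer as the client forwarding address list.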

2. The network communication and connection management module: the main functions of this module are to implement intra-cluster communication and communication from the data publishing server to subscribing clients efficiently, and to expose a standard publish-subscribe interface. The implementation is mainly based on non-blocking I/O over persistent TCP connections. The module also provides protobuf-based encoding and decoding, which serializes the upper-layer application's data to be published into a byte stream in the corresponding format and deserializes received byte streams into the specific data to be received.

3. The hash value calculation module: both client subscription and data publishing by the data publishing server rely on the hash value produced by this module for load distribution. The hash value is calculated from external characteristics of the server (such as IP address, port number, and MAC address) and must be globally unique.
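One common way to derive a topic-to-server mapping from per-server hash values is a consistent-hash ring. The Python sketch below uses that technique as an assumption; the patent only states that the hash is computed from server characteristics and does not name a ring. The class `HashRing`, the `ip:port` key format, and SHA-256 are all illustrative choices.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring keyed by each server's 'ip:port' string."""

    def __init__(self, servers):
        # Each server is placed on the ring at the hash of its address.
        self._ring = sorted((_hash(s), s) for s in servers)
        self._points = [point for point, _ in self._ring]

    def node_for(self, topic: str) -> str:
        # Walk clockwise to the first server at or after the topic's hash.
        i = bisect.bisect_left(self._points, _hash(topic)) % len(self._ring)
        return self._ring[i][1]


ring = HashRing(["10.1.0.1:9000", "10.1.0.2:9000", "10.1.0.3:9000"])
smaller = HashRing(["10.1.0.1:9000", "10.1.0.2:9000"])
topics = [f"topic-{n}" for n in range(100)]
# Topics whose responsible node changes when the third server is removed.
moved = [t for t in topics if ring.node_for(t) != smaller.node_for(t)]
```

With a ring, removing a server only reassigns the topics that server was responsible for, which limits resubscription traffic when the cluster membership changes.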

4. The forwarding address list and ACK information collection module: this module is mainly responsible for (1) collecting the clients' response (ACK) information for published data; and (2) monitoring the completion of data publishing based on that information and retransmitting the data to clients for which publishing failed, using the retransmission scheme described in the method.

5. The cluster address management module: when the system starts, this module loads an external configuration file, generates the address list of the server cluster, and hands the list to the network communication and connection management module to establish the long-lived TCP connections inside the data publisher cluster. The module also keeps the cluster addresses in memory as a list, from which the hash value calculation module calculates the hash values of the local server node and of the other server nodes in the cluster.

On the data publishing client side, as shown in FIG. 6, the system mainly comprises a network communication and connection management module and a forwarding address list parsing module. The details of each module are as follows:

1. The network communication and connection management module: like its counterpart on the data publishing server side, this module is mainly responsible for efficiently implementing intra-cluster communication and communication from the data publishing server to subscribing clients, and for exposing a standard publish-subscribe interface.

2. The forwarding address list parsing module: this module parses the client forwarding address list that the data publishing server sends along with the published data. After parsing the address list, the module generates the corresponding forwarding data packets and hands them to the network communication and connection management module, which forwards them to the other clients.

In another embodiment of the present invention, the system may further include a fault-tolerance and reliability guarantee module and a performance experiment and performance analysis module. The fault-tolerance and reliability guarantee module guarantees the reliability of data publish-subscribe. The performance experiment and performance analysis module assists in verifying the correctness of the whole system and collects performance indicators of the whole system through log analysis, in order to support subsequent iterations on other functional and non-functional requirements of the system. As shown in FIG. 7, the system provides a performance testing and performance analysis tool, which works as follows: the tool runs data publish-subscribe performance experiments by executing several built-in load test scripts in different modes and sending real network requests. For performance analysis, the module collects and aggregates the logs of the data publishing servers and the clients, and derives from them the completion status, the delay distribution, and other information for each data publish/subscribe operation.
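The log analysis step can be illustrated with a short Python sketch. The record layout (message id, client address, publish timestamp, receive timestamp) and the sample values are hypothetical and are not taken from the patent; they stand in for merged server and client logs.

```python
from statistics import mean

# Hypothetical merged log records:
# (message_id, client_address, publish_ts, receive_ts or None if never received).
records = [
    ("m1", "10.0.0.1", 0.000, 0.012),
    ("m1", "10.0.0.2", 0.000, 0.020),
    ("m1", "10.0.0.3", 0.000, None),   # never delivered
    ("m2", "10.0.0.1", 1.000, 1.008),
]


def summarize(records):
    """Compute the completion rate and latency statistics from merged logs."""
    latencies = [recv - sent for _, _, sent, recv in records if recv is not None]
    return {
        "completion_rate": len(latencies) / len(records),
        "mean_latency": mean(latencies),
        "max_latency": max(latencies),
    }


summary = summarize(records)
```

A fuller analysis would typically also report latency percentiles per topic or per message, but the completion rate and latency spread already correspond to the reliability and performance indicators discussed in the background section.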

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the embodiments have been described in detail for the present invention, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered in the claims of the present invention.

Claims

1. A data publish-subscribe method for micro-service governance, characterized by comprising the following steps:

initializing a data publishing server cluster, starting a data publish-subscribe service, and establishing network communication connections within the cluster;

calculating, by the client cluster, the address of the data publishing server node to be requested according to the subscribed topic, and sending a data subscription request to that node;

when a client needs to publish data to a topic, encapsulating the data to be published into a data publishing request and sending the data publishing request to any node of the data publishing server cluster, said node parsing the data to be published from the data publishing request and forwarding the data to the calculated data publishing server node;

sending, by the calculated data publishing server node, the received data and the client forwarding address list to a client subscribing to the topic to which the data belongs;

after receiving the data and the client forwarding address list, storing, by the client, the received data in a local storage system and starting to assist the data publishing server node in forwarding the message, wherein the forwarding targets are all addresses in the client forwarding address list.

2. The method of claim 1, wherein the network communication connection within the data publishing server cluster is a long-lived persistent connection that is established only when the cluster is initialized and is closed when the cluster stops service.

3. The method according to claim 1, wherein the address of the data publishing server node to be requested, which the client cluster calculates according to the subscribed topic, and the address of the data publishing server node responsible for the publishing task are both calculated based on a hash value.

4. The method according to claim 1, wherein the client forwarding address list consists of all client addresses on the data publication server node that subscribe to the same topic.

5. The method according to claim 1, characterized in that the following steps are taken to ensure the reliability of data publishing:

(1) a client that receives a message directly from the data publishing server node reports its status to the data publishing server node after successfully receiving the message, that is, it sends response information to the data publishing server node;

(2) a client node that receives the message through client-assisted forwarding reports its status to the source client node of the forwarding operation after successfully receiving the message, and the source client node, after summarizing the results of the forwarding operation, reports the list of node addresses for which forwarding failed to the data publishing server node;

(3) when the data publishing server node fails to receive the status report of step (1), it retransmits the message, and when the number of retransmissions reaches the maximum limit, it stops retransmitting, records a log entry, and notifies system operation and maintenance personnel to troubleshoot the problem;

(4) when the data publishing server node successfully receives the status report of step (1) and receives the forwarding-failure node address list reported in step (2), it retransmits directly, without a forwarding operation.

6. A data publish-subscribe system for micro-service governance using the method of any one of claims 1 to 5, comprising:

a data publish-subscribe topic management module, used for managing the publish-subscribe states of the data publishing server and the clients, recording the mapping between each topic and the address list of clients subscribing to the topic, and generating the client forwarding address list;

a network communication and connection management module, used for sending data over the network within the data publishing server cluster and to client nodes, and for message forwarding between client nodes;

and a hash value calculation module, used for calculating a globally unique hash value according to the current node state of the data publishing server cluster, the hash value being the basis for mapping clients' subscription and publishing requests to specific server nodes.

7. The system of claim 6, further comprising a fault tolerance and reliability assurance module to ensure reliability of the data publish-subscribe.

8. The system of claim 6 or 7, further comprising a performance testing and performance analysis module for assisting in verifying the correctness of the whole system, and collecting performance indicators of the whole system through log analysis for supporting subsequent iterations of other functional and non-functional requirements of the system.