HK1120962B

HK1120962B - Data processing method and device based on cluster

Info

Publication number: HK1120962B
Application number: HK08114056.4A
Authority: HK
Inventors: 姚建东
Original assignee: 创新先进技术有限公司
Filing date: 2008-12-30
Publication date: 2011-05-06

Description

Data processing method and device based on cluster

Technical Field

The present invention relates to the field of data processing, and in particular, to a data processing method and apparatus based on a cluster environment.

Background

With the development of computer and network technologies, simply improving the hardware performance of a server cannot meet the increasing processing requests and the requirements on the reliability of the server, and in this context, the clustering technology is widely applied. A cluster is a parallel or distributed system of interconnected sets of complete machines that can be used as a uniform computing resource. When a server in a cluster fails, the services and work of that server can be provided by other servers in the cluster, thereby providing users with highly reliable network services. Moreover, by adopting the clustering technology, the load originally borne by one server can be distributed to other servers in the cluster, thereby greatly improving the processing capacity of the system.

FIG. 1 is a system diagram of a data processing system that uses a cluster, including a cluster A and an external server B. Cluster A is composed of servers a₀, a₁, a₂ and a load balancing device x. After receiving an external processing request, the load balancing device x selects, at the physical layer and according to a certain strategy, a server from the cluster to provide services to the outside world; for example, the server is selected according to its current network connections. Typically, each server in the cluster provides the same service. However, sometimes different servers may also assume different roles to handle different tasks, e.g. a₀ handles big-customer data while a₁ and a₂ handle common customer data. Thus, if a₀ sends one or more pieces of data containing a unique identifier to server B, server B returns one or more processing results containing the unique identifier to cluster A, and those results are required to continue to be processed by a₀.

In the above process, to ensure that the result returned by server B is correctly returned to the corresponding server a₀, the common practice in the prior art is that, after receiving the data sent by a₀, server B keeps its network connection with a₀ open and, once B has finished processing, returns the result directly to a₀ over that connection. However, this puts network connection pressure on server B. Especially if server B serves many objects, the large number of network connections consumes a large amount of its server resources, resulting in reduced processing power. Therefore, how to correctly transmit the data processed by the external server to the corresponding server in the cluster becomes a problem that those skilled in the art must face when applying the clustering technology.

Disclosure of Invention

The invention aims to provide a cluster-based data processing method, which aims to solve the problems that server resources are occupied and the processing performance of a server is reduced due to the fact that data are transmitted between an external server and a cluster through network connection in the prior art.

In order to solve the above problem, the present invention discloses a data processing method based on a cluster, wherein the cluster comprises a load balancing device and at least two servers, and the method comprises the following steps:

sending first data containing a control identifier to the outside, wherein the control identifier comprises a unique identifier and control information corresponding to the first data;

the load balancing equipment receives second data returned from the outside, wherein the second data comprises the control identification;

and routing the second data according to the control information contained in the control identification.

Preferably, after the load balancing device receives the second data returned from the outside, the method further includes: sending the second data to a server in the cluster according to a preset load balancing rule; and the server receiving the second data routes the second data according to the control information contained in the control identification.

Preferably, after the load balancing device receives the second data returned from the outside, the method further includes: sending the second data to a server in the cluster according to a preset load balancing rule; the server sends the received second data to routing middleware; the routing middleware routes the second data according to the control information contained in the control identification.

Preferably, the control information includes server location information corresponding to the first data.

Preferably, routing the second data in the cluster according to the control information includes: sending the second data to the server corresponding to the server position information.

Preferably, routing the second data in the cluster according to the control information further includes: if the server corresponding to the server position information fails, reselecting a server in the cluster according to a preset routing rule, and sending the second data to the reselected server.

In order to solve the above problem, the present invention also discloses a cluster-based data processing apparatus, comprising a load balancing device and at least two servers,

the server includes:

a sending unit, configured to send first data including a control identifier to the outside, where the control identifier includes a unique identifier and control information corresponding to the first data;

the load balancing equipment receives second data returned from the outside, wherein the second data comprises the control identification and corresponding control information;

the device further comprises:

and the routing unit is used for routing the second data according to the control information contained in the control identification.

Preferably, the routing unit is located in the server; after receiving the second data, the load balancing device sends the second data to one server in the cluster according to a preset load balancing rule; the server receiving the second data further comprises a receiving unit, configured to receive the second data sent by the load balancing device, and the routing unit routes the second data according to the control information included in the control identifier.

Preferably, the load balancing device sends the second data to one server in the cluster according to a preset load balancing rule; the server receiving the second data sends the received second data to the routing unit; the routing unit routes the second data according to the control information contained in the control identifier.

Preferably, the routing unit routes the second data according to the control information by sending the second data to a server corresponding to the server location information.

Preferably, the routing unit further includes: a fault detection unit, configured to detect whether the server corresponding to the server position information has a fault and, if so, to reselect a server according to a preset routing rule and send the second data to the reselected server.

Compared with the prior art, one embodiment of the present invention has the following effects:

the invention utilizes the characteristic that the unique identifier is not changed in data transmission, expands the unique identifier into the control identifier by adding the control information, realizes the routing of the second data in the cluster according to the control information corresponding to the control identifier in the second data after receiving the second data returned by the external server, ensures that the second data can be processed by the correct server, and avoids the problems of large occupation of server resources and performance reduction caused by the fact that the network connection is maintained between the cluster and the external server in the prior art to transmit the data. The invention breaks through the common knowledge of the unique identification function in the transmitted data in the prior art, and realizes the asynchronous processing between the cluster and the external server under the conditions of not changing the prior data structure and not needing to modify programs and devices of the external server.

Drawings

FIG. 1 is a block diagram of a system architecture of a cluster in the prior art;

FIG. 2 is a flow chart of the steps of an embodiment of the method of the present invention;

FIG. 3 is a block diagram of an embodiment of the apparatus of the present invention.

Detailed Description

In the conventional clustering technique, to ensure that data processed by a server aₙ in cluster A can still be processed by the server aₙ after it has been processed by an external server, a network connection is usually maintained between aₙ and the external server to transmit the data. This occupies a large amount of server resources and greatly reduces the processing performance of the cluster and of the external server. The invention resets the unique identification of the data sent by the cluster to the external server and expands the unique identification into the control identification by adding control information. Because the control identification is not changed after the data is processed by the external server, the cluster can route the data within the cluster according to the control information contained in the control identification after receiving the data returned by the external server, so that the data is processed by the corresponding server in the cluster and asynchronous processing of the data between the cluster and the external server is realized.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

Fig. 2 shows a flow chart of steps of a first embodiment of the method according to the present invention, which is described in detail below with reference to fig. 2.

Step 210: generate a control identifier according to a preset rule.

In the prior art, each piece of data sent by a cluster to an external server contains a unique identifier that distinguishes it from other data, and the unique identifier does not change while the data is processed by the external server and returned to the cluster. The invention uses this property to extend the unique identifier into a control identifier by adding control information. In this example, the control identifier includes a unique identifier and control information. The generation rule of the unique identifier can be defined as needed when the method is implemented, provided that it uniquely identifies one piece of data. The control information is used to control how the data is processed in the cluster; e.g. the control information contains location information of the server used to process the data. Including the server location information in the control information is only a preferred method of the present invention, and a person skilled in the art may set other contents in the control information according to business needs when implementing the present invention. For example, in some clusters the versions of the service programs running on the servers differ, and data must be processed by a server having the corresponding program version, so the program version required for processing the data may be set in the control information. As another example, if the server is required to process the data within a predetermined time, the processing time may be set in the control information.
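A minimal sketch of how such a control identifier might be generated is shown here. The layout is an assumption chosen for illustration: a fixed-width server name is placed in front of the unchanged unique identifier, so that a server name "A1" and an order number "11123" yield "A111123". The class name `ControlIdentifier`, its fields, and the constant `SERVER_NAME_WIDTH` are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass

# Assumption: server names such as "A1" always have exactly two characters,
# so the control information can be stored as a fixed-width prefix.
SERVER_NAME_WIDTH = 2


@dataclass(frozen=True)
class ControlIdentifier:
    """Hypothetical control identifier: control information + unique identifier."""

    server_name: str  # control information: which server should process the reply
    unique_id: str    # the original unique identifier, kept unchanged

    def encode(self) -> str:
        """Concatenate both parts into the single field sent to the external server."""
        if len(self.server_name) != SERVER_NAME_WIDTH:
            raise ValueError("server name must have exactly %d characters" % SERVER_NAME_WIDTH)
        return self.server_name + self.unique_id


control_id = ControlIdentifier(server_name="A1", unique_id="11123").encode()
print(control_id)  # A111123
```

Because the result is still a single opaque string, the field that carried the unique identifier can carry the control identifier without any change to the message structure seen by the external server.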

Step 220: send the first data containing the control identification to an external server.

The control identification generated in the above step and the other data to be processed together form the first data, which is sent to the external server.

Preferably, the invention does not change the data structure of the original unique identifier when the unique identifier is extended to the control identifier, and the data composition structure and the interface rule of the first data sent to the external server are not changed, thereby keeping the semantic consistency and completeness of the first data, and simultaneously, the original processing rule of the external server does not need to be modified. Moreover, even if the control requirement of the data in the cluster changes, only the control information in the control identifier needs to be changed, and the cost increase caused by the change is reduced to the maximum extent.

Step 230: receive second data returned by the external server.

The second data is data processed by the external server and returned, and the data comprises a control identifier corresponding to the first data; the external server is a device or apparatus capable of performing data interaction with the cluster, which is selected for the convenience of explaining the scheme of the present invention, but the external server itself may also be a cluster.

Generally, a load balancing device is provided in the cluster to select, at the physical level and according to a certain policy, a server from the cluster to provide services to the outside, for example by selecting randomly or according to the current network connection condition of the servers. The load balancing device may be a dedicated hardware device, or its role may be assumed by a server; a person skilled in the art may decide freely whether to use a dedicated hardware device for load balancing when implementing the present invention. In this embodiment, preferably, any one of the servers in the cluster can route the second data. After receiving the second data returned by the external server, the load balancing device randomly selects a server from the cluster to respond and sends the second data to that server.

Step 240: parse the control identification and the corresponding control information.

After receiving the second data, the server parses the control identification according to a preset rule to obtain the unique identification and the control information.
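The parsing in step 240 can be sketched as follows, assuming a preset rule in which the control information is a two-character server name placed in front of the unique identifier (the same layout as the "A111123" value used in the example later in this description). The function name `parse_control_id` and the type `ParsedControlId` are hypothetical.

```python
from typing import NamedTuple

# Assumption: the preset rule is "first two characters = server name".
SERVER_NAME_WIDTH = 2


class ParsedControlId(NamedTuple):
    server_name: str  # control information (server location)
    unique_id: str    # original unique identifier


def parse_control_id(raw: str) -> ParsedControlId:
    """Split a control identifier into control information and unique identifier."""
    if len(raw) <= SERVER_NAME_WIDTH:
        raise ValueError("control identifier too short: %r" % raw)
    return ParsedControlId(raw[:SERVER_NAME_WIDTH], raw[SERVER_NAME_WIDTH:])


parsed = parse_control_id("A111123")
print(parsed.server_name, parsed.unique_id)  # A1 11123
```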

Step 250: the second data is routed according to rules set in the control information.

Preferably, the corresponding server is selected according to the server position set in the control information, and the second data is sent to the server for processing. Of course, when implementing the present invention, a person skilled in the art may set the content and rule of the control information according to actual needs, and route the second data according to the rule, for example, the server may be selected according to the version of the service provided by the server and the required processing time.
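As an illustration of step 250, the sketch below selects a target server from control information that carries either a server position or a required program version. The dictionary keys `"server"` and `"version"`, the `CLUSTER` table, and the priority given to the server position are assumptions made for this example only.

```python
from typing import Dict, Optional

# Hypothetical cluster description: server name -> service program version.
CLUSTER: Dict[str, str] = {"A0": "1.0", "A1": "2.0", "A2": "2.0"}


def select_server(control_info: Dict[str, str], cluster: Dict[str, str]) -> str:
    """Pick the server that the control information asks for.

    A server position takes priority; a required program version is used
    only when no position is given.
    """
    location: Optional[str] = control_info.get("server")
    if location is not None:
        if location not in cluster:
            raise LookupError("unknown server %r" % location)
        return location
    version = control_info.get("version")
    for name in sorted(cluster):  # deterministic order for the example
        if cluster[name] == version:
            return name
    raise LookupError("no server satisfies the control information")


print(select_server({"server": "A1"}, CLUSTER))    # A1
print(select_server({"version": "2.0"}, CLUSTER))  # A1
```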

Step 260 (optional): detect, according to the server position information in the control information, whether the corresponding server has failed; if it has, reselect a server in the cluster according to a preset rule and strategy, and send the second data to the reselected server for processing.
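The optional fault handling of step 260 might look like the following sketch. The health table and the fallback order stand in for the "preset rule and strategy", which the description leaves open; both are assumptions, as is the function name `route_with_failover`.

```python
from typing import Dict, List


def route_with_failover(target: str, healthy: Dict[str, bool], fallback_order: List[str]) -> str:
    """Return the target server if it is healthy, otherwise the first healthy fallback."""
    if healthy.get(target, False):
        return target
    for candidate in fallback_order:
        if candidate != target and healthy.get(candidate, False):
            return candidate
    raise RuntimeError("no healthy server available in the cluster")


health = {"A0": True, "A1": False, "A2": True}  # hypothetical fault detection result
print(route_with_failover("A1", health, ["A0", "A1", "A2"]))  # A0
print(route_with_failover("A2", health, ["A0", "A1", "A2"]))  # A2
```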

It should be noted that, the control identifier may be provided with other contents according to the service requirement besides the unique identifier and the control information, which is not limited in this respect, for example, the data in the control identifier is encrypted and signed to improve the security of the data.

For example, cluster A sends a datagram to server B, asking server B to deduct 100 dollars from the user account customercountid:

< product > Supermarket Consumer >

In order to prevent server B from tampering with or repudiating the contents of the datagram, the contents of the message may be encrypted according to a predetermined algorithm to obtain a digest, and the digest is added to the control identifier, where the last 16 digits 8708765635553223 of <ordered> are the digest of the contents of the datagram. When a transaction error occurs, the contents of the datagram received by server B are encrypted with the same algorithm to obtain a digest, which is then compared with the digest in the control identifier; if the two are inconsistent, the datagram is considered to have been tampered with after being sent to server B. Data security is thereby improved.
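The digest check described above can be sketched as follows. The specification does not name the algorithm, so SHA-256 reduced to 16 decimal digits is purely an assumption for illustration, as are the function names and the sample content string. The digest is computed over the datagram contents excluding the control identifier itself, since the identifier is where the digest is stored.

```python
import hashlib

DIGEST_DIGITS = 16  # the example in the text uses a 16-digit digest


def digest_digits(content: str) -> str:
    """Derive 16 decimal digits from the SHA-256 hash of the content (assumed algorithm)."""
    value = int.from_bytes(hashlib.sha256(content.encode("utf-8")).digest(), "big")
    return str(value % 10 ** DIGEST_DIGITS).zfill(DIGEST_DIGITS)


def append_digest(control_id: str, content: str) -> str:
    """Extend the control identifier with the digest of the datagram contents."""
    return control_id + digest_digits(content)


def verify(control_id_with_digest: str, content: str) -> bool:
    """Recompute the digest and compare it with the one carried in the identifier."""
    return control_id_with_digest[-DIGEST_DIGITS:] == digest_digits(content)


tagged = append_digest("A111123", "amount=100.00")  # hypothetical content string
print(verify(tagged, "amount=100.00"))  # True
print(verify(tagged, "amount=900.00"))  # False
```

An unkeyed hash can be recomputed by anyone who knows the algorithm, so a deployment that needs to detect tampering by the other party would more likely use a keyed construction such as an HMAC or a signature.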

In the second embodiment of the present invention, after receiving the second data, the server selected by the load balancing device forwards the second data to the routing middleware, and the routing middleware routes the data according to the control information corresponding to the second data. The routing middleware may be an independent server, or may be assumed by a server in the cluster. Compared with the first embodiment, the servers in the cluster do not undertake routing of data any more, but are processed in a centralized manner by the routing middleware, and the routing service in each server is prevented from being maintained, so that the maintenance complexity is reduced. For other contents of this embodiment, please refer to embodiment one, which is not described herein again.

In another embodiment of the present invention, the functions performed by the load balancing device and the functions performed by the routing middleware are implemented on the same device, so that after receiving the data returned by the external server, the server can be selected from the cluster to complete the processing of the data by comprehensively considering the load balancing and the control rules of the control information related to the data.

The cluster-based data processing method of the present invention is described above, and a specific example is used to further describe the implementation process of the method in conjunction with the application environment.

In this example, the cluster a requests the server B for an order payment, and after receiving the request, the server B completes the payment according to the order number and returns the payment result, which includes the following specific processes:

Step 301: server a₁ sends a request to server B; if server B responds, a₁ establishes a network connection with B. a₁ then sends a request datagram to B and disconnects the network connection with B after the data has been sent.

The contents of the request datagram sent by a₁ to B are as follows:

< product > Supermarket Consumer >

Wherein <ordered>A111123</ordered> is the control identification of the datagram: A1 identifies the server name, and 11123 is the service order number.

Step 302: server B performs internal processing on the received datagram to generate a processing result.

In this example, server B pays an amount of 100.00 to customer account 87634293882173710987 and generates a processing result, which is formatted as follows:

< result > Payment success >

In this case, the information returned by B must carry the control identification value <ordered>A111123</ordered>.

Step 303: server B requests a new network connection from the load balancing device and returns the processing result; the network connection is disconnected after the data transmission is finished.

Step 304: after receiving the request of server B, the load balancing device randomly selects a server aₙ from the cluster to respond and sends the result returned by B to the server aₙ.

Step 305: after server aₙ receives the returned information, the routing program running in the server parses <ordered>A111123</ordered> to obtain the location information A1 of the destination server, and then sends the returned information to server a₁.
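Steps 301 to 305 can be simulated in a few lines. This is a sketch under stated assumptions, not the patented implementation: messages are plain dictionaries, the network is replaced by direct function calls, and the names `ClusterServer`, `external_server_b`, `load_balancer`, and the `amount` field are hypothetical. Only the `ordered` field and its value layout come from the example above.

```python
import random
from typing import Dict, List

SERVER_NAME_WIDTH = 2  # assumption: "A1" style names, two characters


class ClusterServer:
    """A cluster server that can both send requests and route returned results."""

    def __init__(self, name: str, cluster: Dict[str, "ClusterServer"]) -> None:
        self.name = name
        self.cluster = cluster          # shared table: server name -> server
        self.processed: List[str] = []  # control identifiers handled here

    def build_request(self, order_number: str) -> Dict[str, str]:
        # Step 301: the control identifier carries this server's name.
        return {"ordered": self.name + order_number, "amount": "100.00"}

    def receive_from_balancer(self, result: Dict[str, str]) -> None:
        # Step 305: parse the control identifier and forward to the named server.
        target = result["ordered"][:SERVER_NAME_WIDTH]
        self.cluster[target].process(result)

    def process(self, result: Dict[str, str]) -> None:
        self.processed.append(result["ordered"])


def external_server_b(request: Dict[str, str]) -> Dict[str, str]:
    # Step 302: B returns the control identifier unchanged with its result.
    return {"ordered": request["ordered"], "result": "Payment success"}


def load_balancer(cluster: Dict[str, ClusterServer], result: Dict[str, str], rng: random.Random) -> str:
    # Steps 303-304: pick any server at random and hand it the result.
    chosen = rng.choice(sorted(cluster))
    cluster[chosen].receive_from_balancer(result)
    return chosen


cluster: Dict[str, ClusterServer] = {}
for server_name in ("A0", "A1", "A2"):
    cluster[server_name] = ClusterServer(server_name, cluster)

request = cluster["A1"].build_request("11123")
reply = external_server_b(request)
load_balancer(cluster, reply, random.Random(0))
print(cluster["A1"].processed)  # ['A111123']
```

Whichever server the load balancer happens to choose, the result ends up at A1, and no connection between A1 and B has to stay open while B is working.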

A cluster-based data processing method according to the present invention has been described above with reference to specific embodiments. With reference to that description, FIG. 3 shows a cluster-based data processing apparatus 300 according to the present invention, the apparatus comprising a load balancing device 310 and servers 320.

Each server includes a sending unit 321, configured to send first data including a control identifier to the outside, where the control identifier includes a unique identifier and control information corresponding to the first data. The load balancing device receives second data returned from the outside, where the second data includes the control identifier and the corresponding control information. Each server further includes a receiving unit 322, configured to receive the second data sent by the load balancing device, and a routing unit 323, configured to route the second data according to the control information corresponding to the second data.

For the first data sent by the sending unit, the control information corresponding to the control identification of the first data comprises the server position information corresponding to the first data. The second data received by the load balancing device and returned from the outside also includes the control identifier, and correspondingly, the control information of the control identifier also includes the location information of the server. And after the server receives second data sent by the load balancing equipment, the routing unit sends the second data to the server corresponding to the server position information.

Further, the routing unit 323 of the present invention includes a fault detection unit 3231, configured to detect whether the server corresponding to the server location information has failed and, if it has, to reselect a server according to a preset routing rule and send the second data to the reselected server for processing.

Another embodiment of the apparatus of the present invention differs from the above-described embodiment of the apparatus in that no routing unit is provided in each server in the cluster, but one routing unit is provided in the cluster and connected to all servers in the cluster, respectively. After receiving the second data sent by the load balancing device, the server first sends the second data to the routing unit, and then the routing unit routes the data to the corresponding server for processing according to the control information corresponding to the second data. In addition, the load balancing device may also directly send the second data to the routing unit, and then the routing unit routes the second data to the corresponding server according to the control information. For other contents of this embodiment, please refer to the above contents, which are not described herein again.

The above detailed description is provided for a cluster-based data processing method and apparatus, and the specific examples are applied herein to illustrate the principles and embodiments of the present invention, and the above descriptions of the embodiments are only used to help understand the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method for cluster-based data processing, the cluster including a load-balancing device and at least two servers, the method comprising:

the server sends first data containing a control identifier to an external server, wherein the control identifier comprises a unique identifier and control information corresponding to the first data; the control information comprises server position information for sending the first data;

the load balancing equipment receives second data returned by the external server, wherein the second data comprises the control identification;

and sending the second data to a server corresponding to the server position information according to the control information contained in the control identification.

2. The method of claim 1, wherein, after the load balancing device receives the second data returned from the outside, the method further comprises:

sending the second data to a server in the cluster according to a preset load balancing rule; and the server receiving the second data sends the second data to a server corresponding to the server position information according to the control information contained in the control identification.

3. The method of claim 1, wherein, after the load balancing device receives the second data returned from the outside, the method further comprises:

sending the second data to a server in the cluster according to a preset load balancing rule;

the server sends the received second data to routing middleware; and the routing middleware sends the second data to a server corresponding to the server position information according to the control information contained in the control identification.

4. The method of claim 1, wherein sending the second data to a server corresponding to the server location information according to control information further comprises:

and if the server corresponding to the server position information fails, reselecting the server in the cluster according to a preset routing rule, and sending the second data to the reselected server.

5. A cluster-based data processing apparatus comprising a load balancing device and at least two servers,

the server includes:

a sending unit, configured to send first data including a control identifier to an external server, where the control identifier includes a unique identifier and control information corresponding to the first data; the control information comprises server position information for sending the first data;

the device further comprises:

a routing unit, configured to route the second data according to the control information included in the control identifier; the routing unit routes the second data by sending the second data to the server corresponding to the server position information according to the control information.

6. The apparatus of claim 5, wherein the routing unit is located in the server; after receiving the second data, the load balancing equipment sends the second data to one server in the cluster according to a preset load balancing rule; the server receiving the second data further comprises a receiving unit used for receiving the second data sent by the load balancing equipment, and the routing unit sends the second data to the server corresponding to the server position information according to the control information contained in the control identifier.

7. The apparatus according to claim 5, wherein the load balancing device sends the second data to one server in the cluster according to a preset load balancing rule; the server receiving the second data sends the received second data to the routing unit; and the routing unit sends the second data to a server corresponding to the server position information according to the control information contained in the control identification.

8. The apparatus of claim 5, wherein the routing unit further comprises: a fault detection unit, configured to detect whether the server corresponding to the server position information has a fault and, if so, to reselect a server according to a preset routing rule and send the second data to the reselected server.