
CN111177154B - Distributed database caching method and hash ring optimization thereof - Google Patents


Info

Publication number
CN111177154B
CN111177154B (application number CN201911390078.8A)
Authority
CN
China
Prior art keywords
cache
node
nodes
hash
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911390078.8A
Other languages
Chinese (zh)
Other versions
CN111177154A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingu Zhitong Green Chain Technology Co.,Ltd.
Original Assignee
Zhangxun Yitong Beijing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhangxun Yitong Beijing Information Technology Co ltd filed Critical Zhangxun Yitong Beijing Information Technology Co ltd
Priority to CN201911390078.8A priority Critical patent/CN111177154B/en
Publication of CN111177154A publication Critical patent/CN111177154A/en
Application granted granted Critical
Publication of CN111177154B publication Critical patent/CN111177154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a distributed database cache and a hash ring optimization method thereof. A hash algorithm first computes a fixed-length hash value of the URL, and this hash value is then used for the subsequent operations that look up the cache object. Hashing yields a 128-bit value for every input string, so all URLs are hashed onto the range [0, 2^128 − 1] and mapped one-to-one with their hash values. The method adopts a self-deciding virtual-node migration mutual-aid strategy: it monitors the operating parameters of each cache node in the distributed proxy cache, judges whether local overheating or other anomalies exist, selects a set of cache nodes with lower current load as helpers according to a given multi-copy hierarchical management strategy, and lets them take over the virtual nodes on the hash ring that correspond to cache nodes with higher load and degraded performance, thereby achieving hot-spot migration while avoiding single points of failure.

Description

Distributed database caching method and hash ring optimization thereof
Technical Field
The invention relates to the technical field of computer application, in particular to a distributed database caching method and a hash ring optimization method thereof.
Background
In a database cluster, adding or deleting physical nodes is the most basic function of the cluster management system. With conventional hash algorithms such as hash-modulo or random assignment, adding or deleting a physical node forces a large amount of the original cache to be rebuilt, which imposes serious performance overhead on the whole database cluster system, can even disrupt normal operation of the business system, and severely violates the monotonicity principle.
The consistent hashing algorithm was introduced to guarantee monotonicity: when a physical node is removed or added, the effect on the existing cache mapping is very small. Moreover, the more physical nodes in the cluster, the better consistent hashing preserves monotonicity. The principle of the consistent hashing algorithm is as follows:
(1) A hash ring and physical nodes on the ring are determined.
A range of hash values is first determined, for example (−2^16, 2^16). All hash values in this range are regarded as a ring that increases clockwise and whose beginning and end are joined; this ring is called the hash ring.
A hash ring is a virtual data structure that does not exist physically. Suppose six nodes A through F are distributed over the hash ring after hash computation. A schematic diagram of the hash ring is shown in fig. 1.
(2) Setting access mode of physical node
Suppose an SQL query request arrives and the SQL statement string is used as the KEY of the hash algorithm. The calculated hash value maps to some point on the hash ring. If that point does not correspond to a physical node, a clockwise search is performed along the ring (i.e. toward nodes whose hash values are larger than the calculated one) until the first mapped physical node is found; that node is the target (i.e. the node with the smallest hash greater than the key's hash). If the calculated hash value exceeds the 2^16 upper bound of the range without a node being found, the request matches the first physical node (because the ring is joined end to end, this can still be regarded as a plain clockwise lookup). For example, if the calculated hash value lies between B and C, the matched physical node is C; if it is greater than F's hash, node A is hit.
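The clockwise lookup described above can be sketched as follows (a hedged illustration, not the patent's implementation: MD5 stands in for the unspecified hash function, and the node names are invented):

```python
import bisect
import hashlib

def hash_val(key: str) -> int:
    # Stand-in hash; the concrete hash function is left open at this point.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: node hashes kept sorted, clockwise lookup."""
    def __init__(self, nodes):
        self.ring = sorted((hash_val(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        h = hash_val(key)
        # First node whose hash is >= the key's hash; wrap to the first node
        # when the key's hash exceeds every node hash (ring joined end to end).
        idx = bisect.bisect_left(self.ring, (h, ""))
        return self.ring[idx % len(self.ring)][1]

ring = HashRing(["A", "B", "C", "D", "E", "F"])
node = ring.lookup("SELECT * FROM t WHERE id = 1")
```

A given SQL string always hashes to the same ring position, so repeated lookups are stable.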
(3) Processing to augment nodes
Suppose that a G physical node is to be added, as indicated by the grey circular box in fig. 2.
The hash value of the new physical node is calculated first and corresponds to a point on the hash ring. The access mode of the ring is unchanged, so the only mapping relation that changes is for hash values distributed between node D and node G: after node G is added they map to node G instead of the original node E. Hash values distributed between node G and node E still map to node E, unchanged. The result is that after adding a physical node, only a small fraction of the cache needs to be re-established. Although adding nodes still changes some mappings under consistent hashing, the amount of change is reduced to a minimum compared with the traditional hash-modulo approach.
(4) Processing of deleted nodes
Assuming that node G in fig. 2 needs to be deleted, the hash ring reverts to its original state. Caches mapped to hash values on node G will no longer be hit; those hash values now map clockwise to node E, so the only caches that must be re-established are those for hash values distributed between node D and node G. The amount of cache rebuilding required when deleting a node on a hash ring is therefore also greatly reduced compared with conventional hash-modulo methods.
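The claim that adding a node remaps only the keys adjacent to it can be checked with a small simulation (an illustrative sketch; the node names, key count, and MD5 stand-in hash are invented, not taken from the patent):

```python
import bisect
import hashlib

def hval(s: str) -> int:
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def lookup(ring, key):
    # ring: sorted list of (hash, node); clockwise = first hash >= key hash.
    hashes = [h for h, _ in ring]
    return ring[bisect.bisect_left(hashes, hval(key)) % len(ring)][1]

nodes = ["A", "B", "C", "D", "E", "F"]
ring = sorted((hval(n), n) for n in nodes)
keys = ["key-%d" % i for i in range(1000)]
before = {k: lookup(ring, k) for k in keys}

# Add node G: the only keys whose mapping changes are those now falling in
# G's arc, and every one of them moves TO G; all other keys stay put.
ring_g = sorted(ring + [(hval("G"), "G")])
after = {k: lookup(ring_g, k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Deleting G again restores the original assignment, mirroring the deletion case described next.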
The consistent hashing algorithm is a load balancing algorithm that is currently in very wide use. In a dynamically changing cache environment, the consistent hash algorithm well meets two factors for judging whether the hash algorithm is good or bad, namely balance and monotonicity.
(1) Balance: the entire cache space should be fully utilized, so a good hashing algorithm should distribute its hash results as uniformly as possible.
(2) Monotonicity: once the system's cache has been established and stabilized, adding new cache nodes should cause remapped cache entries to map only into the newly added cache space, never to be shuffled among the original cache spaces.
However, through the analysis and research above, the invention finds that once consistent hashing is chosen, the underlying hash function used within it must also be determined. This is a critical step: it decides how uniformly the nodes are distributed on the ring, as well as factors such as algorithm efficiency.
Disclosure of Invention
The invention aims to solve the technical problem of providing a distributed database caching method and a hash ring optimization method thereof. The method as a whole distributes the cache object space so that multiple background cache nodes can work cooperatively, improving efficiency. When a new cache node joins the hash ring, the ranges the existing nodes are responsible for do not change greatly: the new node only splits the original range of one node, and the cache space is not redistributed when a new cache node is added, which avoids fluctuation of the system load and ensures the stability of the mechanism.
In order to solve the technical problems, the invention provides a caching method for a distributed database: a hash algorithm first computes a fixed-length hash value of the URL, and this hash value is then used in the subsequent operations that look up the cache object.
Preferably, the method further comprises: hashing by using MD5 algorithm to obtain a 128-bit value for all the inputted character strings: hash all URLs at one [0,2 ] 128 -1]One-to-one mapping with URLs is performed; will all [0,2 ] 128 -1]The hash range is seen as a circular structure, [0,2 128 -1]All hash values within the range are arranged in a clockwise direction in order from large to small, and are uniformly distributed on the hash ring as a whole.
Preferably, the method further comprises: each cache node is responsible for all URLs corresponding to a section of range on the hash ring, and hash calculation is carried out on the IP value of each cache node to obtain a hash value.
Preferably, the method further comprises: when a user's request arrives, the front-end proxy obtains the URL contained in the request and first calculates its hash value, which serves as a key on the hash ring; it then searches clockwise along the hash ring for the first node larger than this key, and the front-end proxy routes the HTTP request to the background cache server just found. When a new cache node joins the hash ring, the ranges the existing nodes are responsible for do not change greatly: the new cache node only splits the original range of one node, and the cache space is not redistributed when a new cache node is added.
Preferably, the method further comprises: the self-decision virtual node migration mutual-aid strategy is adopted, operation parameters of all Cache nodes in the distributed proxy Cache are monitored in a sensing mode, whether local overheating exists or not is judged, a Cache node set with lower current load is selected as an aid according to a given multi-copy hierarchical management strategy, and virtual nodes corresponding to Cache nodes with higher load and reduced performance on a Cache hash ring are shared.
Preferably, the self-determined virtual node migration policy further includes:
A. evaluating the state and service capability of a cache server;
B. selecting cache nodes whose state is overheated and whose service capability has degraded, and migrating their virtual nodes; at the same time selecting cache nodes whose state is normal and whose service capability is stronger, and merging the migrated virtual-node lists into them so that they bear part of the hot nodes' request load;
C. for different cache nodes, the number of virtual nodes to migrate is determined.
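Steps A-C can be sketched as a small planning function (a hedged illustration; the load metric, the 50% migration fraction, the single-helper choice, and all node names are assumptions, not values from the patent):

```python
def plan_migration(loads, vnodes, migrate_fraction=0.5):
    # Step A: evaluate state -- here, "hot" means load above the mean.
    avg = sum(loads.values()) / len(loads)
    hot = [n for n, s in loads.items() if s > avg]
    normal = [n for n in loads if n not in hot]
    # Step B: the least-loaded normal node acts as the helper.
    helper = min(normal, key=loads.get)
    # Step C: decide how many virtual nodes each hot node migrates.
    moves = {}
    for n in hot:
        count = max(1, int(len(vnodes[n]) * migrate_fraction))
        moves[n] = (vnodes[n][:count], helper)
    return moves

moves = plan_migration(
    {"c1": 90, "c2": 20, "c3": 40},
    {"c1": ["v1a", "v1b", "v1c", "v1d"], "c2": ["v2a"], "c3": ["v3a"]},
)
```

Here c1 is the only node above the mean load of 50, so half of its virtual nodes are handed to the least-loaded normal node c2.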
Preferably, the method further comprises: adjusting the layer number of the hash ring; adjusting the virtual nodes to rebalance; and (5) data migration.
Preferably, the step of adjusting the number of layers of the hash ring further includes: if the number of the virtual nodes of the unit weight is reduced to the threshold value, increasing the number of layers of the virtual nodes by 1; if the number of virtual nodes of the unit weight is higher than the threshold value, reducing the layers of the virtual nodes;
the adjusting the virtual node for rebalancing further comprises: rebalancing occurs during the addition of new nodes or the deletion/failure of existing nodes in the cluster;
the step of data migration further comprises: and migrating the data in the deleted virtual node to the neighbor node.
In order to solve the technical problems, the invention also provides a hash ring optimization method for the distributed database caching method: it adopts a self-deciding virtual-node migration mutual-aid strategy, monitors the operating parameters of each Cache node in the distributed proxy Cache, judges whether local overheating or other anomalies exist, selects a set of Cache nodes with lower current load as helpers according to a given multi-copy hierarchical management strategy, and lets them take over the virtual nodes on the Cache hash ring that correspond to Cache nodes with higher load and degraded performance.
preferably, the self-determined virtual node migration policy further includes:
A. evaluating the state and service capability of a cache server;
B. selecting cache nodes whose state is overheated and whose service capability has degraded, and migrating their virtual nodes; at the same time selecting cache nodes whose state is normal and whose service capability is stronger, and merging the migrated virtual-node lists into them so that they bear part of the hot nodes' request load;
C. for different cache nodes, the number of virtual nodes to migrate is determined.
Preferably, the step of evaluating the cache server status and the service capability further comprises: evaluating the busyness of each cache node among all current cache nodes. Assuming the background has n cache nodes whose current states are S1, S2, ..., Sn, the average of the current states of all nodes is calculated as:

Savg = (S1 + S2 + ... + Sn) / n

Two states of a cache node are then defined: (1) for cache node i, if Si > Savg, the node is hot; (2) for cache node i, if Si <= Savg, the state is normal. When the cache cluster needs a self-deciding adjustment, at least one cache node must already be in the hot state; it then has to be decided which hot cache nodes need adjustment, and to which normal-state cache nodes the virtual nodes on those hot cache nodes should be migrated.
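A hedged sketch of this evaluation, assuming the average is the plain mean of the node states (the formula itself is not legible in the source text, and the load numbers are invented):

```python
def classify_nodes(loads):
    # Mean of the current states of all n cache nodes (assumed formula).
    avg = sum(loads.values()) / len(loads)
    hot = {n for n, s in loads.items() if s > avg}  # state above the mean
    normal = set(loads) - hot                       # state at or below the mean
    return avg, hot, normal

avg, hot, normal = classify_nodes({"c1": 90, "c2": 30, "c3": 20, "c4": 40})
```

With these illustrative loads the mean is 45, so only c1 is classified hot and becomes a candidate for virtual-node migration.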
The beneficial effects of the invention include: by adopting the self-deciding virtual-node migration mutual-aid strategy, the operating parameters of each Cache node in the distributed proxy Cache are monitored, local overheating and other anomalies are detected, a set of Cache nodes with lower current load is selected as helpers according to the given multi-copy hierarchical management strategy, and these helpers take over the virtual nodes on the Cache hash ring corresponding to Cache nodes with higher load and degraded performance, achieving hot-spot migration while avoiding single points of failure. Through this dynamic adaptation, availability is higher.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the prior art, and it is obvious that the drawings in the following description are only a part of the embodiments or the prior art, and other similar or related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a hash ring according to the background of the invention;
FIG. 2 is a diagram of a hash ring addition node according to the background of the invention;
FIG. 3 is a diagram illustrating a correspondence between virtual nodes and physical nodes in a hash ring according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating node data migration in a hash ring according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a multi-layer consistent hash ring according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of creating virtual nodes according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a distributed proxy cache according to an embodiment of the present invention;
FIG. 8 is a diagram of URL space allocation based on hash ring according to an embodiment of the present invention;
fig. 9 is a migration interaction diagram of a virtual node according to an embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to examples, but the present invention is not limited to these examples.
If only the existing physical nodes are placed on the hash ring, they are likely to be unevenly distributed on the ring, which strongly affects whether the subsequent caches are evenly distributed and leads to load imbalance.
In one embodiment of the present invention, to solve the balance problem, the concept of "virtual nodes" is added to the hash ring. Virtual nodes can be understood simply as replicas of the physical nodes on the hash ring rather than nodes that actually exist; the number of virtual nodes corresponding to one physical node is generally determined by the actual situation. Virtual nodes are placed on the hash ring and accessed in the same way as physical nodes. After virtual nodes are introduced, every time a physical node is added, the corresponding number of virtual nodes must also be added on the hash ring; conversely, if a physical node is deleted, all of its corresponding virtual nodes on the hash ring must be deleted.
In FIG. 3, nodes A-L represent 12 virtual nodes and nodes 1-4 represent 4 physical nodes, each physical node corresponding to three virtual nodes. The hash value calculated from the SQL string object corresponds to a point on the ring; a corresponding virtual node is then found according to the ring's access mode, and that virtual node maps to its physical node. The more nodes in the cluster, the better the balance of the consistent hash algorithm, so introducing virtual nodes greatly increases the number of nodes on the hash ring and thus improves the balance of the consistent hash algorithm.
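The fig. 3 arrangement — several virtual nodes per physical node, resolved back to the physical node on lookup — can be sketched as follows (illustrative only; the replica-naming scheme and MD5 stand-in hash are invented):

```python
import bisect
import hashlib

def hval(s: str) -> int:
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def build_ring(physical_nodes, replicas=3):
    # Each virtual node is a hashed replica label that remembers its
    # physical node, so a lookup resolves virtual -> physical directly.
    return sorted(
        (hval("%s#vn%d" % (node, r)), node)
        for node in physical_nodes
        for r in range(replicas)
    )

def lookup(ring, key):
    hashes = [h for h, _ in ring]
    return ring[bisect.bisect_left(hashes, hval(key)) % len(ring)][1]

ring = build_ring(["node1", "node2", "node3", "node4"], replicas=3)
```

As in fig. 3, four physical nodes with three replicas each put twelve points on the ring, and any key still resolves to one of the four physical nodes.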
The HASH algorithm has three major advantages: unidirectionality, collision resistance, and uniformity of distribution. Consistent HASH is currently the most commonly adopted algorithm for distributed load balancing and storage balancing. The algorithm hashes each machine node with a HASH function and maps it into the space [0, 2^32). When data is stored, the HASH value of the data block is computed first, and this value corresponds to a position on the ring. As shown in fig. 4 (left image), the hash value key1 of object1 is calculated, a machine Node1 is then found clockwise, and object1 is stored on Node1.
If Node2 goes down, the data on Node2 migrates to Node3. The advantage is that even if a node fails, data access for the other nodes can still be located quickly using the HASH algorithm; only the neighboring node is affected.
Consistent hashing is widely used in distributed systems, including distributed databases. It provides a pattern of mapping keys to specific nodes around the ring. However, the existing consistent hashing method cannot guarantee accurate balancing of data. For example: when extended, the newly added node will only share the workload of the neighboring node. The same is true after the node is deleted from the system. This will result in an unbalanced state after deleting or adding a node.
In addition, given that node capacities in a distributed system are not always balanced at the outset, it is not easy to bring the entire cluster to an equilibrium state. In the method of fig. 3 in the previous embodiment, virtual nodes map one physical node to multiple virtual nodes to maintain balance. However, that method does not consider the relationship among virtual nodes belonging to the same physical node: once a physical node shuts down, a data avalanche may occur.
Thus, as shown in FIG. 5, in another embodiment of the present invention, our idea is to create and use a multi-layer consistent hash ring for a distributed database, and includes:
and calculating the weight of the physical node according to the capacity and the rule.
A first hash ring is constructed and different weights are assigned to the physical nodes.
A second hash ring is constructed for certain physical nodes, where the second hash ring has more capacity and the nodes in the second hash ring are virtual nodes.
If there is an imbalance in some virtual nodes, more hash rings are constructed.
Finally, a multi-layer consistent hash ring is constructed.
Multi-layer hashing is used to locate and access data in the ring.
Once a node fails, non-first tier nodes are bypassed directly to speed up rebalancing.
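One way to sketch the weight-and-layer bookkeeping above (a hedged illustration: the capacity unit, the layer threshold, and the node capacities are all invented for demonstration):

```python
def plan_ring(capacities, unit=100, weight_threshold=3):
    # Weight derives from capacity; nodes whose weight exceeds the threshold
    # get a second-layer ring to hold their many virtual nodes.
    plan = {}
    for node, cap in capacities.items():
        weight = max(1, cap // unit)
        layers = 2 if weight > weight_threshold else 1
        plan[node] = {"weight": weight, "layers": layers}
    return plan

plan = plan_ring({"n1": 100, "n2": 800})
```

With these assumed numbers, the small node stays on the first ring while the large node's virtual nodes spill onto a second layer.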
(I) First hash ring initialization
In the hash ring initialization of this embodiment, the weight of each physical node is calculated according to the rules and its capacity, such as storage size, storage type, storage time, and storage risk.
In an embodiment, we use memory size as one major factor for simplicity of illustration only. However, rules and other factors may be specified as desired.
Different virtual nodes are defined according to the weights of the different physical nodes, and a multi-layer hash ring is generated. In real data processing, some physical nodes may have only a first layer and no second or further layers.
(II) creating virtual nodes
The capacity size of each virtual node is ensured to be the same during creation of the virtual node. The number of virtual nodes per physical disk depends on the weight. The mapping relationship between the virtual nodes and the physical nodes will be recorded as metadata in the mapping table.
If the number of virtual nodes is not large enough, the balance of consistent hashing is broken. A minimum threshold on the number of virtual nodes should therefore be maintained, together with a multi-level index recording the nodes in each layer of the hash ring.
(III) adjusting the layer number of the hash ring
The adjustment rule is: if the number of virtual nodes per unit weight is reduced to a threshold value, the number of layers of the virtual node is increased by 1.
On the other hand, if the number of virtual nodes per weight is higher than the threshold, the layer of virtual nodes is reduced.
(IV) adjusting virtual nodes for rebalancing
Rebalancing may occur during the addition of new nodes or the deletion/failure of existing nodes in the cluster.
During the addition of a node:
-re-calculating weights of all nodes in the first hash ring;
-constructing a second or more layers of hash loops for the newly added node;
-adding new data to the newly added node and moving some existing data to the newly added node.
During deletion of a node:
-if a node failure is detected, bypassing all virtual nodes of the second or higher layer directly, which prevents a large number of unnecessary node redirections;
all data in the failed node will move to the adjacent hash ring.
And (V) data migration:
for rebalancing, data in the deleted virtual node should be migrated to the neighbor node. A newly added node likewise takes over data at its location on the consistent HASH circle, just as a deletion hands data over at that location. Migration occurs only between the affected nodes.
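The deletion path can be checked with a small simulation (node and key names invented, MD5 as a stand-in hash): removing a node leaves every other key's owner unchanged, and the removed node's keys land on surviving nodes only.

```python
import bisect
import hashlib

def hval(s: str) -> int:
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def owner(ring, key_hash):
    # Clockwise owner: first node hash >= key hash, wrapping at the end.
    hashes = [h for h, _ in ring]
    return ring[bisect.bisect_left(hashes, key_hash) % len(ring)][1]

ring = sorted((hval(n), n) for n in ["nodeX", "nodeY", "nodeZ"])
keys = ["k%d" % i for i in range(200)]
placed = {k: owner(ring, hval(k)) for k in keys}

# Delete nodeY: recompute owners on the two-node ring; only nodeY's keys
# migrate, each to its clockwise neighbour on the surviving ring.
ring2 = [(h, n) for h, n in ring if n != "nodeY"]
after = {k: owner(ring2, hval(k)) for k in keys}
```

This mirrors the rule above: migration touches only the deleted node's data, never the rest of the cluster.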
Finally, how to determine whether this mechanism is in use:
By checking the data structure and data flow to see whether there are multiple layers of consistent hash rings.
By checking the user manuals and the system behavior.
In another embodiment of the present invention, a hash ring storage mechanism is disclosed: a distributed proxy cache architecture. As shown in fig. 7, a front-end proxy server manages several background cache server nodes. When the front-end proxy receives a request from a user, it goes to the background cache servers to obtain the requested data; the cache server first searches its local cache space for the corresponding Web content and, if found, returns it to the front-end proxy. If the corresponding Web content is not found, the cache server first obtains the content from the request's origin Web server, then sends it to the front-end proxy while storing a copy of the cached object locally to speed up responses to subsequent requests.
In the distributed proxy cache, all HTTP requests of network users are distributed to each cache after passing through the proxy, and compared with the single cache, the cache space and the cache content are greatly increased due to the existence of a plurality of cache nodes at the background, and the response time of the foreground proxy is also reduced, so that the overall performance is improved. Obviously, if multiple cache nodes in the background all store the same content, the advantages will be greatly reduced, and the improvement of the performance can be ensured only by distributing the space of the whole cache copy to the cache nodes.
For a user's HTTP request, the most useful information is the URL, which is also the basis for the subsequent cache-node content lookup. It is therefore natural to divide the cache space by URL, but URL lengths vary greatly, and using them directly without processing would be very difficult. To address this, a fixed-length hash value of the URL can first be obtained with a specific Hash algorithm, and that hash value then used for the subsequent operations that look up the cache object. The invention provides a Hash-mechanism-based scheme for distributing the whole URL space, i.e. the whole cache space, among multiple cache nodes. Specifically, the invention hashes with the MD5 algorithm, which produces a 128-bit value for every input string; that is, all URLs hash onto the space [0, 2^128 − 1]. Considering the actual number of URLs, this range is large enough to contain all content, and a one-to-one mapping with URLs is possible.
To facilitate the implementation of the URL routing mechanism, as shown in FIG. 8, the entire [0, 2^128 − 1] hash range is viewed as a circular structure: all hash values within the range are arranged clockwise in increasing order and are distributed uniformly over the hash ring. Each cache node in fig. 7 is responsible for a range on the hash ring, i.e. for all URLs corresponding to that range. The IP value of each cache node is hashed to obtain a hash value; in the figure, node1 through node4 represent four cache nodes, the range between node1 and node2 is the URL range cache node2 is responsible for, and so on. Notably, the range corresponding to node3 spans the maximum value and continues from the minimum, which is an advantage of hash rings: together the ranges cover the whole ring. When a user's request arrives, the front-end proxy obtains the URL contained in the request and first calculates its hash value, which serves as a key on the hash ring, as shown in fig. 8. It then finds, clockwise, the first node larger than this key; that node represents a cache server in the real environment, so the front-end proxy routes the HTTP request to the background cache server just found. The whole process distributes the cache object space so that multiple background cache nodes work cooperatively, improving efficiency. Note also a characteristic of the method: when a new cache node5 joins the hash ring, the ranges the existing nodes are responsible for do not change greatly; node5 only splits the original range of one node, and the cache space is not redistributed when a new cache node is added. This avoids fluctuation of the system load and ensures the stability of the mechanism.
Although the storage mechanism of the hash ring has many advantages in cache management, there is a problem that, due to the inherent nature of the MD5 hashing algorithm, it cannot be guaranteed that each real cache node IP can be uniformly distributed over the entire hash ring after hashing. As shown in fig. 8, the responsible ranges of node1 and node5 are very different, and it is apparent that the probability of receiving HTTP requests is high for a large range, and vice versa. In the case of a relatively large number of requests, this may lead to uneven cache load tasks for each cache node, and it is not easy to adjust the load of each cache node, unless each cache node is re-hashed, which is obviously not feasible and is too costly. The improved MD5 algorithm does not help much in solving this problem, because randomness is a big feature of the hashing algorithm and does not enable some uniformly spaced property after IP hashing.
The root cause of the problem is that each cache node corresponds to only one point on the hash ring, so the idea of virtual nodes is introduced on the hash ring to solve the uneven distribution of the cache space. That is, a multi-level hash ring mechanism is adopted in which one physical node corresponds to several virtual nodes. The core idea is that, besides its own point on the hash ring, each cache server also corresponds to several virtual nodes (v-nodes); the union of the ranges of its node and v-nodes is the cache URL space assigned to that real cache server, so the routing query mechanism is unchanged. With three real cache nodes node1, node2 and node3 on the cache hash ring, each cache node then has multiple replicas on the ring, and owing to the randomness of the hash these replicas divide the entire URL space more evenly. The number of virtual nodes each cache node generates on the cache hash ring is calculated according to its service capability: a cache with strong service capability has many virtual nodes and covers a wide URL range, so the front-end proxy distributes many user requests to it; and vice versa. For example, node1 may be mapped to v-node1a and v-node1b, node2 to v-node2a and v-node2b, and node3 to v-node3a, v-node3b and v-node3c.
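A weighted virtual-node ring of the kind described can be sketched as follows. This is a minimal illustration, not the patent's implementation; in particular, deriving replica positions by suffixing the IP with a replica index ("#0", "#1", ...) is an assumption, since the patent does not specify how the v-node positions are derived.

```python
import hashlib

def md5_hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def build_vnode_ring(capacity: dict) -> list:
    """Build a ring of (position, owner) pairs. `capacity` maps a cache
    node's IP to its virtual-node count, proportional to its service
    capability: stronger nodes get more v-nodes and a wider URL range."""
    ring = []
    for ip, count in capacity.items():
        for i in range(count):
            # Replica naming scheme is an assumed placeholder.
            ring.append((md5_hash(f"{ip}#{i}"), ip))
    return sorted(ring)
```

Lookup over this ring works exactly as before; the only change is that a hit on any v-node resolves to the owning physical cache server.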
The number of virtual nodes of each cache node is closely related to its current service capability. To enable the distributed proxy caching system to sense its own running condition, an autonomous decision module predicts the future running state of the cache nodes, comprehensively evaluates their service capability, and dynamically decides the number of virtual nodes each is allotted on the hash ring: it reduces the number of virtual nodes of overloaded cache nodes and reassigns those virtual nodes to the cache nodes whose current service performance is stronger. The whole process is shown in FIG. 9.
In FIG. 9, a self-deciding virtual-node migration mutual-aid strategy is adopted: the operating parameters of each cache node in the distributed proxy cache are sensed and monitored to judge whether local overheating or other anomalies occur; according to a given multi-copy hierarchical management strategy, a set of cache nodes with lower current load is selected as helpers to take over, on the cache hash ring, the virtual nodes corresponding to the cache nodes whose load is higher and whose performance has dropped, thereby migrating hot spots and avoiding single-point failures. Self-deciding virtual-node migration essentially solves three problems: first, evaluating the state and service capability of each cache server; second, selecting the cache nodes whose state is overheated and whose service capability has dropped and migrating out their virtual nodes, while selecting cache nodes whose state is normal and whose service capability is stronger to merge the migrated node lists and bear part of the request load; third, determining, for the different cache nodes, the number of virtual nodes to migrate.
For the evaluation of the state and service capability of a cache server, the sensing and monitoring part computes a composite state value S_i that represents the current state and service capability of cache node i. The larger S_i is, the busier the cache node and the less service capability is available; the smaller S_i is, the more idle the cache node and the more service capability is available.
Since the background of the distributed proxy cache has many cache nodes, the autonomous decision part receives the state values of many cache nodes each time. To evaluate the busyness of each cache node relative to all current cache nodes, assume the background has n cache nodes and calculate the average S̄ of the current states of all nodes, as shown in equation 1:

S̄ = (1/n) · Σ_{i=1}^{n} S_i   (equation 1)
Two states of a cache node are defined: (1) for cache node i, if S_i > S̄, it is in the hot state; (2) for cache node i, if S_i ≤ S̄, it is in the normal state. When the cache cluster needs a self-deciding adjustment, at least one cache node is certainly already in the hot state; it must be decided which of the hot-state cache nodes need adjustment, and onto which normal-state cache nodes the virtual nodes of the hot-state cache nodes should be migrated.
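The averaging of equation 1 and the two-state classification can be sketched directly (a minimal illustration; the node names and state values are invented):

```python
def classify_nodes(states: dict):
    """states maps a cache node id to its state value S_i
    (larger = busier).  Per equation 1, the mean of all S_i is taken;
    a node is hot if S_i > mean, and normal if S_i <= mean."""
    mean = sum(states.values()) / len(states)
    hot = {node for node, s in states.items() if s > mean}
    normal = {node for node, s in states.items() if s <= mean}
    return hot, normal, mean
```

The hot set is the candidate pool for migrating virtual nodes out; the normal set is the candidate pool of helpers.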
First, define H as the set of all cache nodes in the hot state, with n1 = |H|, and N as the set of all cache nodes in the normal state, with n2 = |N|; then they satisfy equation 2:

n1 + n2 = |H| + |N| = n   (equation 2)
Secondly, arrange all elements of H in descending order of state value to obtain the sequence S_H, which orders the hot cache nodes by busyness: the earlier an element appears, the busier the cache node and the less service capability it has, as in equation 3:

S_H = {S_H,1, S_H,2, …, S_H,i, …}  (S_H,i ≥ S_H,(i+1), i = 1, 2, …, n1)   (equation 3)
Next, arrange all elements of N in ascending order of state value to obtain the sequence S_N: the earlier an element appears, the more idle the cache node and the higher its service capability, as in equation 4:

S_N = {S_N,1, S_N,2, …, S_N,i, …}  (S_N,i ≤ S_N,(i+1), i = 1, 2, …, n2)   (equation 4)
Obviously, if only the first element of S_H, i.e., the cache node with the largest state value, is adjusted, each adjustment affects only one cache node and ignores the case where the gap between the largest and the second-largest state values is small; if the virtual nodes of all elements of S_H are adjusted, the cost is large, and many of those nodes are only slightly hot and need no adjustment. As a compromise, a prefix subsequence starting from the first element of S_H is selected; its elements are the hottest cache nodes to be adjusted, because their state values are relatively high, they are already busy, and their service capability has begun to drop.
Define S_sub-H as the prefix subsequence finally obtained from S_H. The prefix subsequence is initially empty, and it is calculated as follows:

(1) If |S_H| ≤ 2, let S_sub-H = S_H, i.e., select all elements, and the process ends;

(2) If |S_H| > 2, first add the two elements S_H,1 and S_H,2;

(3) Suppose the elements before S_H,i have all been added to S_sub-H; if S_H,i satisfies the conditional expression of equation 5, which bounds the gap S_H,(i−1) − S_H,i between adjacent state values, then add S_H,i to S_sub-H;

(4) If S_H,i satisfies conditional expression 5 in (3), repeat the judgment of (3) for the next element S_H,(i+1) of S_H; if conditional expression 5 is not satisfied, the whole selection process ends.
Equation 5 judges the gaps between adjacent elements of the S_H sequence, and finally a prefix subsequence of S_H is selected: the prefix subsequence S_sub-H consists of the front elements of S_H with the smallest gaps, i.e., the uniformly distributed hottest class of cache nodes, which is the set of cache nodes that needs to be adjusted this time. After S_sub-H is selected, the corresponding set of mutually-assisting cache nodes must be determined from S_N; the cache nodes in this set take over the virtual nodes of the hot-state cache nodes in S_sub-H. With n2 = |S_N|, for the hot-state cache node S_sub-H,i, the corresponding mutually-assisting idle cache node is S_N,j, where j = i % n2.
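Steps (1)–(4) and the modular pairing above can be sketched as follows. The exact form of the equation-5 condition is not legible in the source, so a fixed gap threshold `max_gap` stands in for it here as an assumption; the pairing j = i % n2 follows the text, rendered with 0-based indices.

```python
def select_hot_prefix(s_h: list, max_gap: float) -> list:
    """s_h: hot-node state values in descending order (sequence S_H).
    Builds the prefix subsequence S_sub-H: take all elements when there
    are at most two (rule 1), otherwise seed with the first two (rule 2)
    and keep extending while the gap to the previous element stays within
    max_gap (rules 3/4; the threshold form is an assumed stand-in for
    equation 5)."""
    if len(s_h) <= 2:
        return list(s_h)
    sub = [s_h[0], s_h[1]]
    for i in range(2, len(s_h)):
        if s_h[i - 1] - s_h[i] <= max_gap:
            sub.append(s_h[i])
        else:
            break
    return sub

def helper_index(i: int, n2: int) -> int:
    """Hot node i (0-based) is helped by idle node j = i % n2."""
    return i % n2
```

With state values [0.9, 0.85, 0.8, 0.5, 0.4] and a small threshold, only the tightly clustered hottest three are selected, and their helpers cycle through the normal-state sequence.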
While the invention has been described in terms of preferred embodiments, it will be understood by those skilled in the art that various changes and modifications can be made without departing from the scope of the invention, and it is intended to cover the invention as defined by the appended claims.

Claims (7)

1. A caching method of a distributed database, characterized in that a fixed-length hash value of a URL is obtained by a hash algorithm and the hash value is then used to look up the cache object, the method specifically comprising the following steps:
when a user request arrives, the front-end proxy obtains the URL contained in the request and first calculates the hash value of the URL, which corresponds to a key value on the hash ring; it then searches clockwise along the hash ring for the first node larger than the key value, and the front-end proxy locates the request on the background cache server thus found; when a new cache node joins the hash ring, the ranges the existing nodes are responsible for do not change greatly, the new cache node only splits the original range of one node, and the cache space is not redistributed when the new cache node joins;
the method further comprises: the method comprises the steps of adopting a self-decision virtual node migration mutual-aid strategy, performing sensing monitoring on operation parameters of each Cache node in a distributed proxy Cache, judging whether local overheating occurs or not to cause other anomalies, selecting a Cache node set with low current load as an aid according to a given multi-copy hierarchical management strategy, and sharing virtual nodes corresponding to Cache nodes with high load and reduced performance on a Cache hash ring;
the method further comprises: adjusting the layer number of the hash ring; adjusting the virtual nodes to rebalance; data migration;
the step of adjusting the number of layers of the hash ring further comprises: if the number of virtual nodes per unit weight falls to the threshold, increasing the number of layers of virtual nodes by 1; if the number of virtual nodes per unit weight is above the threshold, reducing the number of layers of virtual nodes.
2. The caching method of a distributed database according to claim 1, further comprising: hashing every input character string with the MD5 algorithm to obtain a 128-bit value; hashing all URLs into the [0, 2^128 − 1] range in one-to-one correspondence with the URLs; and viewing the entire [0, 2^128 − 1] hash range as a ring structure, with all hash values in [0, 2^128 − 1] arranged clockwise in increasing order and distributed uniformly over the hash ring as a whole.
3. The method of caching a distributed database according to claim 2, further comprising: each cache node is responsible for all URLs corresponding to a section of range on the hash ring, and hash calculation is carried out on the IP value of each cache node to obtain a hash value.
4. The method of caching a distributed database according to claim 1, wherein the self-deciding virtual node migration policy further comprises:
A. evaluating the state and service capability of a cache server;
B. selecting cache nodes whose state is overheated and whose service capability has dropped and migrating out their virtual nodes, while selecting cache nodes whose state is normal and whose service capability is strong to merge the migrated node lists and bear part of the request load of those cache nodes;
C. for different cache nodes, the number of virtual nodes to migrate is determined.
5. The method of claim 1, wherein,
the adjusting the virtual node for rebalancing further comprises: rebalancing occurs during the addition of new nodes or the deletion/failure of existing nodes in the cluster;
the step of data migration further comprises: and migrating the data in the deleted virtual node to the neighbor node.
6. The hash ring optimization method in the distributed database caching method according to any one of claims 1-5, characterized in that a self-deciding virtual-node migration mutual-aid strategy is adopted: the operating parameters of each cache node in the distributed proxy cache are sensed and monitored to judge whether local overheating or other anomalies occur, and according to a given multi-copy hierarchical management strategy a set of cache nodes with low current load is selected as helpers to take over, on the cache hash ring, the virtual nodes corresponding to cache nodes with high load and reduced performance;
the self-deciding virtual node migration policy further comprises:
A. evaluating the state and service capability of a cache server;
B. selecting cache nodes whose state is overheated and whose service capability has dropped and migrating out their virtual nodes, while selecting cache nodes whose state is normal and whose service capability is strong to merge the migrated node lists and bear part of the request load of those cache nodes;
C. for different cache nodes, the number of virtual nodes to migrate is determined.
7. The hash ring optimization method in the distributed database caching method of claim 6, wherein said step of evaluating the cache server status and service capability further comprises: evaluating the busyness of each cache node among all current cache nodes; assuming the background has n cache nodes, the average S̄ of the current states of all nodes is calculated by the following formula:

S̄ = (1/n) · Σ_{i=1}^{n} S_i
two states of a cache node are defined: (1) for cache node i, if S_i > S̄, it is in the hot state; (2) for cache node i, if S_i ≤ S̄, it is in the normal state; when the cache cluster makes a self-deciding adjustment, a hot-state cache node is selected and the virtual nodes on it are migrated to cache nodes in the normal state.
CN201911390078.8A 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof Active CN111177154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911390078.8A CN111177154B (en) 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof


Publications (2)

Publication Number Publication Date
CN111177154A CN111177154A (en) 2020-05-19
CN111177154B true CN111177154B (en) 2023-07-25

Family

ID=70650459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911390078.8A Active CN111177154B (en) 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof

Country Status (1)

Country Link
CN (1) CN111177154B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111917853A (en) * 2020-07-24 2020-11-10 山东云缦智能科技有限公司 Optimization method for distributed cache scaling of content distribution network
CN112380288A (en) * 2020-11-16 2021-02-19 林亮 Decentralized distributed data processing system
CN113507522A (en) * 2021-07-08 2021-10-15 上海七牛信息技术有限公司 Method and system for improving hit rate of PCDN (Primary Contourlet distribution) network requests
CN113689103B (en) * 2021-08-18 2023-11-24 国电南瑞南京控制系统有限公司 Adaptive load balancing using offload intelligent scheduling management method, device and system
CN114629908B (en) * 2022-03-28 2023-10-13 浙江邦盛科技股份有限公司 Data slicing method based on hardware resource density of server node
CN115297131B (en) * 2022-08-01 2023-05-26 东北大学 Sensitive data distributed storage method based on consistent hash
CN115757335B (en) * 2022-11-07 2025-09-19 深圳开鸿数字产业发展有限公司 Database data migration method, device, equipment and storage medium
CN115981848B (en) * 2022-12-17 2024-05-28 上海律保科技有限公司 Memory database fragment adjustment method and equipment
CN118264535B (en) * 2024-03-21 2025-03-25 深圳市智慧城市大数据中心有限公司 A distributed cache method based on hash algorithm
CN118227673B (en) * 2024-05-22 2024-08-02 山东港口科技集团烟台有限公司 Method for caching and processing data of Internet of things

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013387A (en) * 2007-02-09 2007-08-08 华中科技大学 Load balancing method based on object storage device
CN107197035A (en) * 2017-06-21 2017-09-22 中国民航大学 A kind of compatibility dynamic load balancing method based on uniformity hash algorithm
CN108810041A (en) * 2017-04-27 2018-11-13 华为技术有限公司 A kind of data write-in of distributed cache system and expansion method, device
CN109218438A (en) * 2018-10-12 2019-01-15 山东科技大学 A kind of performance optimization method of distributed cache server cluster

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9332083B2 (en) * 2012-11-21 2016-05-03 International Business Machines Corporation High performance, distributed, shared, data grid for distributed Java virtual machine runtime artifacts
CN104158755B (en) * 2014-07-30 2017-12-05 华为技术有限公司 The methods, devices and systems of transmitting message


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ba Ziyan. Optimization of a consistent hashing algorithm based on virtual nodes. Software. 2015, Sections 1-2 of the text. *

Also Published As

Publication number Publication date
CN111177154A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
CN111177154B (en) Distributed database caching method and hash ring optimization thereof
CN111159193B (en) Multi-layer consistent hash ring and application thereof in creating distributed database
US11431791B2 (en) Content delivery method, virtual server management method, cloud platform, and system
KR101928529B1 (en) Code Distributed Hash Table based MapReduce System and Method
US8495013B2 (en) Distributed storage system and method for storing objects based on locations
US11140220B1 (en) Consistent hashing using the power of k choices in server placement
CN105657064B (en) Swift load balancing method based on virtual node storage optimization
Xu et al. Drop: Facilitating distributed metadata management in eb-scale storage systems
CN101645919B (en) A popularity-based copy level calculation method and its copy placement method
Shao et al. An efficient load-balancing mechanism for heterogeneous range-queriable cloud storage
JP2017033337A (en) Virtual machine arrangement device and virtual machine arrangement method
CN116991580A (en) Distributed database system load balancing method and device
Yang et al. A reinforcement learning based data storage and traffic management in information-centric data center networks
Xu et al. Adaptive and scalable load balancing for metadata server cluster in cloud-scale file systems
KR101690944B1 (en) Method and apparatus for managing distributed cache in consideration of load distribution in heterogeneous computing environment
CN111917853A (en) Optimization method for distributed cache scaling of content distribution network
CN115981848B (en) Memory database fragment adjustment method and equipment
JP2008140388A (en) Super-peer with load balancing function in hierarchical peer-to-peer system and method of operating the super-peer
Rahmani et al. A comparative study of replication schemes for structured P2P networks
Aznar-Poveda et al. SDKV: A smart and distributed key-value store for the edge-cloud continuum
US11310309B1 (en) Arc jump: per-key selection of an alternative server when implemented bounded loads
March et al. Multi-attribute range queries on read-only DHT
CN108965387B (en) Balancing method and system for improving survivability of P2P data storage
Pan et al. FCAN: Flash crowds alleviation network using adaptive P2P overlay of cache proxies
KR20220075595A (en) Method for configuration of semi-managed dht based on ndn and system therefor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Request for anonymity

Inventor before: Request for anonymity

CB03 Change of inventor or designer information
TR01 Transfer of patent right

Effective date of registration: 20240423

Address after: Building 7, No. 7 Taiping East Road (South), Mafang Town, Pinggu District, Beijing, 101200

Patentee after: Beijing Jingu Zhitong Green Chain Technology Co.,Ltd.

Country or region after: China

Address before: 3009-315, 3rd Floor, Building B, Building 1, Yard 2, Yongcheng North Road, Haidian District, Beijing, 100089

Patentee before: Zhangxun Yitong (Beijing) Information Technology Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right