
CN111177154B - Distributed database caching method and hash ring optimization thereof - Google Patents


Info

Publication number
CN111177154B
CN111177154B (application number CN201911390078.8A)
Authority
CN
China
Prior art keywords
cache
node
nodes
hash
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911390078.8A
Other languages
Chinese (zh)
Other versions
CN111177154A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingu Zhitong Green Chain Technology Co.,Ltd.
Original Assignee
Zhangxun Yitong Beijing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhangxun Yitong Beijing Information Technology Co ltd filed Critical Zhangxun Yitong Beijing Information Technology Co ltd
Priority to CN201911390078.8A priority Critical patent/CN111177154B/en
Publication of CN111177154A publication Critical patent/CN111177154A/en
Application granted granted Critical
Publication of CN111177154B publication Critical patent/CN111177154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a distributed database cache and a hash ring optimization method thereof. A hash algorithm first computes a fixed-length hash value of the URL, and this hash value is then used for the subsequent operations that look up the cache object. Hashing yields a 128-bit value for every input string, so all URLs are hashed onto the range [0, 2^128 − 1] and mapped one-to-one with their hash values. The method adopts a self-deciding virtual-node migration mutual-aid strategy: it monitors the operating parameters of each cache node in the distributed proxy cache, judges whether local overheating or other anomalies exist, selects a set of cache nodes with lower current load as helpers according to a given multi-copy hierarchical management strategy, and lets them take over the virtual nodes on the hash ring that correspond to cache nodes with higher load and degraded performance, thereby achieving hot-spot migration while avoiding single points of failure.

Description

Distributed database caching method and hash ring optimization thereof
Technical Field
The invention relates to the technical field of computer application, in particular to a distributed database caching method and a hash ring optimization method thereof.
Background
In a database cluster, adding or deleting physical nodes is the most basic function of the cluster management system. With conventional hash algorithms such as hash-modulo or random assignment, adding or deleting a physical node forces a large amount of the original cache to be rebuilt, which imposes serious performance overhead on the whole database cluster system, can even disrupt normal operation of the business system, and severely violates the monotonicity principle.
The consistent hashing algorithm was introduced to guarantee monotonicity: when a physical node is removed or added, the effect on the existing cache mapping is very small. Moreover, the more physical nodes in the cluster, the better consistent hashing preserves monotonicity. The principle of the consistent hashing algorithm is as follows:
(1) A hash ring and physical nodes on the ring are determined.
A range of hash values is first determined, for example (−2^16, 2^16). All hash values in this range are regarded as a ring that increases clockwise and whose beginning and end are joined; this ring is called the hash ring.
A hash ring is a virtual data structure that does not exist physically. Suppose six nodes A through F are distributed over the hash ring after hash computation. A schematic diagram of the hash ring is shown in fig. 1.
(2) Setting access mode of physical node
Suppose an SQL query request arrives and the SQL statement string is used as the KEY of the hash algorithm. The calculated hash value maps to some point on the hash ring. If that point does not correspond to a physical node, a clockwise search is performed along the ring (i.e. toward nodes whose hash values are larger than the calculated one) until the first mapped physical node is found; that node is the target (i.e. the node with the smallest hash greater than the key's hash). If the calculated hash value exceeds the 2^16 upper bound of the range without a node being found, the request matches the first physical node (because the ring is joined end to end, this can still be regarded as a plain clockwise lookup). For example, if the calculated hash value lies between B and C, the matched physical node is C; if it is greater than F's hash, node A is hit.
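The clockwise lookup described above can be sketched as follows (a hedged illustration, not the patent's implementation: MD5 stands in for the unspecified hash function, and the node names are invented):

```python
import bisect
import hashlib

def hash_val(key: str) -> int:
    # Stand-in hash; the concrete hash function is left open at this point.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: node hashes kept sorted, clockwise lookup."""
    def __init__(self, nodes):
        self.ring = sorted((hash_val(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        h = hash_val(key)
        # First node whose hash is >= the key's hash; wrap to the first node
        # when the key's hash exceeds every node hash (ring joined end to end).
        idx = bisect.bisect_left(self.ring, (h, ""))
        return self.ring[idx % len(self.ring)][1]

ring = HashRing(["A", "B", "C", "D", "E", "F"])
node = ring.lookup("SELECT * FROM t WHERE id = 1")
```

A given SQL string always hashes to the same ring position, so repeated lookups are stable.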
(3) Processing to augment nodes
Suppose that a G physical node is to be added, as indicated by the grey circular box in fig. 2.
The hash value of the new physical node is calculated first and corresponds to a point on the hash ring. The access mode of the ring is unchanged, so the only mapping relation that changes is for hash values distributed between node D and node G: after node G is added they map to node G instead of the original node E. Hash values distributed between node G and node E still map to node E, unchanged. The result is that after adding a physical node, only a small fraction of the cache needs to be re-established. Although adding nodes still changes some mappings under consistent hashing, the amount of change is reduced to a minimum compared with the traditional hash-modulo approach.
(4) Processing of deleted nodes
Assuming that node G in fig. 2 needs to be deleted, the hash ring reverts to its original state. Caches mapped to hash values on node G will no longer be hit; those hash values now map clockwise to node E, so the only caches that must be re-established are those for hash values distributed between node D and node G. The amount of cache rebuilding required when deleting a node on a hash ring is therefore also greatly reduced compared with conventional hash-modulo methods.
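The claim that adding a node remaps only the keys adjacent to it can be checked with a small simulation (an illustrative sketch; the node names, key count, and MD5 stand-in hash are invented, not taken from the patent):

```python
import bisect
import hashlib

def hval(s: str) -> int:
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def lookup(ring, key):
    # ring: sorted list of (hash, node); clockwise = first hash >= key hash.
    hashes = [h for h, _ in ring]
    return ring[bisect.bisect_left(hashes, hval(key)) % len(ring)][1]

nodes = ["A", "B", "C", "D", "E", "F"]
ring = sorted((hval(n), n) for n in nodes)
keys = ["key-%d" % i for i in range(1000)]
before = {k: lookup(ring, k) for k in keys}

# Add node G: the only keys whose mapping changes are those now falling in
# G's arc, and every one of them moves TO G; all other keys stay put.
ring_g = sorted(ring + [(hval("G"), "G")])
after = {k: lookup(ring_g, k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Deleting G again restores the original assignment, mirroring the deletion case described next.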
The consistent hashing algorithm is a load balancing algorithm that is currently in very wide use. In a dynamically changing cache environment, the consistent hash algorithm well meets two factors for judging whether the hash algorithm is good or bad, namely balance and monotonicity.
(1) Balance: the entire cache space should be fully utilized, so a good hashing algorithm should distribute its hash results as uniformly as possible.
(2) Monotonicity: once the system's cache has been established and stabilized, adding new cache nodes should cause remapped cache entries to map only into the newly added cache space, never to be shuffled among the original cache spaces.
However, through the analysis and research above, the invention finds that once consistent hashing is chosen, the underlying hash function used within it must also be determined. This is a critical step: it decides how uniformly the nodes are distributed on the ring, as well as factors such as algorithm efficiency.
Disclosure of Invention
The invention aims to solve the technical problem of providing a distributed database caching method and a hash ring optimization method thereof. The method as a whole distributes the cache object space so that multiple background cache nodes can work cooperatively, improving efficiency. When a new cache node joins the hash ring, the ranges the existing nodes are responsible for do not change greatly: the new node only splits the original range of one node, and the cache space is not redistributed when a new cache node is added, which avoids fluctuation of the system load and ensures the stability of the mechanism.
In order to solve the technical problems, the invention provides a caching method for a distributed database: a hash algorithm first computes a fixed-length hash value of the URL, and this hash value is then used in the subsequent operations that look up the cache object.
Preferably, the method further comprises: hashing by using MD5 algorithm to obtain a 128-bit value for all the inputted character strings: hash all URLs at one [0,2 ] 128 -1]One-to-one mapping with URLs is performed; will all [0,2 ] 128 -1]The hash range is seen as a circular structure, [0,2 128 -1]All hash values within the range are arranged in a clockwise direction in order from large to small, and are uniformly distributed on the hash ring as a whole.
Preferably, the method further comprises: each cache node is responsible for all URLs corresponding to a section of range on the hash ring, and hash calculation is carried out on the IP value of each cache node to obtain a hash value.
Preferably, the method further comprises: when a user's request arrives, the front-end proxy obtains the URL contained in the request and first calculates its hash value, which serves as a key on the hash ring; it then searches clockwise along the hash ring for the first node larger than this key, and the front-end proxy routes the HTTP request to the background cache server just found. When a new cache node joins the hash ring, the ranges the existing nodes are responsible for do not change greatly: the new cache node only splits the original range of one node, and the cache space is not redistributed when a new cache node is added.
Preferably, the method further comprises: the self-decision virtual node migration mutual-aid strategy is adopted, operation parameters of all Cache nodes in the distributed proxy Cache are monitored in a sensing mode, whether local overheating exists or not is judged, a Cache node set with lower current load is selected as an aid according to a given multi-copy hierarchical management strategy, and virtual nodes corresponding to Cache nodes with higher load and reduced performance on a Cache hash ring are shared.
Preferably, the self-determined virtual node migration policy further includes:
A. evaluating the state and service capability of a cache server;
B. selecting cache nodes whose state is overheated and whose service capability has degraded, and migrating their virtual nodes; at the same time selecting cache nodes whose state is normal and whose service capability is stronger, and merging the migrated virtual-node lists into them so that they bear part of the hot nodes' request load;
C. for different cache nodes, the number of virtual nodes to migrate is determined.
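Steps A-C can be sketched as a small planning function (a hedged illustration; the load metric, the 50% migration fraction, the single-helper choice, and all node names are assumptions, not values from the patent):

```python
def plan_migration(loads, vnodes, migrate_fraction=0.5):
    # Step A: evaluate state -- here, "hot" means load above the mean.
    avg = sum(loads.values()) / len(loads)
    hot = [n for n, s in loads.items() if s > avg]
    normal = [n for n in loads if n not in hot]
    # Step B: the least-loaded normal node acts as the helper.
    helper = min(normal, key=loads.get)
    # Step C: decide how many virtual nodes each hot node migrates.
    moves = {}
    for n in hot:
        count = max(1, int(len(vnodes[n]) * migrate_fraction))
        moves[n] = (vnodes[n][:count], helper)
    return moves

moves = plan_migration(
    {"c1": 90, "c2": 20, "c3": 40},
    {"c1": ["v1a", "v1b", "v1c", "v1d"], "c2": ["v2a"], "c3": ["v3a"]},
)
```

Here c1 is the only node above the mean load of 50, so half of its virtual nodes are handed to the least-loaded normal node c2.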
Preferably, the method further comprises: adjusting the layer number of the hash ring; adjusting the virtual nodes to rebalance; and (5) data migration.
Preferably, the step of adjusting the number of layers of the hash ring further includes: if the number of the virtual nodes of the unit weight is reduced to the threshold value, increasing the number of layers of the virtual nodes by 1; if the number of virtual nodes of the unit weight is higher than the threshold value, reducing the layers of the virtual nodes;
the adjusting the virtual node for rebalancing further comprises: rebalancing occurs during the addition of new nodes or the deletion/failure of existing nodes in the cluster;
the step of data migration further comprises: and migrating the data in the deleted virtual node to the neighbor node.
In order to solve the technical problems, the invention also provides a hash ring optimization method for the distributed database caching method: it adopts a self-deciding virtual-node migration mutual-aid strategy, monitors the operating parameters of each Cache node in the distributed proxy Cache, judges whether local overheating or other anomalies exist, selects a set of Cache nodes with lower current load as helpers according to a given multi-copy hierarchical management strategy, and lets them take over the virtual nodes on the Cache hash ring that correspond to Cache nodes with higher load and degraded performance.
preferably, the self-determined virtual node migration policy further includes:
A. evaluating the state and service capability of a cache server;
B. selecting cache nodes whose state is overheated and whose service capability has degraded, and migrating their virtual nodes; at the same time selecting cache nodes whose state is normal and whose service capability is stronger, and merging the migrated virtual-node lists into them so that they bear part of the hot nodes' request load;
C. for different cache nodes, the number of virtual nodes to migrate is determined.
Preferably, the step of evaluating the cache server status and the service capability further comprises: evaluating the busyness of each cache node among all current cache nodes. Assuming the background has n cache nodes whose current states are S1, S2, ..., Sn, the average of the current states of all nodes is calculated as:

Savg = (S1 + S2 + ... + Sn) / n

Two states of a cache node are then defined: (1) for cache node i, if Si > Savg, the node is hot; (2) for cache node i, if Si <= Savg, the state is normal. When the cache cluster needs a self-deciding adjustment, at least one cache node must already be in the hot state; it then has to be decided which hot cache nodes need adjustment, and to which normal-state cache nodes the virtual nodes on those hot cache nodes should be migrated.
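A hedged sketch of this evaluation, assuming the average is the plain mean of the node states (the formula itself is not legible in the source text, and the load numbers are invented):

```python
def classify_nodes(loads):
    # Mean of the current states of all n cache nodes (assumed formula).
    avg = sum(loads.values()) / len(loads)
    hot = {n for n, s in loads.items() if s > avg}  # state above the mean
    normal = set(loads) - hot                       # state at or below the mean
    return avg, hot, normal

avg, hot, normal = classify_nodes({"c1": 90, "c2": 30, "c3": 20, "c4": 40})
```

With these illustrative loads the mean is 45, so only c1 is classified hot and becomes a candidate for virtual-node migration.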
The beneficial effects of the invention include: by adopting the self-deciding virtual-node migration mutual-aid strategy, the operating parameters of each Cache node in the distributed proxy Cache are monitored, local overheating and other anomalies are detected, a set of Cache nodes with lower current load is selected as helpers according to the given multi-copy hierarchical management strategy, and these helpers take over the virtual nodes on the Cache hash ring corresponding to Cache nodes with higher load and degraded performance, achieving hot-spot migration while avoiding single points of failure. Through this dynamic adaptation, availability is higher.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the prior art, and it is obvious that the drawings in the following description are only a part of the embodiments or the prior art, and other similar or related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a hash ring according to the background of the invention;
FIG. 2 is a diagram of a hash ring addition node according to the background of the invention;
FIG. 3 is a diagram illustrating a correspondence between virtual nodes and physical nodes in a hash ring according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating node data migration in a hash ring according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a multi-layer consistent hash ring according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of creating virtual nodes according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a distributed proxy cache according to an embodiment of the present invention;
FIG. 8 is a diagram of URL space allocation based on hash ring according to an embodiment of the present invention;
fig. 9 is a migration interaction diagram of a virtual node according to an embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to examples, but the present invention is not limited to these examples.
If only the existing physical nodes are placed on the hash ring, they are likely to be unevenly distributed on the ring, which strongly affects whether the subsequent caches are evenly distributed and leads to load imbalance.
In one embodiment of the present invention, to solve the balance problem, the concept of "virtual nodes" is added to the hash ring. Virtual nodes can be understood simply as replicas of the physical nodes on the hash ring rather than nodes that actually exist; the number of virtual nodes corresponding to one physical node is generally determined by the actual situation. Virtual nodes are placed on the hash ring and accessed in the same way as physical nodes. After virtual nodes are introduced, every time a physical node is added, the corresponding number of virtual nodes must also be added on the hash ring; conversely, if a physical node is deleted, all of its corresponding virtual nodes on the hash ring must be deleted.
In FIG. 3, nodes A-L represent 12 virtual nodes and nodes 1-4 represent 4 physical nodes, each physical node corresponding to three virtual nodes. The hash value calculated from the SQL string object corresponds to a point on the ring; a corresponding virtual node is then found according to the ring's access mode, and that virtual node maps to its physical node. The more nodes in the cluster, the better the balance of the consistent hash algorithm, so introducing virtual nodes greatly increases the number of nodes on the hash ring and thus improves the balance of the consistent hash algorithm.
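The fig. 3 arrangement — several virtual nodes per physical node, resolved back to the physical node on lookup — can be sketched as follows (illustrative only; the replica-naming scheme and MD5 stand-in hash are invented):

```python
import bisect
import hashlib

def hval(s: str) -> int:
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def build_ring(physical_nodes, replicas=3):
    # Each virtual node is a hashed replica label that remembers its
    # physical node, so a lookup resolves virtual -> physical directly.
    return sorted(
        (hval("%s#vn%d" % (node, r)), node)
        for node in physical_nodes
        for r in range(replicas)
    )

def lookup(ring, key):
    hashes = [h for h, _ in ring]
    return ring[bisect.bisect_left(hashes, hval(key)) % len(ring)][1]

ring = build_ring(["node1", "node2", "node3", "node4"], replicas=3)
```

As in fig. 3, four physical nodes with three replicas each put twelve points on the ring, and any key still resolves to one of the four physical nodes.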
The HASH algorithm has three major advantages: unidirectionality, collision resistance, and uniformity of distribution. Consistent HASH is currently the most commonly adopted algorithm for distributed load balancing and storage balancing. The algorithm hashes each machine node with a HASH function and maps it into the space [0, 2^32). When data is stored, the HASH value of the data block is computed first, and this value corresponds to a position on the ring. As shown in fig. 4 (left image), the hash value key1 of object1 is calculated, a machine Node1 is then found clockwise, and object1 is stored on Node1.
If Node2 goes down, the data on Node2 migrates to Node3. The advantage is that even if a node fails, data access for the other nodes can still be located quickly using the HASH algorithm; only the neighboring node is affected.
Consistent hashing is widely used in distributed systems, including distributed databases. It provides a pattern of mapping keys to specific nodes around the ring. However, the existing consistent hashing method cannot guarantee accurate balancing of data. For example: when extended, the newly added node will only share the workload of the neighboring node. The same is true after the node is deleted from the system. This will result in an unbalanced state after deleting or adding a node.
In addition, given that node capacities in a distributed system are not always balanced at the outset, it is not easy to bring the entire cluster to an equilibrium state. In the method of fig. 3 in the previous embodiment, virtual nodes map one physical node to multiple virtual nodes to maintain balance. However, that method does not consider the relationship among virtual nodes belonging to the same physical node: once a physical node shuts down, a data avalanche may occur.
Thus, as shown in FIG. 5, in another embodiment of the present invention, our idea is to create and use a multi-layer consistent hash ring for a distributed database, and includes:
and calculating the weight of the physical node according to the capacity and the rule.
A first hash ring is constructed and different weights are assigned to the physical nodes.
A second hash ring is constructed for certain physical nodes, where the second hash ring has more capacity and the nodes in the second hash ring are virtual nodes.
If there is an imbalance in some virtual nodes, more hash rings are constructed.
Finally, a multi-layer consistent hash ring is constructed.
Multi-layer hashing is used to locate and access data in the ring.
Once a node fails, non-first tier nodes are bypassed directly to speed up rebalancing.
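One way to sketch the weight-and-layer bookkeeping above (a hedged illustration: the capacity unit, the layer threshold, and the node capacities are all invented for demonstration):

```python
def plan_ring(capacities, unit=100, weight_threshold=3):
    # Weight derives from capacity; nodes whose weight exceeds the threshold
    # get a second-layer ring to hold their many virtual nodes.
    plan = {}
    for node, cap in capacities.items():
        weight = max(1, cap // unit)
        layers = 2 if weight > weight_threshold else 1
        plan[node] = {"weight": weight, "layers": layers}
    return plan

plan = plan_ring({"n1": 100, "n2": 800})
```

With these assumed numbers, the small node stays on the first ring while the large node's virtual nodes spill onto a second layer.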
(I) First hash ring initialization
In the hash ring initialization of this embodiment, the weight of each physical node is calculated according to the rules and its capacity, such as storage size, storage type, storage time, and storage risk.
In an embodiment, we use memory size as one major factor for simplicity of illustration only. However, rules and other factors may be specified as desired.
Different virtual nodes are defined according to the weights of the different physical nodes, and a multi-layer hash ring is generated. In real data processing, some physical nodes may have only a first layer and no second or further layers.
(II) creating virtual nodes
The capacity size of each virtual node is ensured to be the same during creation of the virtual node. The number of virtual nodes per physical disk depends on the weight. The mapping relationship between the virtual nodes and the physical nodes will be recorded as metadata in the mapping table.
If the number of virtual nodes is not large enough, the balance of consistent hashing is broken. A minimum threshold on the number of virtual nodes should therefore be maintained, together with a multi-level index recording the nodes in each layer of the hash ring.
(III) adjusting the layer number of the hash ring
The adjustment rule is: if the number of virtual nodes per unit weight is reduced to a threshold value, the number of layers of the virtual node is increased by 1.
On the other hand, if the number of virtual nodes per weight is higher than the threshold, the layer of virtual nodes is reduced.
(IV) adjusting virtual nodes for rebalancing
Rebalancing may occur during the addition of new nodes or the deletion/failure of existing nodes in the cluster.
During the addition of a node:
-re-calculating weights of all nodes in the first hash ring;
-constructing a second or more layers of hash loops for the newly added node;
-adding new data to the newly added node and moving some existing data to the newly added node.
During deletion of a node:
-if a node failure is detected, bypassing all virtual nodes of the second or higher layer directly, which prevents a large number of unnecessary node redirections;
all data in the failed node will move to the adjacent hash ring.
And (V) data migration:
for rebalancing, data in the deleted virtual node should be migrated to the neighbor node. A newly added node likewise takes over data at its location on the consistent HASH circle, just as a deletion hands data over at that location. Migration occurs only between the affected nodes.
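The deletion path can be checked with a small simulation (node and key names invented, MD5 as a stand-in hash): removing a node leaves every other key's owner unchanged, and the removed node's keys land on surviving nodes only.

```python
import bisect
import hashlib

def hval(s: str) -> int:
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def owner(ring, key_hash):
    # Clockwise owner: first node hash >= key hash, wrapping at the end.
    hashes = [h for h, _ in ring]
    return ring[bisect.bisect_left(hashes, key_hash) % len(ring)][1]

ring = sorted((hval(n), n) for n in ["nodeX", "nodeY", "nodeZ"])
keys = ["k%d" % i for i in range(200)]
placed = {k: owner(ring, hval(k)) for k in keys}

# Delete nodeY: recompute owners on the two-node ring; only nodeY's keys
# migrate, each to its clockwise neighbour on the surviving ring.
ring2 = [(h, n) for h, n in ring if n != "nodeY"]
after = {k: owner(ring2, hval(k)) for k in keys}
```

This mirrors the rule above: migration touches only the deleted node's data, never the rest of the cluster.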
Finally, how to determine whether this mechanism is in use:
By checking the data structure and data flow to see whether there are multiple layers of consistent hash rings.
By checking the user manuals and the system behavior.
In another embodiment of the present invention, a hash ring storage mechanism is disclosed: a distributed proxy cache architecture. As shown in fig. 7, a front-end proxy server manages several background cache server nodes. When the front-end proxy receives a request from a user, it goes to the background cache servers to obtain the requested data; the cache server first searches its local cache space for the corresponding Web content and, if found, returns it to the front-end proxy. If the corresponding Web content is not found, the cache server first obtains the content from the request's origin Web server, then sends it to the front-end proxy while storing a copy of the cached object locally to speed up responses to subsequent requests.
In the distributed proxy cache, all HTTP requests of network users are distributed to each cache after passing through the proxy, and compared with the single cache, the cache space and the cache content are greatly increased due to the existence of a plurality of cache nodes at the background, and the response time of the foreground proxy is also reduced, so that the overall performance is improved. Obviously, if multiple cache nodes in the background all store the same content, the advantages will be greatly reduced, and the improvement of the performance can be ensured only by distributing the space of the whole cache copy to the cache nodes.
For a user's HTTP request, the most useful information is the URL, which is also the basis for the subsequent cache-node content lookup. It is therefore natural to divide the cache space by URL, but URL lengths vary greatly, and using them directly without processing would be very difficult. To address this, a fixed-length hash value of the URL can first be obtained with a specific Hash algorithm, and that hash value then used for the subsequent operations that look up the cache object. The invention provides a Hash-mechanism-based scheme for distributing the whole URL space, i.e. the whole cache space, among multiple cache nodes. Specifically, the invention hashes with the MD5 algorithm, which produces a 128-bit value for every input string; that is, all URLs hash onto the space [0, 2^128 − 1]. Considering the actual number of URLs, this range is large enough to contain all content, and a one-to-one mapping with URLs is possible.
To facilitate the implementation of the URL routing mechanism, as shown in FIG. 8, the entire [0, 2^128 − 1] hash range is viewed as a circular structure: all hash values within the range are arranged clockwise in increasing order and are distributed uniformly over the hash ring. Each cache node in fig. 7 is responsible for a range on the hash ring, i.e. for all URLs corresponding to that range. The IP value of each cache node is hashed to obtain a hash value; in the figure, node1 through node4 represent four cache nodes, the range between node1 and node2 is the URL range cache node2 is responsible for, and so on. Notably, the range corresponding to node3 spans the maximum value and continues from the minimum, which is an advantage of hash rings: together the ranges cover the whole ring. When a user's request arrives, the front-end proxy obtains the URL contained in the request and first calculates its hash value, which serves as a key on the hash ring, as shown in fig. 8. It then finds, clockwise, the first node larger than this key; that node represents a cache server in the real environment, so the front-end proxy routes the HTTP request to the background cache server just found. The whole process distributes the cache object space so that multiple background cache nodes work cooperatively, improving efficiency. Note also a characteristic of the method: when a new cache node5 joins the hash ring, the ranges the existing nodes are responsible for do not change greatly; node5 only splits the original range of one node, and the cache space is not redistributed when a new cache node is added. This avoids fluctuation of the system load and ensures the stability of the mechanism.
Although the storage mechanism of the hash ring has many advantages in cache management, there is a problem that, due to the inherent nature of the MD5 hashing algorithm, it cannot be guaranteed that each real cache node IP can be uniformly distributed over the entire hash ring after hashing. As shown in fig. 8, the responsible ranges of node1 and node5 are very different, and it is apparent that the probability of receiving HTTP requests is high for a large range, and vice versa. In the case of a relatively large number of requests, this may lead to uneven cache load tasks for each cache node, and it is not easy to adjust the load of each cache node, unless each cache node is re-hashed, which is obviously not feasible and is too costly. The improved MD5 algorithm does not help much in solving this problem, because randomness is a big feature of the hashing algorithm and does not enable some uniformly spaced property after IP hashing.
The root cause of the problem is that each cache node corresponds to only one point on the hash ring, so the idea of virtual nodes is introduced on the hash ring to solve the uneven distribution of the cache space. That is, a multi-level hash ring mechanism is adopted in which one physical node corresponds to several virtual nodes. The core idea is that, besides its own point on the hash ring, each cache server also corresponds to several virtual nodes (v-nodes); the union of the ranges of its node and v-nodes is the cache URL space assigned to that real cache server, so the routing query mechanism is unchanged. With three real cache nodes node1, node2 and node3 on the cache hash ring, each cache node then has multiple replicas on the ring, and owing to the randomness of the hash these replicas divide the entire URL space more evenly. The number of virtual nodes each cache node generates on the cache hash ring is calculated according to its service capability: a cache with strong service capability has many virtual nodes and covers a wide URL range, so the front-end proxy distributes many user requests to it; and vice versa. For example, node1 may be mapped to v-node1a and v-node1b, node2 to v-node2a and v-node2b, and node3 to v-node3a, v-node3b and v-node3c.
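A weighted virtual-node ring of the kind described can be sketched as follows. This is a minimal illustration, not the patent's implementation; in particular, deriving replica positions by suffixing the IP with a replica index ("#0", "#1", ...) is an assumption, since the patent does not specify how the v-node positions are derived.

```python
import hashlib

def md5_hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def build_vnode_ring(capacity: dict) -> list:
    """Build a ring of (position, owner) pairs. `capacity` maps a cache
    node's IP to its virtual-node count, proportional to its service
    capability: stronger nodes get more v-nodes and a wider URL range."""
    ring = []
    for ip, count in capacity.items():
        for i in range(count):
            # Replica naming scheme is an assumed placeholder.
            ring.append((md5_hash(f"{ip}#{i}"), ip))
    return sorted(ring)
```

Lookup over this ring works exactly as before; the only change is that a hit on any v-node resolves to the owning physical cache server.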
The number of virtual nodes of each cache node is closely related to its current service capability. To enable the distributed proxy caching system to sense its own running condition, an autonomous decision module predicts the future running state of the cache nodes, comprehensively evaluates their service capability, and dynamically decides the number of virtual nodes each is allotted on the hash ring: it reduces the number of virtual nodes of overloaded cache nodes and reassigns those virtual nodes to the cache nodes whose current service performance is stronger. The whole process is shown in FIG. 9.
In FIG. 9, a self-deciding virtual-node migration mutual-aid strategy is adopted: the operating parameters of each cache node in the distributed proxy cache are sensed and monitored to judge whether local overheating or other anomalies occur; according to a given multi-copy hierarchical management strategy, a set of cache nodes with lower current load is selected as helpers to take over, on the cache hash ring, the virtual nodes corresponding to the cache nodes whose load is higher and whose performance has dropped, thereby migrating hot spots and avoiding single-point failures. Self-deciding virtual-node migration essentially solves three problems: first, evaluating the state and service capability of each cache server; second, selecting the cache nodes whose state is overheated and whose service capability has dropped and migrating out their virtual nodes, while selecting cache nodes whose state is normal and whose service capability is stronger to merge the migrated node lists and bear part of the request load; third, determining, for the different cache nodes, the number of virtual nodes to migrate.
For the evaluation of the state and service capability of a cache server, the sensing and monitoring part computes a composite state value S_i that represents the current state and service capability of cache node i. The larger S_i is, the busier the cache node and the less service capability is available; the smaller S_i is, the more idle the cache node and the more service capability is available.
Since the background of the distributed proxy cache has many cache nodes, the autonomous decision part receives the state values of many cache nodes each time. To evaluate the busyness of each cache node relative to all current cache nodes, assume the background has n cache nodes and calculate the average S̄ of the current states of all nodes, as shown in equation 1:

S̄ = (1/n) · Σ_{i=1}^{n} S_i   (equation 1)
Two states of a cache node are defined: (1) for cache node i, if S_i > S̄, it is in the hot state; (2) for cache node i, if S_i ≤ S̄, it is in the normal state. When the cache cluster needs a self-deciding adjustment, at least one cache node is certainly already in the hot state; it must be decided which of the hot-state cache nodes need adjustment, and onto which normal-state cache nodes the virtual nodes of the hot-state cache nodes should be migrated.
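The averaging of equation 1 and the two-state classification can be sketched directly (a minimal illustration; the node names and state values are invented):

```python
def classify_nodes(states: dict):
    """states maps a cache node id to its state value S_i
    (larger = busier).  Per equation 1, the mean of all S_i is taken;
    a node is hot if S_i > mean, and normal if S_i <= mean."""
    mean = sum(states.values()) / len(states)
    hot = {node for node, s in states.items() if s > mean}
    normal = {node for node, s in states.items() if s <= mean}
    return hot, normal, mean
```

The hot set is the candidate pool for migrating virtual nodes out; the normal set is the candidate pool of helpers.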
First, define H as the set of all cache nodes in the hot state, with n1 = |H|, and N as the set of all cache nodes in the normal state, with n2 = |N|; then they satisfy equation 2:

n1 + n2 = |H| + |N| = n   (equation 2)
Secondly, arrange all elements of H in descending order of state value to obtain the sequence S_H, which orders the hot cache nodes by busyness: the earlier an element appears, the busier the cache node and the less service capability it has, as in equation 3:

S_H = {S_H,1, S_H,2, …, S_H,i, …}  (S_H,i ≥ S_H,(i+1), i = 1, 2, …, n1)   (equation 3)
Next, arrange all elements of N in ascending order of state value to obtain the sequence S_N: the earlier an element appears, the more idle the cache node and the higher its service capability, as in equation 4:

S_N = {S_N,1, S_N,2, …, S_N,i, …}  (S_N,i ≤ S_N,(i+1), i = 1, 2, …, n2)   (equation 4)
Obviously, if only the first element of S_H, i.e., the cache node with the largest state value, is adjusted, each adjustment affects only one cache node and ignores the case where the gap between the largest and the second-largest state values is small; if the virtual nodes of all elements of S_H are adjusted, the cost is large, and many of those nodes are only slightly hot and need no adjustment. As a compromise, a prefix subsequence starting from the first element of S_H is selected; its elements are the hottest cache nodes to be adjusted, because their state values are relatively high, they are already busy, and their service capability has begun to drop.
Define S_sub-H as the prefix subsequence finally obtained from S_H. The prefix subsequence is initially empty, and it is calculated as follows:

(1) If |S_H| ≤ 2, let S_sub-H = S_H, i.e., select all elements, and the process ends;

(2) If |S_H| > 2, first add the two elements S_H,1 and S_H,2;

(3) Suppose the elements before S_H,i have all been added to S_sub-H; if S_H,i satisfies the conditional expression of equation 5, which bounds the gap S_H,(i−1) − S_H,i between adjacent state values, then add S_H,i to S_sub-H;

(4) If S_H,i satisfies conditional expression 5 in (3), repeat the judgment of (3) for the next element S_H,(i+1) of S_H; if conditional expression 5 is not satisfied, the whole selection process ends.
Equation 5 judges the gaps between adjacent elements of the S_H sequence, and finally a prefix subsequence of S_H is selected: the prefix subsequence S_sub-H consists of the front elements of S_H with the smallest gaps, i.e., the uniformly distributed hottest class of cache nodes, which is the set of cache nodes that needs to be adjusted this time. After S_sub-H is selected, the corresponding set of mutually-assisting cache nodes must be determined from S_N; the cache nodes in this set take over the virtual nodes of the hot-state cache nodes in S_sub-H. With n2 = |S_N|, for the hot-state cache node S_sub-H,i, the corresponding mutually-assisting idle cache node is S_N,j, where j = i % n2.
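Steps (1)–(4) and the modular pairing above can be sketched as follows. The exact form of the equation-5 condition is not legible in the source, so a fixed gap threshold `max_gap` stands in for it here as an assumption; the pairing j = i % n2 follows the text, rendered with 0-based indices.

```python
def select_hot_prefix(s_h: list, max_gap: float) -> list:
    """s_h: hot-node state values in descending order (sequence S_H).
    Builds the prefix subsequence S_sub-H: take all elements when there
    are at most two (rule 1), otherwise seed with the first two (rule 2)
    and keep extending while the gap to the previous element stays within
    max_gap (rules 3/4; the threshold form is an assumed stand-in for
    equation 5)."""
    if len(s_h) <= 2:
        return list(s_h)
    sub = [s_h[0], s_h[1]]
    for i in range(2, len(s_h)):
        if s_h[i - 1] - s_h[i] <= max_gap:
            sub.append(s_h[i])
        else:
            break
    return sub

def helper_index(i: int, n2: int) -> int:
    """Hot node i (0-based) is helped by idle node j = i % n2."""
    return i % n2
```

With state values [0.9, 0.85, 0.8, 0.5, 0.4] and a small threshold, only the tightly clustered hottest three are selected, and their helpers cycle through the normal-state sequence.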
While the invention has been described in terms of preferred embodiments, it will be understood by those skilled in the art that various changes and modifications can be made without departing from the scope of the invention, and it is intended to cover the invention as defined by the appended claims.

Claims (7)

1. A caching method of a distributed database, characterized in that a fixed-length hash value of a URL is obtained by a hash algorithm and the hash value is then used to look up the cache object, the method specifically comprising the following steps:
when a user request arrives, the front-end proxy obtains the URL contained in the request and first calculates the hash value of the URL, which corresponds to a key value on the hash ring; it then searches clockwise along the hash ring for the first node larger than the key value, and the front-end proxy locates the request on the background cache server thus found; when a new cache node joins the hash ring, the ranges the existing nodes are responsible for do not change greatly, the new cache node only splits the original range of one node, and the cache space is not redistributed when the new cache node joins;
the method further comprises: the method comprises the steps of adopting a self-decision virtual node migration mutual-aid strategy, performing sensing monitoring on operation parameters of each Cache node in a distributed proxy Cache, judging whether local overheating occurs or not to cause other anomalies, selecting a Cache node set with low current load as an aid according to a given multi-copy hierarchical management strategy, and sharing virtual nodes corresponding to Cache nodes with high load and reduced performance on a Cache hash ring;
the method further comprises: adjusting the layer number of the hash ring; adjusting the virtual nodes to rebalance; data migration;
the step of adjusting the number of layers of the hash ring further comprises: if the number of virtual nodes per unit weight falls to the threshold, increasing the number of layers of virtual nodes by 1; if the number of virtual nodes per unit weight is above the threshold, reducing the number of layers of virtual nodes.
2. The caching method of a distributed database according to claim 1, further comprising: hashing every input character string with the MD5 algorithm to obtain a 128-bit value; hashing all URLs into the [0, 2^128 − 1] range in one-to-one correspondence with the URLs; and viewing the entire [0, 2^128 − 1] hash range as a ring structure, with all hash values in [0, 2^128 − 1] arranged clockwise in increasing order and distributed uniformly over the hash ring as a whole.
3. The method of caching a distributed database according to claim 2, further comprising: each cache node is responsible for all URLs corresponding to a section of range on the hash ring, and hash calculation is carried out on the IP value of each cache node to obtain a hash value.
4. The method of caching a distributed database according to claim 1, wherein the self-deciding virtual node migration policy further comprises:
A. evaluating the state and service capability of a cache server;
B. selecting cache nodes whose state is overheated and whose service capability has dropped and migrating out their virtual nodes, while selecting cache nodes whose state is normal and whose service capability is strong to merge the migrated node lists and bear part of the request load of those cache nodes;
C. for different cache nodes, the number of virtual nodes to migrate is determined.
5. The method of claim 1, wherein,
the adjusting the virtual node for rebalancing further comprises: rebalancing occurs during the addition of new nodes or the deletion/failure of existing nodes in the cluster;
the step of data migration further comprises: and migrating the data in the deleted virtual node to the neighbor node.
6. The hash ring optimization method in the distributed database caching method according to any one of claims 1-5, characterized in that a self-deciding virtual-node migration mutual-aid strategy is adopted: the operating parameters of each cache node in the distributed proxy cache are sensed and monitored to judge whether local overheating or other anomalies occur, and according to a given multi-copy hierarchical management strategy a set of cache nodes with low current load is selected as helpers to take over, on the cache hash ring, the virtual nodes corresponding to cache nodes with high load and reduced performance;
the self-deciding virtual node migration policy further comprises:
A. evaluating the state and service capability of a cache server;
B. selecting cache nodes whose state is overheated and whose service capability has dropped and migrating out their virtual nodes, while selecting cache nodes whose state is normal and whose service capability is strong to merge the migrated node lists and bear part of the request load of those cache nodes;
C. for different cache nodes, the number of virtual nodes to migrate is determined.
7. The hash ring optimization method in the distributed database caching method of claim 6, wherein said step of evaluating the cache server status and service capability further comprises: evaluating the busyness of each cache node among all current cache nodes; assuming the background has n cache nodes, the average S̄ of the current states of all nodes is calculated by the following formula:

S̄ = (1/n) · Σ_{i=1}^{n} S_i
two states of a cache node are defined: (1) for cache node i, if S_i > S̄, it is in the hot state; (2) for cache node i, if S_i ≤ S̄, it is in the normal state; when the cache cluster makes a self-deciding adjustment, a hot-state cache node is selected and the virtual nodes on it are migrated to cache nodes in the normal state.
CN201911390078.8A 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof Active CN111177154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911390078.8A CN111177154B (en) 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof


Publications (2)

Publication Number Publication Date
CN111177154A CN111177154A (en) 2020-05-19
CN111177154B true CN111177154B (en) 2023-07-25

Family

ID=70650459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911390078.8A Active CN111177154B (en) 2019-12-27 2019-12-27 Distributed database caching method and hash ring optimization thereof

Country Status (1)

Country Link
CN (1) CN111177154B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111917853A (en) * 2020-07-24 2020-11-10 山东云缦智能科技有限公司 Optimization method for distributed cache scaling of content distribution network
CN112380288A (en) * 2020-11-16 2021-02-19 林亮 Decentralized distributed data processing system
CN113507522A (en) * 2021-07-08 2021-10-15 上海七牛信息技术有限公司 Method and system for improving hit rate of PCDN (Primary Contourlet distribution) network requests
CN113689103B (en) * 2021-08-18 2023-11-24 国电南瑞南京控制系统有限公司 Adaptive load balancing using offload intelligent scheduling management method, device and system
CN114629908B (en) * 2022-03-28 2023-10-13 浙江邦盛科技股份有限公司 Data slicing method based on hardware resource density of server node
CN115297131B (en) * 2022-08-01 2023-05-26 东北大学 Sensitive data distributed storage method based on consistent hash
CN115757335B (en) * 2022-11-07 2025-09-19 深圳开鸿数字产业发展有限公司 Database data migration method, device, equipment and storage medium
CN115981848B (en) * 2022-12-17 2024-05-28 上海律保科技有限公司 Memory database fragment adjustment method and equipment
CN118264535B (en) * 2024-03-21 2025-03-25 深圳市智慧城市大数据中心有限公司 A distributed cache method based on hash algorithm
CN118227673B (en) * 2024-05-22 2024-08-02 山东港口科技集团烟台有限公司 Method for caching and processing data of Internet of things

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013387A (en) * 2007-02-09 2007-08-08 华中科技大学 Load balancing method based on object storage device
CN107197035A (en) * 2017-06-21 2017-09-22 中国民航大学 A kind of compatibility dynamic load balancing method based on uniformity hash algorithm
CN108810041A (en) * 2017-04-27 2018-11-13 华为技术有限公司 A kind of data write-in of distributed cache system and expansion method, device
CN109218438A (en) * 2018-10-12 2019-01-15 山东科技大学 A kind of performance optimization method of distributed cache server cluster

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9332083B2 (en) * 2012-11-21 2016-05-03 International Business Machines Corporation High performance, distributed, shared, data grid for distributed Java virtual machine runtime artifacts
CN104158755B (en) * 2014-07-30 2017-12-05 华为技术有限公司 The methods, devices and systems of transmitting message


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ba Ziyan. Optimization of a consistent hashing algorithm based on virtual nodes. Software. 2015, Sections 1-2 of the text. *

Also Published As

Publication number Publication date
CN111177154A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
CN111177154B (en) Distributed database caching method and hash ring optimization thereof
CN111159193B (en) Multi-layer consistent hash ring and application thereof in creating distributed database
US11431791B2 (en) Content delivery method, virtual server management method, cloud platform, and system
KR101928529B1 (en) Code Distributed Hash Table based MapReduce System and Method
US8495013B2 (en) Distributed storage system and method for storing objects based on locations
US11140220B1 (en) Consistent hashing using the power of k choices in server placement
CN105657064B (en) Swift load balancing method based on virtual node storage optimization
Xu et al. Drop: Facilitating distributed metadata management in eb-scale storage systems
CN101645919B (en) A popularity-based copy level calculation method and its copy placement method
Shao et al. An efficient load-balancing mechanism for heterogeneous range-queriable cloud storage
JP2017033337A (en) Virtual machine arrangement device and virtual machine arrangement method
CN116991580A (en) Distributed database system load balancing method and device
Yang et al. A reinforcement learning based data storage and traffic management in information-centric data center networks
Xu et al. Adaptive and scalable load balancing for metadata server cluster in cloud-scale file systems
KR101690944B1 (en) Method and apparatus for managing distributed cache in consideration of load distribution in heterogeneous computing environment
CN111917853A (en) Optimization method for distributed cache scaling of content distribution network
CN115981848B (en) Memory database fragment adjustment method and equipment
JP2008140388A (en) Super-peer with load balancing function in hierarchical peer-to-peer system and method of operating the super-peer
Rahmani et al. A comparative study of replication schemes for structured P2P networks
Aznar-Poveda et al. SDKV: A smart and distributed key-value store for the edge-cloud continuum
US11310309B1 (en) Arc jump: per-key selection of an alternative server when implemented bounded loads
March et al. Multi-attribute range queries on read-only DHT
CN108965387B (en) Balancing method and system for improving survivability of P2P data storage
Pan et al. FCAN: Flash crowds alleviation network using adaptive P2P overlay of cache proxies
KR20220075595A (en) Method for configuration of semi-managed dht based on ndn and system therefor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Request for anonymity

Inventor before: Request for anonymity

CB03 Change of inventor or designer information
TR01 Transfer of patent right

Effective date of registration: 20240423

Address after: Building 7, No. 7 Taiping East Road (South), Mafang Town, Pinggu District, Beijing, 101200

Patentee after: Beijing Jingu Zhitong Green Chain Technology Co.,Ltd.

Country or region after: China

Address before: 3009-315, 3rd Floor, Building B, Building 1, Yard 2, Yongcheng North Road, Haidian District, Beijing, 100089

Patentee before: Zhangxun Yitong (Beijing) Information Technology Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right