CN110413617B

CN110413617B - Method for dynamically adjusting hash table group according to size of data volume

Info

Publication number: CN110413617B
Application number: CN201910692326.8A
Authority: CN
Inventors: 高银康; 王俊昌; 朱彤; 张阳阳
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2019-07-30
Filing date: 2019-07-30
Publication date: 2021-08-10
Anticipated expiration: 2039-07-30
Also published as: CN110413617A

Abstract

The invention discloses a method for dynamically adjusting a hash table group according to the size of the data amount. The method performs corresponding element query operation, element deletion operation or element insertion operation in the hash table group according to user instructions, and records the current The number of inserted elements in the hash table group; when performing element insertion, if the current hash table group is full, a new hash table will be added after the last hash table to complete the insertion; When the number of inserted elements in the current hash table group is lower than the threshold, the table delete operation is performed. The present invention can dynamically adjust the hash table group and can better deal with the hash table of the hash conflict, and also can effectively solve the common problems of the hash table being full of new elements and unable to enter and the waste of space due to less elements.

Description

Method for dynamically adjusting hash table group according to size of data volume

Technical Field

The present invention relates to a method for adjusting a hash table set, and more particularly, to a method for dynamically adjusting a hash table set according to the size of data.

Background

The non-relational database is a novel data storage system which is different from the traditional relational database, is non-relational, distributed, does not strictly follow the ACID principle and has the advantages of simple extension, high concurrency, high stability and the like, and key-value storage is concerned and becomes a substitute term of NoSQL. NoSQL is widely applied to various large Internet platforms such as Sina microblogs, Taobao, visual Chinese websites and the like at present. The database mainly uses a data structure directly accessed according to a Key value (Key-value), and the Key value is mapped to a position in a hash table to access a record so as to accelerate the searching speed. In an internet platform which needs to process mass data, due to the fact that requirements for timeliness, reliability and expansibility of a database are high, a hash table designed for the internet platform also has the characteristics of high parallel strength and high flexibility, and particularly when the original hash table generates hash conflicts, the problem of how the hash table deals with processing becomes a difficult problem which needs to be solved urgently in the field.

Generally, an effective method for solving the hash collision is to use a "cuckoo hash", which is mainly based on the principle that the same element is hashed into different tables by using a plurality of hash functions, so as to solve the hash collision. However, when the data volume is large and the data volume changes greatly, the cuckoo hash may have a series of problems that the hash table is full, new elements cannot be stored, less space of the elements is wasted, or the elements are replaced and enter dead cycles, and the like, which all affect the normal operation of the hash table, reduce the operation performance of the whole system, and even cause paralysis.

Disclosure of Invention

The purpose of the invention is as follows: in order to overcome the defects of the prior art, the invention provides a method for dynamically adjusting a hash table group according to the size of data volume.

The technical scheme is as follows: the invention discloses a method for dynamically adjusting a hash table group according to the size of data volume, which is characterized by comprising the following steps: receiving a user instruction, wherein the user instruction comprises a query instruction, a deletion instruction or an insertion instruction; performing corresponding element query operation, element deletion operation or element insertion operation in the hash table group according to the user instruction, and recording the number of inserted elements in the current hash table group in real time; the element insertion operation includes: analyzing an element to be inserted based on the user instruction, and inserting the element to be inserted into the current hash table group according to an insertion rule; the inserting rule comprises that after the current hash table group is determined to be full, a hash table is added after the current last hash table for completing inserting; when the number of the inserted elements in the current hash table group is lower than a threshold value, executing a table deleting operation, wherein the table deleting operation comprises the following steps: and taking the current last hash table as a hash table to be deleted, traversing each element, copying the current element to the other hash tables when traversing to one element, and deleting the hash table to be deleted after traversing is finished.

Further, the threshold is proportional to the number of hash tables in the current hash table set.

Further, the element query operation, the element deletion operation, or the element insertion operation is performed in parallel with the hash table deletion operation.

Further, each element in the set of hash tables has a respective status bit, the status bit being one of a first value, a second value, and a third value.

Further, in the table deletion operation, copying the current element to the remaining hash tables specifically includes: setting the state bit of the current element to the second value using an atomic operation; creating a copy element of the current element, inserting the copy element into the remaining hash tables except the table to be deleted according to the insertion rule, judging whether the state bit of the current element is changed into the third value after the insertion is successful, and deleting the successfully inserted copy element if the state bit of the current element is changed into the third value; otherwise, the copied elements are reserved.

Further, the element query operation includes: determining an element to be queried based on the query instruction, and querying whether an element with a data bit matched with the element to be queried exists in the whole hash table group; when there are matching elements: if the state bit of the inquired element is the first value or the second value, feeding back that the inquiry is successful; if the value of the state bit of the inquired element is the third value, the feedback inquiry is failed; when there are no matching elements, the feedback query fails.

Further, the element deletion operation includes: determining elements to be deleted based on the deletion instruction, sequentially traversing the hash table group, and inquiring whether elements with data bits matched with the elements to be deleted exist; when there is no matched element, the feedback deletion fails; when matched elements are inquired in other hash tables except the table to be deleted, the inquired elements are directly deleted; when a matching element is queried in the table to be deleted: if the value of the state bit of the inquired element is the first value, which indicates that the table deleting operation does not start to copy the inquired element, the inquired element is directly deleted from the table to be deleted, and successful deletion is fed back; if the value of the state bit of the queried element is the second value, setting the value of the state bit of the queried element as the third value, and judging whether the table deleting operation is copying the queried element: if the copy is being made, the feedback deletion is successful; if the data is not copied, traversing the whole hash table group again, inquiring whether an element with a data bit matched with the element to be deleted exists or not, if so, feeding back and deleting successfully, and if not, feeding back and deleting unsuccessfully; and if the state bit of the inquired element is the third value, feeding back that the deletion is successful.

Further, the insertion rule includes: according to the sequence number of each hash table, sequentially determining whether the corresponding position of the element to be inserted in each accessible hash table except the table to be deleted is occupied by the inserted element, once determining that the corresponding position of the element to be inserted in a certain accessible hash table is not occupied, inserting the element to be inserted, setting the state bit of the element which is successfully inserted as the first value, and feeding back that the insertion is successful; if the corresponding positions of the elements to be inserted in all the accessible hash tables are occupied by the inserted elements, replacing the inserted elements at the corresponding positions in the first hash table by the elements to be inserted, enabling the current replacement times to be 1, taking the replaced elements as the current elements to be inserted, and executing the following operations: (a) sequentially determining whether the corresponding positions of the elements to be inserted in the hash tables are occupied by the inserted elements or not according to the sequence numbers of the hash tables from the next hash table of the hash table where the elements to be inserted are replaced to the last accessible hash table; once the corresponding position of the current element to be inserted in a certain accessible hash table is determined not to be occupied, inserting the current element to be inserted, setting the state bit of the element which is successfully inserted as the first value, and feeding back the successful insertion; if the corresponding position of the current element to be inserted in the subsequent accessible hash table is occupied by the inserted element, executing the step (b); (b) replacing the element at the corresponding position in the next hash table of the hash table when the element is replaced by the current element to be inserted, adding 1 to the current replacement frequency, judging whether the current replacement frequency exceeds the set maximum replacement frequency, if not, taking the latest element to be replaced as the current element to be inserted, and returning to the step (a); if the number of the elements is larger than the preset number, the current hash table group is considered to be full, an accessible hash table with the same size is added after the last accessible hash table, the elements to be inserted are inserted into the newly added accessible hash table, the state bit is set as the first value after the insertion is successful, and the successful insertion is fed back.

Has the advantages that: compared with the prior art, the method and the device can dynamically adjust the hash table group according to the data volume, can better deal with hash tables with hash conflicts, and can effectively solve the common problems that the hash table is full of new elements and cannot enter, the elements have less space waste and the like. The method and the device dynamically increase and decrease the number of the tables according to different data volumes on the basis of the cuckoo hash, and simultaneously do not influence the operation of other inquiry or deletion and the like of the program when increasing and decreasing. Experimental results show that the algorithm can autonomously increase and decrease the number of hash tables under the environment of dynamic change of the data volume of the hash tables, and better performance and memory resource utilization are achieved.

Drawings

FIG. 1 is a general flow diagram of the method of dynamically adjusting a set of hash tables of the present invention;

FIG. 2 is a flow chart of a table delete operation of the present invention;

FIG. 3 is a flow diagram illustrating a special case of the present invention when a table delete operation is concurrent with an element delete operation;

FIG. 4 is a flow chart of the operation of the element query in the present invention;

FIG. 5 is a flow chart of an element delete operation in the present invention

Fig. 6 is a flow chart of an element insertion operation in the present invention.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings.

The basis of the invention is the commonly used 'cuckoo hash', which can hash the same element into different hash tables by using a plurality of hash functions respectively corresponding to a plurality of hash tables, and the number of the hash tables in the hash table group is determined in advance, namely the storage space is also determined in advance. However, the number of hash tables of "cuckoohash" in the present invention is set to be changeable.

Referring to fig. 1, the method of the present invention generally comprises: receiving a user instruction, wherein the user instruction comprises a query instruction, a deletion instruction or an insertion instruction; and performing corresponding element query operation, element deletion operation or element insertion operation in the hash table group according to the user instruction, and recording the number of elements inserted into the hash table group in real time.

When the element insertion operation is executed, when the elements in the original hash table are almost fully stored, the hash table is newly added behind the last hash table to finish the insertion, so that the storage space is increased under the condition of not changing the original data storage state and the original hash function, and the problem of insufficient space is efficiently and quickly solved.

When the elements in the original hash table are gradually reduced, if the number of the inserted elements in the current hash table group is lower than a threshold value, the table deleting operation is executed, so that the waste of memory space is avoided. Wherein the threshold is proportional to the number of hash tables in the current hash table set. Considering that the "element query operation", "element insertion operation", and "element deletion operation" are normally performed while the redundant table is cleared, the table deletion operation is set to have a separate thread with respect to the preceding operation to detect and clear the element in the last table and delete the table.

In each hash table, each inserted element has a respective status bit set to the upper 16 bits of the corresponding element, the value of the status bit being one of the first value, the second value, and the third value. For example, the value of the status bit may take 0, 1, 2, corresponding to the first value, the second value, and the third value, respectively.

The following describes the element insertion operation, the table deletion operation, the element deletion operation, and the element query operation in detail, respectively. Because the number of the hash tables can be changed, the flag bit flag is set to store the number of the current hash tables. When a user sends an instruction, the program traverses the hash table according to the number of the hash table stored in the flag bit flag to complete the user instruction.

Each table in the traversal hash table group involved in element insertion operation, table deletion operation, element deletion operation and element query operation

Table delete operation

When the table deletion operation is executed, the hash table can suspend the operation of the table to be deleted, perform insertion operation on the remaining elements in the table, insert the elements into the remaining tables except the table to be deleted, and change the flag bit flag at the same time, so as to reduce the total number of the tables, and enable the query and insertion operation to be normally executed.

With reference to the example of fig. 2, when performing a table deletion operation, all elements in the table to be deleted are traversed, and each time one element is traversed, an atomic operation is applied to change the state bit of the element from 0 to 1, a copy element of the element with the changed state bit is created, and then the copy element is inserted into the accessible hash table according to the same insertion rule (which will be described later) as the element insertion operation. In addition, in order to prevent the element deletion operation from not correctly finding the copy element being inserted in the previous table and accessing the element in the table to be deleted, it is necessary to determine whether the status bit of the current element in the table to be deleted becomes 2 due to the element deletion operation after the copy element of the current element is inserted in the previous table, and if the status bit becomes 2, it indicates that the element deletion operation does not correctly find the copy element being inserted in the previous table, and therefore, the copy element in the previous table is deleted; if not, the copy element inserted in the pre-table is retained and the status bit of the copy element is set to 0.

Since the purpose of the table delete operation is to detect and clean the elements in the last table and delete this table, similar to a garbage collection operation, the specific implementation of the table delete operation is described below using a Garbage Collection (GC) function.

Two, element query operation

The main flow of the element query operation comprises the following steps:

step 1, determining an element to be queried (acquire-value) based on a query instruction, traversing the whole hash table group in sequence, and querying matched elements; if the element is inquired, executing the step 2; if the element is not found, go to step 3. Wherein, the query process of the element to be queried (inquiry-value) comprises the following steps: calculating a key value of an element to be queried (acquire-value) according to the hash function of each table, comparing the element to be queried (acquire-value) with a position element corresponding to the key value in the table, if the element to be queried (acquire-value) is the same as the position element corresponding to the key value, determining that the element is queried, otherwise, determining that the element cannot be queried in the table.

Step 2, determining the state bit of the inquired element, if the state bit is 2, indicating that the element is in the table to be deleted and is deleted by the deleting operation, and feeding back the inquiry failure; if the status bit is 0 or 1, it indicates that the element still exists in the hash table set, and the feedback query is successful.

And 3, if the element is not inquired in the whole hash table group, the element does not exist in the table, and the inquiry failure is returned.

In the following, a "query ()" function is used as a specific implementation method of the element query operation, wherein it is assumed that a user inputs an X element, and the algorithm flow can be seen in fig. 4.

Third, element delete operation

The flow of element deletion operation comprises:

determining elements to be deleted (delete-value) based on a deletion instruction of a user, traversing the whole hash table group in sequence, and inquiring the elements to be deleted. When the matched elements are not inquired, the feedback deletion fails, and the element deletion operation is finished; otherwise, executing the step b. Wherein, the query process of the element to be deleted (delete-value) comprises the following steps: calculating a key value of the element (delete-value) to be deleted according to the hash function of each table, comparing the position of the element in the key value corresponding table with the position element corresponding to the key value, if the element (delete-value) to be deleted is the same as the position element corresponding to the key value, inquiring the element, otherwise, inquiring the element in the table.

B, judging whether the inquired elements are in the table to be deleted or not; if the deletion request is not in the table to be deleted, the deletion request is directly deleted, the deletion is fed back to be successful, and the element deletion operation is finished; if the table to be deleted exists, executing the step c.

C, determining the state bit of the inquired element; when the status bit is 0, the GC function is not operated to the element, and the element can be directly deleted in the table to be deleted to finish the deletion instruction of the user, so that the successful deletion is fed back, and the element deletion operation is finished; when the status bit is 1, indicating that it has been copied or is being copied into other hash tables, then the status position 2 of this element in the table to be deleted is executed, and step d is executed; and when the state bit is 2, the element deleting operation is performed on the element to be deleted before, the feedback deleting fails, and the element deleting operation is finished.

D, judging whether the GC function traverses the element, if so, indicating that the element is copied to other hash tables, and if so, deleting the element when the GC function operates to complete the deletion instruction of the user, so that the deletion is fed back successfully, and the element deletion operation is finished; if the GC function is not operating the element, the element is copied to other accessible hash tables, so that the whole hash table group needs to be traversed from the beginning again to inquire the element to be deleted, if the element to be deleted is inquired, the element to be deleted is deleted, the feedback deletion is successful, and if the element is not inquired, the feedback deletion is failed.

The above steps c and d reflect the extreme case of performing an element delete operation while performing a table delete operation, as shown in fig. 3. When the GC function is started, all elements in the table to be deleted are traversed, the flag bits of the elements are changed to be 1, the elements are inserted into the front table, if the elements in the front table which are being copied are not correctly found by the element deleting operation at the moment, and when the elements in the table to be deleted are accessed, the state bits of the matched elements inquired in the table to be deleted are changed to be 2 by the element deleting operation. When the GC function finds the change of the state bit of the corresponding element in the table to be deleted, the copy element in the previous table is deleted immediately when the copy element is inserted into the previous table.

In the following, a "delete ()" function is used as a specific implementation method of the element query operation, wherein it is assumed that a user inputs an X element, and a specific algorithm flow refers to fig. 5.

Four, element insertion operation

The general idea of element insertion operations includes: analyzing an element to be inserted (insert-value) based on the user instruction, and inserting the element to be inserted according to an insertion rule in an accessible hash table except for the table to be deleted (if any). The insertion rule includes that when the current hash table group is full, a hash table is added after the last hash table for completing insertion. Wherein, the query process of the element to be inserted (insert-value) comprises the following steps: calculating a key value of an element to be inserted (insert-value) according to the hash function of each table, comparing the element to be inserted (insert-value) with a position element corresponding to the key value in the table, and if the element to be inserted (insert-value) is the same as the position element corresponding to the key value, inquiring the element, otherwise, inquiring the element in the table.

The insertion rule specifically includes:

(a) according to the sequence number of each hash table, sequentially determining whether the corresponding position of the element to be inserted in each accessible hash table except the table to be deleted is occupied by the inserted element, once determining that the corresponding position of the element to be inserted in a certain accessible hash table is not occupied, inserting the element to be inserted, setting the state bit of the element which is successfully inserted to be 0, feeding back the successful insertion, and ending; if the corresponding positions of the elements to be inserted in all the accessible hash tables are occupied by the inserted elements, replacing the inserted elements at the corresponding positions in the first hash table by the elements to be inserted, enabling the current replacement frequency to be 1, taking the replaced elements as the current elements to be inserted, and executing the step (b);

(b) sequentially determining whether the corresponding positions of the elements to be inserted in the hash tables are occupied by the inserted elements or not according to the sequence numbers of the hash tables from the next hash table of the hash table where the elements to be inserted are replaced to the last accessible hash table; once the corresponding position of the current element to be inserted in a certain accessible hash table is determined not to be occupied, inserting the current element to be inserted, setting the state bit of the element which is successfully inserted to be 0, feeding back the successful insertion, and ending; if the corresponding position of the current element to be inserted in the subsequent accessible hash table is occupied by the inserted element, executing the step (c);

(c) replacing the element at the corresponding position in the next hash table of the hash table when the element is replaced by the current element to be inserted, adding 1 to the current replacement frequency, judging whether the current replacement frequency exceeds the set maximum replacement frequency, if not, taking the latest element to be replaced as the current element to be inserted, and returning to the step (b); if the number of the elements to be inserted is greater than the preset value, the current hash table group is considered to be full, an accessible hash table with the same size is added after the last accessible hash table, the flag bit flag for recording the number of the hash tables is increased by one, the elements to be inserted are inserted into the newly added accessible hash table, the state bit is set to be 0 after the insertion is successful, and the insertion is fed back successfully.

The following is a specific implementation of the element insertion operation, wherein it is assumed that the user inputs X elements, and the specific algorithm flow can be seen in fig. 6.

In summary, the present invention can dynamically increase and decrease the hash table according to the size of the data volume, so as to solve some problems encountered by the application of the current common "cuckoo hash" in the non-relational database, including the hash table that can dynamically adjust the hash table group and better cope with hash collisions, and also can effectively solve the common problems that the hash table is full of new elements and cannot enter, and the elements have less space waste, etc.

Claims

1. A method for dynamically adjusting a hash table set according to the size of data volume, comprising:

receiving a user instruction, wherein the user instruction comprises a query instruction, a deletion instruction or an insertion instruction;

performing corresponding element query operation, element deletion operation or element insertion operation in the hash table group according to the user instruction, and recording the number of inserted elements in the current hash table group in real time; the element insertion operation includes: analyzing an element to be inserted based on the user instruction, and inserting the element to be inserted into the current hash table group according to an insertion rule; the inserting rule comprises that after the current hash table group is determined to be full, a hash table is added after the current last hash table for completing inserting; each element in the set of hash tables has a respective status bit, the status bit being one of a first value, a second value, and a third value;

when the number of inserted elements in the current hash table group is lower than a threshold value, executing a table deletion operation, wherein the table deletion operation comprises the following steps: taking the current last hash table as a hash table to be deleted, traversing each element, copying the current element to the other hash tables when traversing to one element, and deleting the hash table to be deleted after traversing is completed;

in the table deletion operation, copying the current element to the remaining hash tables specifically includes:

setting the state bit of the current element to the second value using an atomic operation;

creating a copy element of the current element, inserting the copy element into the remaining hash tables except the table to be deleted according to the insertion rule, judging whether the state bit of the current element is changed into the third value after the insertion is successful, and deleting the successfully inserted copy element if the state bit of the current element is changed into the third value; otherwise, the copied elements are reserved;

the element deletion operation includes:

determining elements to be deleted based on the deletion instruction, sequentially traversing the hash table group, and inquiring whether elements with data bits matched with the elements to be deleted exist;

when there is no matched element, the feedback deletion fails;

when matched elements are inquired in other hash tables except the table to be deleted, the inquired elements are directly deleted;

when a matching element is queried in the table to be deleted:

if the value of the state bit of the inquired element is the first value, which indicates that the table deleting operation does not start to copy the inquired element, the inquired element is directly deleted from the table to be deleted, and successful deletion is fed back;

if the value of the state bit of the queried element is the second value, setting the value of the state bit of the queried element as the third value, and judging whether the table deleting operation is copying the queried element: if the copy is being made, the feedback deletion is successful; if the data is not copied, traversing the whole hash table group again, inquiring whether an element with a data bit matched with the element to be deleted exists or not, if so, feeding back and deleting successfully, and if not, feeding back and deleting unsuccessfully;

and if the state bit of the inquired element is the third value, feeding back that the deletion is successful.

2. The method of claim 1, wherein the threshold is proportional to the number of hash tables in the current hash table set.

3. The method of claim 1, wherein the element query operation, the element delete operation, or the element insert operation is performed in parallel with the hash table delete operation.

4. The method according to claim 1, wherein copying the current element to the remaining hash tables in the table deletion operation specifically includes:

creating a copy element of the current element, inserting the copy element into the remaining hash tables except the table to be deleted according to the insertion rule, judging whether the state bit of the current element is changed into the third value after the insertion is successful, and deleting the successfully inserted copy element if the state bit of the current element is changed into the third value; otherwise, the copied elements are reserved.

5. The method of claim 1, wherein the element query operation comprises:

determining an element to be queried based on the query instruction, and querying whether an element with a data bit matched with the element to be queried exists in the whole hash table group;

when there are matching elements: if the state bit of the inquired element is the first value or the second value, feeding back that the inquiry is successful; if the value of the state bit of the inquired element is the third value, the feedback inquiry is failed;

when there are no matching elements, the feedback query fails.

6. The method of claim 1, wherein the rule insertion comprises:

(a) determining whether the corresponding position of the element to be inserted in each accessible hash table except the table to be deleted is occupied by the inserted element or not in sequence according to the sequence number of each hash table, inserting the element to be inserted once the corresponding position of the element to be inserted in one accessible hash table is determined to be unoccupied, setting the state bit of the element which is successfully inserted as the first value, feeding back the successful insertion, and ending; if the corresponding positions of the elements to be inserted in all the accessible hash tables are occupied by the inserted elements, replacing the inserted elements at the corresponding positions in the first hash table by the elements to be inserted, enabling the current replacement times to be 1, taking the replaced elements as the current elements to be inserted, and executing the step (b);

(b) sequentially determining whether the corresponding positions of the elements to be inserted in the hash tables are occupied by the inserted elements or not according to the sequence numbers of the hash tables from the next hash table of the hash table where the elements to be inserted are replaced to the last accessible hash table; once the corresponding position of the current element to be inserted in a certain accessible hash table is determined not to be occupied, inserting the current element to be inserted, setting the state bit of the element which is successfully inserted as the first value, feeding back the successful insertion, and ending; if the corresponding position of the current element to be inserted in the subsequent accessible hash table is occupied by the inserted element, executing the step (c);

(c) replacing the element at the corresponding position in the next hash table of the hash table when the element is replaced by the current element to be inserted, adding 1 to the current replacement frequency, judging whether the current replacement frequency exceeds the set maximum replacement frequency, if not, taking the latest element to be replaced as the current element to be inserted, returning to the step (b), if so, determining that the current hash table group is full, adding an accessible hash table with the same size after the last accessible hash table, inserting the current element to be inserted into the newly added accessible hash table, setting the state bit as the first value after the insertion is successful, and feeding back the successful insertion.