WO2018133762A1 - Procédé et appareil de fusion de fichiers - Google Patents
Procédé et appareil de fusion de fichiers Download PDFInfo
- Publication number
- WO2018133762A1 WO2018133762A1 PCT/CN2018/072641 CN2018072641W WO2018133762A1 WO 2018133762 A1 WO2018133762 A1 WO 2018133762A1 CN 2018072641 W CN2018072641 W CN 2018072641W WO 2018133762 A1 WO2018133762 A1 WO 2018133762A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- file
- new
- header
- tree
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Definitions
- the present invention relates to the field of data storage technologies, and in particular, to a method and apparatus for merging files stored in an external memory.
- the underlying data structure is either a B-tree or its variant B+ tree, or an LSM tree.
- the former has better read-friendliness, while the latter has better write-friendliness.
- these two things seem to have both fish and bear's paws, in the greedy Internet world, they are eager to have a data storage solution that is compatible with both reading and writing.
- the data structure used in LevelDB seems to combine LSM and B-tree, it is not thorough enough.
- the files stored in the disk in LevelDB are divided into multiple levels, and there are many files (SSTable files) in different levels.
- SSTable files files
- the SSTable files need to be merged, because in the SSTable file.
- the keys and corresponding values are stored together, so when merging LevelDB files, all key-value pairs need to be fetched one by one to build a new file.
- the merge process is more complicated and reduces readability. Write performance.
- LevelDB is a high-performance KV storage engine developed by Google, inspired by Google's BigTable.
- LevelDB can play very good performance in the small data volume scenario, in the case of large data volume (hundreds of G) and high frequency write, LevelDB is reading, writing, merging, data cleaning, restart recovery, etc. Many aspects have exposed its shortcomings.
- An object of the embodiments of the present invention is to provide a file merging method and apparatus for data merging.
- a file merging method is provided.
- the file is stored in an external memory, including a file header, a data block, and an index block.
- the file header is used to record metadata information of the file, and the data block is used for storing.
- the index block is used to store the key corresponding to the value in the form of a B+ tree, wherein the logical addresses of all the keys and their corresponding values in the data block are respectively recorded in the leaf nodes in the B+ tree, and the method includes: The first file is additionally written with an additional data block in which the value in the data block of the second file is written; after the additional data block is additionally written, the new index block is additionally written, and the new index block is based on the index block of the first file and the second The index block generated by the file, the index block of the first file and all the valid keys in the index block of the second file and their corresponding values are recorded in the data block of the first file and the logical address in the additional data block respectively.
- a new file header is additionally written after the new index block to record the metadata information of the merged new file.
- the keys and values of the file described in the embodiment of the present invention are stored separately, and the keys are stored in the form of a B+ tree. Therefore, when the two files are merged, one file can be kept unchanged, and the value of the other file can be directly added and written, thereby improving the writing performance. And the merged index block is a new B+ tree, and the value in the merged file can be conveniently read according to the new index block, and the read performance of the merged file is not affected.
- the metadata information may include one or more of the following:
- all nodes constituting the B+ tree are physically stored contiguously.
- the B+ tree can be physically stored continuously by utilizing the local preloading feature of the disk, so that the index block of the file to be merged can be obtained by simply traversing successive disk blocks in the process of reconstructing the index block.
- the file merging method may further include: updating a file header of the first file according to the new file header to replace the metadata information in the file header of the first file with the metadata information in the new file header.
- the present invention can avoid the damage caused to the file by the abnormal situation during the merge process by setting the double file header.
- the file includes a front file header at the head of the file and a back file header at the end of the file.
- the contents of the front file header and the last file header are the same, and the previous file header of the first file is updated according to the new file header as a new file.
- the front and back headers of the new file can be updated normally, and can be used to view the metadata information in the new file.
- the file merging method may further include: when the step of writing the metadata information of the new file in the new file header is wrong, the new file is restored to the first file before the merge according to the file header of the first file. And/or in the case of an error in the step of updating the header of the first file, the header of the first file is re-updated according to the new header.
- the file in the merge process can be restored to the first file before the merge according to the file header of the first file.
- an error occurs during the process of updating the header of the first file, it can be re-updated according to the new header.
- the file merging method may further include: performing the following steps to read the target value corresponding to the request key from the target file: obtaining a file header and an index block of the target file; determining, according to the file header, whether the request key is at the file header Within the range of the indicated key; in the case where it is determined that the request key is in the range, based on the B+ tree structure of the index block, the leaf node corresponding to the request key is searched in the index block; stored according to the found leaf node The value corresponding to the key reads the target value at the logical address in the data block in the target file.
- a file merging apparatus is further provided, where the file is stored in an external memory, including a file header, a data block, and an index block, where the file header is used to record metadata information of the file, and the data block is used.
- the index block is used to store the key corresponding to the value in the form of a B+ tree, wherein the logical addresses of all the keys and their corresponding values in the data block are respectively recorded in the leaf nodes in the B+ tree
- the device includes a first writing unit, configured to write an additional data block after the first file, wherein a value in the data block of the second file is written; a B-tree generating unit, configured to be based on the index block of the first file and the second The index block of the file generates a new B+ tree, and all the valid keys in the index block of the first file and the index block of the second file and their corresponding values are respectively recorded in the data blocks of the first file and the logical addresses in the additional data block.
- the second write unit is configured to additionally write a new index block after appending the data block, wherein the new B+ tree is written; and the third write unit is used after the new index block Add a new header is written to the new file meta merged data record.
- the metadata information may include one or more of the following:
- the file merging device may further include: an updating unit, configured to update the file header of the first file according to the new file header, to replace the metadata in the file header of the first file with the metadata information in the new file header. information.
- an updating unit configured to update the file header of the first file according to the new file header, to replace the metadata in the file header of the first file with the metadata information in the new file header. information.
- the file includes a front file header at the head of the file and a rear file header at the end of the file.
- the contents of the front file header and the back file header are the same, and the update unit updates the previous file header of the first file according to the new file header.
- the file merging device may further include: a first restoring unit, where the step of writing the metadata information of the new file in the new file header fails, the new file is restored according to the file header of the first file The first file before the merge; and/or the second restore unit is configured to re-update the file header of the first file according to the new file header if the step of updating the file header of the first file is erroneous.
- the file merging device may further include a reading unit, configured to read the target value corresponding to the request key from the target file, where the reading unit may include: an acquiring module, acquiring a file header and an index of the target file a determining module, according to the file header, determining whether the request key is within the range of the key indicated by the file header; and the finding module, in the case of determining that the request key is in the range, searching in the index block based on the B+ tree structure of the index block Corresponding to the leaf node of the request key; the reading module reads the target value in the logical address in the data block in the target file according to the value corresponding to the key stored by the found leaf node.
- the reading unit may include: an acquiring module, acquiring a file header and an index of the target file a determining module, according to the file header, determining whether the request key is within the range of the key indicated by the file header; and the finding module, in the case of determining that the request key is in the range, searching in the
- the key and value of the file described in the file merging method and apparatus of the embodiment of the present invention are separately stored, wherein the key is stored in the form of a B+ tree, thereby maintaining one file when merging the two files Moves the value of another file directly to the previous file, improves the write performance, and reconstructs the index block that stores the key in the form of a B+ tree.
- the new index block can be easily read from the merged file. The value of the merged file will not be affected.
- Another object of embodiments of the present invention is to provide a new database management method and database system.
- a database management method for storing a plurality of pieces of data, wherein each piece of data includes a corresponding key and value, the method comprising: writing a plurality of pieces of data into an external memory The log file; the data in the log file is written into the memory table in the internal memory, wherein the data written in the memory table is stored in order according to the size of the key; when the size of the memory table exceeds a predetermined threshold, the memory table is converted Is a read-only memory table, and writes subsequent data in the log file to the new memory table; writes the data in the read-only memory table to the external accessor to obtain the first-level storage file; and merges two or More first-level storage files to get second-level storage files.
- the file finally stored in the external memory has only two layers, and the redundancy is low, which is convenient to find.
- the data block management method may further include: specifying a primary file name of the first-level storage file by using a first naming rule; and specifying a primary file name of the second-level storage file by using a second naming rule, the first naming rule It is different from the second naming rule in order to distinguish whether the storage file is a first-level storage file or a second-level storage file based on the main file name.
- the memory table may be composed of a hash table, where the hash table includes one or more hash buckets, each hash bucket corresponds to one jump table, and each piece of data in the memory table constitutes an element of the jump table.
- the order of the elements in the jump table is arranged in order according to the size of the keys.
- Inserting the memory table before jumping the table can reduce the lock granularity. For concurrent read and write operations, if the keys are different, the fast lookup insertion can be performed in the jump table corresponding to the respective hash bucket. On the other hand, the expansion is expanded. While the size of the memory table is not increased, the size of the jump table is not increased, which can reduce the probability that the jump table becomes a linear search as the amount of data becomes larger, thereby improving the overall search efficiency.
- the data block management method may further include: maintaining a read-only memory table queue in the internal memory, where data in the read-only memory table is not all written to the external memory, and when the size of the new memory table exceeds a predetermined threshold , converts the new memory table into another read-only memory table and puts it into the read-only memory table queue.
- the data structure of the storage file may include: a file header for recording metadata information of the storage file, a data block for storing the value, and an index block for storing the key corresponding to the value in the form of a B+ tree.
- the logical addresses of all the keys and their corresponding values in the data block are respectively recorded in the leaf nodes in the B+ tree, and all the nodes constituting the B+ tree are physically stored continuously.
- the B+ tree can be physically stored continuously by utilizing the local preloading feature of the disk, so that the index block of the file to be merged can be obtained by simply traversing successive disk blocks in the process of reconstructing the index block.
- the step of merging the two first-level storage files may include: additionally writing an additional data block after the first storage file, where the value in the data block of the second storage file is written; and appending the data block after appending the data block
- Writing a new index block the new index block is generated based on the index block of the first storage file and the index block of the second storage file, the index block of the first storage file and all the keys in the index block of the second storage file and The corresponding value is recorded in the leaf node of the new B+ tree in the data block of the first storage file and the logical address in the additional data block; the new file header is additionally written after the new index block to record the merged new file. Metadata information for the file.
- the two files when the two files are merged, one file can be kept unchanged, and the value of the other file can be directly added and written, thereby improving the writing performance.
- the merged index block is a new B+ tree, and the value in the merged file can be conveniently read according to the new index block, and the read performance of the merged file is not affected.
- the metadata information may include one or more of the following: the number of keys in the index block; the range of keys in the index block; the height of the B+ tree; the logical address of the first leaf node in the B+ tree; and the B+ tree The number of internal nodes.
- the database management method may further include: updating a file header of the first storage file according to the new file header to replace the metadata information in the file header of the first storage file with the metadata information in the new file header.
- the embodiment of the present invention can avoid the damage caused to the file by the abnormal situation during the merge process by setting the double file header.
- the file may include a front file header located at a file header and a subsequent file header located at a tail of the file, and the contents of the front file header and the subsequent file header are the same, and the front file header of the first storage file is updated according to the new file header, as The front file header of the new file, with the new file header as the post file header of the new file.
- the front and back headers of the new file can be updated normally, and can be used to view the metadata information in the new file.
- the database management method may further include: when the step of writing the metadata information of the new file in the new file header is wrong, the new file is restored to the first before the merge according to the file header of the first storage file.
- the file is stored; and/or in the case where the step of updating the header of the first stored file is erroneous, the header of the first stored file is re-updated according to the new header.
- the file in the merge process can be restored to the first before the merge according to the file header of the first storage file.
- the file is stored, and when an error occurs in the process of updating the header of the first stored file, it can be re-updated according to the new header.
- the database management method may further include: searching for a key corresponding to the request key in the memory table, in response to the request for finding the target value corresponding to the request key, and reading the target value in the case of finding; If the request key is not found in the memory table, the read-only memory table is searched for the key corresponding to the request key, and the target value is read in the case of the search; the read-only memory table is not found.
- the request key it is chronologically searched for whether each of the first-level storage files in the disk has a key corresponding to the request key, and in the case of the search, the target value is read; and in each of the first-level storage files. If it is not found, use the binary search method to find whether the second-level storage file in the disk has the key corresponding to the request key, and read the target value if found.
- the database management method may further include: acquiring a file header and an index block of the target storage file in response to the request for reading the target value corresponding to the request key from the target storage file; determining, according to the file header, whether the request key is Within the range of the key indicated by the file header; in the case of determining that the request key is within the range of the key indicated by the file header, searching for the leaf node corresponding to the request key in the index block based on the B+ tree structure of the index block; In the case of the search, the target value is read in the logical address in the data block in the target storage file according to the value corresponding to the key stored by the found leaf node.
- the database management method may further include: in response to restarting the request for restoring the internal memory, constructing the second-level storage file list according to the size order of the range of the keys included in the second-level storage file; storing according to the first level The file serial number order of the file, constructing a first-level storage file list; determining, according to the first-level storage file list and the second-level storage file list, the writing progress of the data in the log file being written to the first-level storage file; According to the writing progress, the data in the log file that is not written to the first-level storage file is written to the memory table in the internal memory.
- a database system comprising: an internal memory and an external memory, wherein the internal memory is used to write a plurality of pieces of data to a log file in the external memory, and the external memory will log the file
- the data in the internal memory is written into the memory table in the internal memory, wherein the data written in the memory table is stored in order according to the size of the key.
- the internal memory converts the memory table into a read-only memory table.
- the external memory writes the subsequent data in the log file to the new memory table, and the internal memory writes the data in the read-only memory table to the external accessor to obtain the first-level storage file, and the external memory merges two or more.
- the first level stores the text to get the second level storage file.
- the external storage specifies a primary file name of the first-level storage file by using a first naming rule, and specifies a primary file name of the second-level storage file by using a second naming rule, where the first naming rule is different from the second naming rule.
- the first naming rule is different from the second naming rule.
- the memory table is composed of a hash table, the hash table includes one or more hash buckets, and each hash bucket corresponds to one jump table, and each piece of data in the memory table constitutes an element of the jump table, wherein The order of the elements in the jump table is ordered according to the size of the keys.
- the read-only memory table queue is maintained in the internal memory, and the data in the read-only memory table is not all written to the external memory, and when the size of the new memory table exceeds a predetermined threshold, the external memory converts the new memory table into a new memory table. Make another read-only memory table and put it into a read-only memory table queue.
- the data structure of the storage file may include: a file header for recording metadata information of the storage file, a data block for storing the value, and an index block for storing the key corresponding to the value in the form of a B+ tree.
- the logical addresses of all the keys and their corresponding values in the data block are respectively recorded in the leaf nodes in the B+ tree, and all the nodes constituting the B+ tree are physically stored continuously.
- the external memory merges the two first-level storage files by performing an operation of: additionally writing the additional data block after the first storage file, wherein the value in the data block of the second storage file is written; and appending the data block Then, a new index block is additionally written, and the new index block is generated based on the index block of the first storage file and the index block of the second storage file, and all of the index block of the first storage file and the index block of the second storage file are valid.
- the key and its corresponding value are recorded in the leaf node of the new B+ tree in the data block and the additional data block of the first storage file respectively; the new file header is additionally written after the new index block to record the merge Metadata information after the new file.
- the file finally stored in the external memory has only two hierarchical structures, and the file redundancy is low, which is convenient to find.
- FIGS. 1 and 3 are diagrams showing the data structure of a file involved in the file combining scheme of the present invention.
- FIG. 2 is a diagram showing the B+ tree structure of an index block of the present invention.
- FIG. 4 is a schematic flow chart showing a file merging method according to an embodiment of the present invention.
- 5 and 6 show schematic views of a file merge state based on the present invention.
- FIG. 7 shows a schematic flow chart of a method of reading data in an object file.
- FIG. 8 is a functional block diagram showing a file merging apparatus according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram showing the structure in which the reading unit can also have a function module.
- FIG. 10 is a schematic diagram showing a hardware configuration of an electronic device in which an embodiment of the present invention can be performed.
- FIG. 11 is a block diagram showing the structure of a database system according to an embodiment of the present invention.
- FIG. 12 is a flow chart showing the data storage between the internal memory 110 and the external memory 120.
- Figure 13 is a static diagram showing the process of storing data.
- Figure 14 is a flow chart showing a complete lookup.
- Figure 15 is a flow chart showing the lookup inside a file.
- FIG. 16 is a schematic flow chart showing restart recovery according to an embodiment of the present invention.
- any specific values are to be construed as illustrative only and not as a limitation. Thus, other examples of the exemplary embodiments may have different values.
- FIG. 1 is a block diagram showing a hardware configuration of an electronic device 1000 in which an embodiment of the present invention can be implemented.
- the electronic device 1000 can be a portable computer, a desktop computer, a mobile phone, a tablet, or the like. As shown in FIG. 1, the electronic device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like.
- the processor 1100 may be a central processing unit CPU, a microprocessor MCU, or the like.
- the memory 1200 includes, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), a nonvolatile memory such as a hard disk, and the like.
- the interface device 1300 includes, for example, a USB interface, a headphone jack, and the like.
- the communication device 1400 can, for example, perform wired or wireless communication, and specifically can include Wifi communication, Bluetooth communication, 2G/3G/4G/5G communication, and the like.
- the display device 1500 is, for example, a liquid crystal display, a touch display, or the like.
- Input device 1600 can include, for example, a touch screen, a keyboard, a somatosensory input, and the like. The user can input/output voice information through the speaker 1700 and the microphone 1800.
- the memory 1200 of the electronic device 1000 is configured to store an instruction for controlling the processor 1100 to perform any of the file merging methods provided by the embodiments of the present invention or Database management method. It will be understood by those skilled in the art that although a plurality of devices are illustrated for electronic device 1000 in FIG. 1, the present invention may relate only to some of the devices therein, for example, electronic device 1000 relates only to processor 1100 and storage device 1200. A technician can design instructions in accordance with the disclosed aspects of the present invention. How the instructions control the processor for operation is well known in the art and will not be described in detail herein.
- This embodiment mainly proposes a scheme of merging files stored in an external memory such as a hard disk, a floppy disk, an optical disk, or a USB disk.
- the key and value of the file described in the file merging method and apparatus of the present embodiment are separately stored, wherein the key is stored in the form of a B+ tree, whereby when merging the two files, one file can be kept. Moves the value of another file directly to the previous file, improves the write performance, and reconstructs the index block that stores the key in the form of a B+ tree.
- the new index block can be easily read from the merged file. Value, the read performance of the merged file will not be affected.
- FIG. 1 is a schematic diagram showing the data structure of a file in the file combining scheme of the present embodiment.
- the file described in this embodiment can be physically divided into a file header, a data block, and an index block by blocks, and each block can be composed of a plurality of pages.
- the page referred to in this embodiment is the minimum unit of one I/O, which is generally an integer multiple of the system page, and the size of the pages of different types of blocks may be different.
- the data block is used to store the value.
- the index block is used to store the key corresponding to the value in the form of a B+ tree.
- the B+ tree is composed of a leaf node, an internal node, and a root node.
- the form of the B+ tree here is a person skilled in the art. As is well known, it will not be repeated here.
- each leaf node in the B+ tree corresponds to a key, and the logical addresses of all the keys and their corresponding values in the data block are respectively recorded in the leaf nodes in the B+ tree. That is, only the key is stored in the leaf node of the B+ tree, and no value is stored. Instead, the offset of the page in the data block where the value is located and the offset of the value in the page can be stored.
- all the nodes (root node, internal node, and leaf node) constituting the B+ tree are physically and continuously stored, thereby utilizing the local preloading feature of the disk to quickly acquire all nodes in the B+ tree, and the merge can be improved.
- the efficiency of building a new B+ tree in the process (the merge process will be explained in more detail below).
- the header is used to record the metadata information of the file.
- the metadata information may include the number of keys in the index block, the range of keys in the index block, the height of the B+ tree, the logical address of the first leaf node in the B+ tree, and the number of internal nodes in the B+ tree.
- the file header of the file may include a front file header and a back file header, and the metadata information of the file recorded by the front file header and the subsequent file header may be the same.
- the file described in this embodiment may further include a filter, and the filter may be used to determine whether the accessed key is in the file.
- the filter may be a Bloom filter, and the access does not exist.
- FIG. 4 is a schematic flowchart showing a file merging method according to an embodiment of the present invention, including steps S210 to S230.
- the method can combine two or more files.
- the first file and the second file are combined as an example for description.
- step S210 an additional data block is additionally written after the first file, in which the value in the data block of the second file is written.
- the freshness of the second file may be greater than the first file, that is, the second file may be stored later in the external memory, and the first file may be previously stored in the external memory.
- the value in the data block of the second file may be additionally appended after the first file.
- a block in which a value is added after the first file is referred to as an additional data block. That is to say, the value in the data block of the second file can be rewritten in the additional data block after the first file, so that the end of the file F and the address of the additional data block are consecutive.
- new index information can be created, that is, in step S220, the new index block is additionally written after the additional data block.
- the new index block is generated based on the index block of the first file and the index block of the second file.
- the freshness of the second file may be greater than the first file, so the key value in the second file may be a modification, deletion, replacement, etc. of the key value in the first file, and thus for the first file and The same key exists in the index block of the second file, and the key in the second file with higher freshness can be selected as the valid key, and the key in the first file is discarded, thereby constructing a new index block.
- the keys in the generated new index block are all valid keys, and the corresponding values are all valid values.
- the key in the new index block is also stored in the form of a B+ tree which is regenerated according to the index block of the first file and the index block of the second file, and thus may be referred to as a new B+ tree.
- the index block of the first file and all the valid keys in the index block of the second file and their corresponding values are respectively recorded in the data block of the first file and the logical address in the additional data block in the leaf of the new B+ tree. In the node.
- the index block of the first file and all the nodes of the B+ tree in the index block of the second file are physically stored continuously, so that in the process of reconstructing the new B+ tree, the local portion of the disk can be utilized.
- the feature of the preloading feature is that the index block of the first file and the index block of the second file can be obtained by simply traversing successive disk blocks, thereby improving the construction efficiency of the new B+ tree.
- the index block in the first file is invalidated and replaced by the new index block.
- the invalidity mentioned here means that in the subsequent search process, the new index block is used for searching, and the old index block is no longer used. That is, after generating a new index block, the old index block may not be deleted.
- step S230 a new file header is additionally written after the new index block to record the metadata information of the merged new file.
- the metadata information of the new file may include the number of keys in the new index block, the range of keys in the new index block, the height of the new B+ tree, the logical address of the first leaf node in the new B+ tree, and the internal nodes in the new B+ tree. Number and so on. After generating a new file header, you can delete the second file and free up storage space.
- FIG. 5 is a schematic diagram showing a merge process of merging G files into F files according to an embodiment of the present invention.
- the F file is unchanged, and only the value in the G file needs to be additionally written into the F file, and a new index block and a new file header are generated.
- the merge process is simpler, and according to the merged B+ tree, the value corresponding to the key in the file can be conveniently found, and the read performance is improved. .
- FIG. 6 is a schematic diagram showing another example of a merge process of merging G files into F files according to an embodiment of the present invention.
- both the F file and the G file in FIG. 6 include a front file header located at the head of the file and a rear file header located at the end of the file.
- the contents of the front file header and the last file header are the same.
- the previous file header of the F file can be updated according to the new file header as the front file header of the new file, and the new file header is used as the new file header.
- the file header of the file is used as the new file header.
- the present invention adopts a method of maintaining a double file header, and can solve the problem that the file cannot be recovered due to an abnormal situation.
- the first two files of the new file can be updated normally and are the same.
- FIG. 7 is a schematic flowchart showing a method of reading a target value corresponding to a request key from a file.
- step S310 a file header and an index block of the target file are acquired.
- step S320 it is determined according to the file header whether the request key is within the range of the key indicated by the file header. If not, it indicates that the value corresponding to the request key does not exist in the target file, and the reading ends.
- step S330 is performed to find a leaf node corresponding to the request key in the index block based on the B+ tree structure of the index block.
- the leaf node corresponding to the request key is not found in the index block, it indicates that the value corresponding to the request key does not exist in the target file, and the reading ends.
- step S340 may be performed to read the target value in the logical address in the data block in the target file according to the value corresponding to the key stored by the found leaf node.
- FIG. 8 is a functional block diagram showing a file merging apparatus according to an embodiment of the present invention.
- the functional modules of the file combining apparatus 500 may be implemented by hardware, software, or a combination of hardware and software that implements the principles of the present invention. It will be understood by those skilled in the art that the functional modules described in FIG. 7 can be combined or divided into sub-modules to implement the principles of the above described invention. Accordingly, the description herein may support any possible combination, or division, or further limitation of the functional modules described herein.
- the file merging device 500 shown in FIG. 8 can be used to implement the detecting method shown in FIG. 3 to FIG. 6.
- the following is only a brief description of the functional modules that the file merging device 500 can have and the operations that can be performed by the functional modules.
- For details of the reference refer to the description above with reference to FIG. 3 to FIG. 6 , and details are not described herein again.
- the file combining apparatus 500 includes a first writing unit 510, a B-tree generating unit 520, a second writing unit 530, and a third writing unit 540.
- the first writing unit 510 is configured to write an additional data block after the first file, wherein the value in the data block of the second file is written.
- the B-tree generating unit 520 is configured to generate a new B+ tree based on the index block of the first file and the index block of the second file, all the keys in the index block of the first file and the index block of the second file, and each key
- the logical addresses in the data block and the additional data block of the first file are respectively recorded in the leaf nodes in the new B+ tree;
- the second writing unit 530 is configured to additionally write a new index block after appending the data block, in which a new B+ tree is written.
- the third writing unit 540 is configured to additionally write a new file header after the new index block to record the metadata information of the merged new file.
- the file combining apparatus 500 may also optionally include an updating unit 550.
- the update unit 550 can update the file header of the first file according to the new file header to replace the metadata information in the file header of the first file with the metadata information in the new file header.
- the file may include a front file header located at the head of the file and a subsequent file header located at the end of the file, and the contents of the front file header and the subsequent file header are the same.
- the update unit 550 can update the previous file header of the first file as the previous file header of the new file according to the new file header, and use the new file header as the post file header of the new file.
- the file combining apparatus 500 may further include a first restoring unit 560 and a second restoring unit 570.
- the first restoration unit 560 may restore the new file to the first file before the merge according to the file header of the first file in the case where the step of writing the metadata information of the new file in the new file header is erroneous.
- the second restoration unit 570 may re-update the file header of the first file according to the new file header in the case where the step of updating the file header of the first file is erroneous.
- the file combining apparatus 500 may further include a reading unit 580.
- the reading unit 580 can read the target value corresponding to the request key from the target file.
- FIG. 8 is a functional block diagram showing functional modules that a reading unit can have.
- the reading unit 580 may include an obtaining module 581, a determining module 583, a searching module 585, and a reading module 587.
- the obtaining module 581 can acquire the file header and the index block of the target file, and the determining module 583 can determine, according to the file header, whether the request key is within the range of the key indicated by the file header. In the case where it is determined that the request key is within the range, the lookup module 585 can look up the leaf node corresponding to the request key in the index block based on the B+ tree structure of the index block. The reading module 587 can read the target value in the logical address in the data block in the target file according to the value corresponding to the key stored by the found leaf node.
- an electronic device including a memory and a processor.
- the memory is configured to store executable instructions
- the processor is configured to execute the electronic device to perform any one of the file merging methods provided in the embodiment according to the control of the executable instructions.
- the electronic device can be an electronic device 1000 as shown in FIG.
- the keys and values of the file can be stored separately, and the keys are stored in the form of a B+ tree. Therefore, when the two files are merged, one file can be kept unchanged, and the value of the other file can be directly added and written, thereby improving the writing performance.
- the merged index block is a new B+ tree, and the value in the merged file can be conveniently read according to the new index block, and the read performance of the merged file is not affected.
- LevelDB exposes many shortcomings in many aspects such as reading, writing, merging, data cleaning, restarting recovery, etc.
- this embodiment proposes a new database management method and database system.
- FIG. 11 is a block diagram showing the structure of a database system according to an embodiment of the present invention.
- the database system 100 of the present invention mainly includes an internal memory 110 and an external memory 120.
- the internal memory 110 and the external memory 120 can cooperate to complete data storage.
- FIG. 12 is a flow chart showing the cooperation between the internal memory 110 and the external memory 120 to implement data storage.
- a plurality of pieces of data to be stored may be written by the internal memory 110 to a log file in the external memory 120.
- Each piece of data includes corresponding keys and values, and the log files can be sequentially written in the order in which the data arrives.
- step S120 may be performed to write the data in the log file to the memory table in the internal memory 110 by the external memory 120.
- the data written in the memory table can be stored in order according to the size of the key.
- the data stored in the memory table may adopt a jump table structure, so that the data stored in the memory table is arranged in an order according to the size of the key.
- the memory table may be composed of a hash table, where the hash table may include one or more hash buckets, each hash bucket corresponds to one jump table, and each data in the memory table constitutes one of the jump tables. An element in which the order of the elements in the jump table is ordered in order of the size of the key.
- a hash table is embedded before the jump table, so that the lock granularity can be reduced on the one hand, and for concurrent read and write operations, if the keys are not the same, the fast lookup can be performed in the jump table corresponding to the respective hash bucket. insert.
- the size of the jump table is not enlarged, which can reduce the probability that the jump table becomes a linear search as the amount of data becomes larger, and improves the overall search efficiency.
- the internal memory 110 can convert the memory table into a read-only memory table (step S130), at which time the internal memory 110 is not written in the log file.
- the data can be written to a new memory table.
- read-only memory tables can only be read and cannot be written.
- the log file in the external memory 120 and the memory table in the internal memory 110 may have a one-to-one correspondence, that is, for a key-Value data, it may be written to the log file, and then from the log file.
- the newly arrived data can be written into a new log file, and the data in the new log file can be written into the new memory table.
- step S140 may be performed, and the data in the read-only memory table may be written into the external memory 120 by the internal memory 110 to obtain the first-level storage file.
- the external memory 120 may perform step S150 to merge two or more first-level storage files stored therein to obtain a second-level storage file.
- the external memory 120 may specify a primary file name of the first-level storage file according to the first naming rule, and may specify a primary file name of the second-level storage file according to the second naming rule, where the first naming rule and the second naming
- the rules can be set differently to distinguish whether the storage file is a first-level storage file or a second-level storage file based on the primary file name. For example, "_0" can be added after the main file name of the first-level storage file, and "_1" is added after the main file name of the second-level storage file. That is, the first level storage file and the second level storage file can be named by xxx_0.hdb, xxx_1.hdb respectively.
- FIG. Figure 13 is a static diagram showing the process of storing data.
- the read-only memory table queue can be maintained in the internal memory.
- the data in the read-only memory table is not all written to the external memory, and the new memory table will be new when the size of the new memory table exceeds a predetermined threshold. Convert to another read-only memory table and put it into a read-only memory table queue. Therefore, by maintaining the memory table queue, it is possible to cope with the problem of blocking when the high frequency is written, because the data is too late to be merged, and the memory table is full.
- the structure of the database system of the present embodiment and the data storage flow of the database system for persistently storing data in the external memory have been described so far with reference to FIGS. 11 to 13.
- the following describes the process of merging the storage files stored in the external storage, the data search process, and the data recovery process when the database system is restarted under special circumstances.
- Fig. 1 a schematic diagram of a data structure of a storage file stored in an external memory has been shown.
- the files described in the present invention can be physically divided into file headers, data blocks, and index blocks by blocks, and each block can be composed of a plurality of pages.
- the page mentioned in this paper is the minimum unit of I/O, which is generally an integer multiple of the system page.
- the size of the pages of different types of blocks can be different.
- the data block is used to store the value.
- the index block is used to store the key corresponding to the value in the form of a B+ tree.
- the form of the B+ tree is well known to those skilled in the art and will not be described here. It should be noted that each leaf node in the B+ tree corresponds to a key, and the logical addresses of all the keys and their corresponding values in the data block are respectively recorded in the leaf nodes in the B+ tree. That is, only the key is stored in the leaf node of the B+ tree, and no value is stored. Instead, the offset of the page in the data block where the value is located and the offset of the value in the page can be stored.
- all the nodes (root node, internal node, and leaf node) constituting the B+ tree are physically and continuously stored, thereby utilizing the local preloading feature of the disk to quickly acquire all nodes in the B+ tree, and the merge can be improved.
- the efficiency of building a new B+ tree in the process (the merge process will be explained in more detail below).
- the header is used to record the metadata information of the file.
- the metadata information may include the number of keys in the index block, the range of keys in the index block, the height of the B+ tree, the logical address of the first leaf node in the B+ tree, and the number of internal nodes in the B+ tree.
- the file header of the storage file may include a front file header and a back file header, and the metadata information of the file recorded by the front file header and the subsequent file header may be the same.
- the storage file may further include a filter, and the filter may be used to determine whether the accessed key is in the file, for example, the filter may be a Bloom filter, and for accessing a key that does not exist, the Bronze may be used.
- the filter quickly determines that the key does not exist, and does not need to go to the B+ tree to query. Because the Bloom filter is actually a hash table, you can judge the existence of the key in the complexity of O(1), and the search time complexity of the B+ tree is O(logn), so you can set the Bloom filter. Improve search efficiency, which can improve read performance.
- FIG. 4 is a schematic flow chart showing a method of merging storage files according to an embodiment of the present invention.
- the method may combine two or more storage files, wherein two or more first-level storage files may be combined into one second-level storage file, or two or more seconds may be combined.
- the level stores the file and generates a new second-level storage file.
- the merging process of the storage file of this embodiment is described here by taking the first storage file and the second storage file as an example.
- step S210 an additional data block is additionally written after the first storage file, wherein the value in the data block of the second storage file is written.
- the freshness of the second storage file may be greater than the first storage file, that is, the second storage file may be stored later in the external storage, and the first storage file may be previously stored in the external storage.
- the value written in the data block of the second storage file may be appended after the first storage file, where A block in which a write value can be added after the first storage file is referred to as an additional data block. That is to say, the value in the data block of the second storage file can be rewritten in the additional data block after the first storage file, so that the end of the file F and the address of the additional data block are consecutive.
- new index information can be created, that is, in step S220, the new index block is additionally written after the additional data block.
- the new index block is generated based on the index block of the first storage file and the index block of the second storage file.
- the freshness of the second storage file may be greater than the first storage file, so the key value in the second storage file may be a modification, deletion, replacement, etc. of the key value in the first storage file, and thus The same key existing in the index block of the first storage file and the second storage file may select a key in the second storage file with higher freshness as a valid key, and discard the key in the first storage file to construct a new key Index block.
- the keys in the generated new index block are all valid keys, and the corresponding values are all valid values.
- the key in the new index block is also stored in the form of a B+ tree, which is regenerated according to the index block of the first storage file and the index block of the second storage file, and thus may be referred to as a new B+ tree.
- the index key of the first storage file and all the valid keys in the index block of the second storage file and their corresponding values are respectively recorded in the new B+ tree in the data block of the first storage file and the logical address in the additional data block. In the leaf node.
- all the nodes of the B+ tree in the index block of the first storage file and the index block of the second storage file are physically stored continuously, so that the disk can be utilized in the process of reconstructing the new B+ tree.
- the local preloading feature can obtain the index block of the first storage file and the index block of the second storage file by simply traversing successive disk blocks, thereby improving the construction efficiency of the new B+ tree.
- the index block in the first storage file is invalidated and replaced by the new index block.
- the invalidity mentioned here means that in the subsequent search process, the new index block is used for searching, and the old index block is no longer used. That is, after generating a new index block, the old index block may not be deleted.
- step S230 a new file header is additionally written after the new index block to record the metadata information of the merged new file.
- the metadata information of the new file may include the number of keys in the new index block, the range of keys in the new index block, the height of the new B+ tree, the logical address of the first leaf node in the new B+ tree, and the internal nodes in the new B+ tree. Number and so on. After generating a new file header, you can delete the second storage file and free up storage space.
- FIG. 5 is a schematic diagram showing a merge process of merging G files into F files.
- the F file is unchanged, and only the value in the G file needs to be additionally written into the F file, and a new index block and a new file header are generated.
- the merge process is simpler, and according to the merged B+ tree, the value corresponding to the key in the file can be conveniently found, and the read performance is improved. .
- FIG. 6 is another schematic diagram showing a merge process of merging G files into F files.
- both the F file and the G file in FIG. 6 include a front file header located at the head of the file and a rear file header located at the end of the file.
- the contents of the front file header and the last file header are the same.
- the previous file header of the F file can be updated according to the new file header as the front file header of the new file, and the new file header is used as the new file header.
- the file header of the file is used as the new file header.
- the first two files of the new file can be updated normally and are the same.
- the storage process when the data is persistently stored in the storage file in the external storage, the storage process is first written to the memory table, then written to the read-only memory table, and then written to the external
- the first level storage file in the memory the first level storage file is merged into the second level storage file. Therefore, the freshness of the data is decremented according to the memory table, the read-only memory table, the first-level storage file, and the second-level storage file.
- step S410 may first be performed to find in the memory table whether there is a key corresponding to the request key. For example, when the data in the memory table is stored in the form of a hash table and a jump table, it may first be located according to the request key to the specific hash bucket in the memory table, and then searched in the corresponding jump table.
- step S420 If you find it in the memory table, you can read it directly. If it is not found in the memory table, it can continue to find in the read-only memory table in the internal memory whether or not there is a key corresponding to the request key (step S420). Wherein, when the memory memory maintains a read-only memory table queue with multiple read-only memory tables, the read-only memory table in the read-only memory table queue can be searched one by one in chronological order.
- step S430 If it is not found in the read-only memory table, it can be searched from the first-level storage file in the external memory. Here, it can be chronologically searched whether each first-level storage file in the external storage has a request key corresponding to it. Key (step S430).
- the value corresponding to the request key can be read from the first-level storage file. If it is not found, it can be searched from the second-level storage file in the external storage. Here, it can be used to find whether the second-level storage file has the value corresponding to the request key in the convenient time of the binary search (step S440).
- the value corresponding to the request key can be read from the second-level storage file. In the case that it is not found, it indicates that the request key and the corresponding value are not stored in the database system.
- Figure 15 is a flow chart showing the lookup inside a file.
- the file header and the index block of the target storage file may be first acquired (step S510), and then step S520 is performed to determine, according to the file header, whether the request key is within the range of the key indicated by the file header, and if not, indicating the target storage. The value corresponding to the request key does not exist in the file, and the reading ends.
- step S530 may be performed to find a leaf node corresponding to the request key in the index block based on the B+ tree structure of the index block.
- the leaf node corresponding to the request key is not found in the index block, it indicates that the value corresponding to the request key does not exist in the target storage file, and the reading ends.
- step S540 may be performed to read the target value according to the logical address in the data block in the target storage file according to the value corresponding to the key stored by the found leaf node.
- Fig. 16 is a schematic flow chart showing the restart of the restart according to the present embodiment.
- the sequence between step S610 and step S620 is not required, and may be performed at the same time or at different times.
- a second level storage file list is constructed. Specifically, the index block of the second-level storage file, the filter block (in some cases), the file header, and the like may be pre-loaded by means of memory mapping, and then the second-level storage file list is constructed according to the range order of the keys.
- a first level storage file list is constructed. Specifically, the index block of the first-level storage file, the filter block (in some cases), the file header, and the like may be pre-loaded by means of memory mapping, and then the first-level storage file list is constructed according to the range order of the keys.
- step S630 After the first-level storage file list and the second-level storage file list are constructed, it is possible to determine that the write of the log file written to the first-level storage file is entered (step S630), so that the write progress can be made according to the progress.
- the memory table and the read-only memory table in the internal memory are constructed (step S640).
- log files written in the external memory which are respectively corresponding to the memory table (or the read-only memory table), so that it can be determined according to the constructed first file list and the second file list.
- the data in those log files in multiple log files is not written to the storage file.
- the log file that is not written to the storage file can then be converted to a read-only memory table, where the data in the log file can be written to the memory table for the last generated log file.
- the recovery of the memory table and the read-only memory table in the internal memory can be completed.
- an electronic device including a memory and a processor.
- the memory is configured to store executable instructions
- the processor is configured to execute the electronic device to perform any one of the file merging methods provided in the embodiment according to the control of the executable instructions.
- the electronic device can be an electronic device 1000 as shown in FIG.
- the file finally stored in the external memory has only two hierarchical structures, and the file redundancy is low, which is convenient to find.
- a database management method In the present embodiment, a database management method, a database system, and an electronic device as described below are also provided.
- Aspect 2 The data block management method of aspect 1, further comprising:
- Level storage file Specifying a primary file name of the second-level storage file by using a second naming rule, the first naming rule being different from the second naming rule, so as to distinguish whether the storage file is a first-level storage file or a second based on the primary file name.
- Level storage file Specifying a primary file name of the second-level storage file by using a second naming rule, the first naming rule being different from the second naming rule, so as to distinguish whether the storage file is a first-level storage file or a second based on the primary file name.
- the memory table is composed of a hash table
- the hash table includes one or more hash buckets, and each hash bucket corresponds to one jump table.
- Each piece of data in the memory table constitutes an element of the hop table, wherein the order of the elements in the hop table is ordered in order according to the size of the key.
- Aspect 4 The database management method of aspect 1, further comprising:
- the data in the read-only memory table is not all written to the external memory, and when the size of the new memory table exceeds a predetermined threshold, the new memory table is converted Make another read-only memory table and put it into the read-only memory table queue.
- An index block configured to store, in a B+ tree, a key corresponding to the value, wherein a logical address of all the keys and their corresponding values in the data block are respectively recorded in a leaf node in the B+ tree, And all nodes constituting the B+ tree are physically stored continuously.
- Aspect 6 The database management method of aspect 5, wherein the step of merging the two first level storage files comprises:
- the new index block being generated based on an index block of the first storage file and an index block of the second storage file, where the first storage file is All valid keys in the index block and the index block of the second storage file and their corresponding values are respectively recorded in the new B+ tree in the data block of the first storage file and the logical address in the additional data block.
- a new file header is additionally written after the new index block to record metadata information of the merged new file.
- Aspect 7 The database management method of aspect 6, wherein the metadata information comprises one or more of the following:
- Aspect 8 The database management method of aspect 6, further comprising:
- the file includes a front file header located at a file header and a subsequent file header located at a tail of the file, and the content of the front file header and the subsequent file header are the same.
- Aspect 10 The database management method of aspect 8 or 9, further comprising:
- the new file is restored to the first storage file before the merge according to the file header of the first storage file;
- the file header of the first storage file is re-updated according to the new file header.
- Aspect 11 The database management method of any of aspects 1-9, further comprising:
- Aspect 12 The database management method of aspect 11, further comprising:
- the target value is read at a logical address in the data block in the target storage file according to the value corresponding to the key stored by the found leaf node.
- Aspect 13 The database management method of any of aspects 1-9, further comprising:
- a memory table and a read-only memory table in the internal memory are constructed.
- a database system comprising: an internal memory and an external memory, wherein
- the internal memory is used to write a plurality of pieces of data into a log file in an external memory.
- the external memory writes data in the log file to a memory table in an internal memory, wherein data written in the memory table is stored in an orderly manner according to a size of a key.
- the internal memory converts the memory table into a read-only memory table, and the external memory writes subsequent data in the log file to a new memory table.
- the internal memory writes data in the read-only memory table into an external accessor to obtain a first-level storage file.
- the external memory merges two or more first level storage files to obtain a second level storage file.
- Aspect 15 The database system of aspect 14, wherein
- the external storage specifies a primary file name of the first-level storage file by a first naming rule, and specifies a primary file name of the second-level storage file by a second naming rule, the first naming rule and the The second naming rule is different to distinguish whether the storage file is a first-level storage file or a second-level storage file based on the primary file name.
- the memory table is composed of a hash table
- the hash table includes one or more hash buckets
- each hash bucket corresponds to a jump table.
- Each piece of data in the memory table constitutes an element of the hop table, wherein the order of the elements in the hop table is ordered in order according to the size of the key.
- Aspect 17 The database system of aspect 14, wherein
- the database system of aspect 14, wherein the data structure of the storage file comprises:
- An index block configured to store, in a B+ tree, a key corresponding to the value, wherein a logical address of all the keys and their corresponding values in the data block are respectively recorded in a leaf node in the B+ tree, And all nodes constituting the B+ tree are physically stored continuously.
- the new index block being generated based on an index block of the first storage file and an index block of the second storage file, where the first storage file is All the keys in the index block and the index block of the second storage file and their corresponding values are respectively recorded in the new B+ tree in the data block of the first storage file and the logical address in the additional data block.
- a new file header is additionally written after the new index block to record metadata information of the merged new file.
- An electronic device comprising:
- a memory for storing executable instructions
- the invention can be a system, method and/or computer program product.
- the computer program product can comprise a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement various aspects of the present invention.
- the computer readable storage medium can be a tangible device that can hold and store the instructions used by the instruction execution device.
- the computer readable storage medium can be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- Non-exhaustive list of computer readable storage media include: portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM) Or flash memory), static random access memory (SRAM), portable compact disk read only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, mechanical encoding device, for example, with instructions stored thereon A raised structure in the hole card or groove, and any suitable combination of the above.
- a computer readable storage medium as used herein is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (eg, a light pulse through a fiber optic cable), or through a wire The electrical signal transmitted.
- the computer readable program instructions described herein can be downloaded from a computer readable storage medium to various computing/processing devices or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
- the network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers, and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in each computing/processing device .
- Computer program instructions for performing the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine related instructions, microcode, firmware instructions, state setting data, or in one or more programming languages.
- the computer readable program instructions can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer, partly on the remote computer, or entirely on the remote computer or server. carried out.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or wide area network (WAN), or can be connected to an external computer (eg, using an Internet service provider to access the Internet) connection).
- the customized electronic circuit such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be customized by utilizing state information of computer readable program instructions.
- Computer readable program instructions are executed to implement various aspects of the present invention.
- the computer readable program instructions can be provided to a general purpose computer, a special purpose computer, or a processor of other programmable data processing apparatus to produce a machine such that when executed by a processor of a computer or other programmable data processing apparatus Means for implementing the functions/acts specified in one or more of the blocks of the flowcharts and/or block diagrams.
- the computer readable program instructions can also be stored in a computer readable storage medium that causes the computer, programmable data processing device, and/or other device to operate in a particular manner, such that the computer readable medium storing the instructions includes An article of manufacture that includes instructions for implementing various aspects of the functions/acts recited in one or more of the flowcharts.
- the computer readable program instructions can also be loaded onto a computer, other programmable data processing device, or other device to perform a series of operational steps on a computer, other programmable data processing device or other device to produce a computer-implemented process.
- instructions executed on a computer, other programmable data processing apparatus, or other device implement the functions/acts recited in one or more of the flowcharts and/or block diagrams.
- each block in the flowchart or block diagram can represent a module, a program segment, or a portion of an instruction that includes one or more components for implementing the specified logical functions.
- Executable instructions can also occur in a different order than those illustrated in the drawings. For example, two consecutive blocks may be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts can be implemented in a dedicated hardware-based system that performs the specified function or function. Or it can be implemented by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
L'invention concerne un procédé et un appareil de fusion de fichiers. Le procédé comprend : l'ajout à l'écriture d'un bloc de données annexé après un premier fichier, d'une valeur dans un bloc de données d'un second fichier en cours d'écriture à l'intérieur de ce dernier ; l'ajout à l'écriture d'un nouveau bloc d'index après le bloc de données annexé, du nouveau bloc d'index généré en fonction d'un bloc d'index du premier fichier et d'un bloc d'index du second fichier, les adresses logiques de toutes les clés dans le bloc d'index du premier fichier et du bloc d'index du second fichier et des valeurs correspondantes de ces derniers dans le bloc de données du premier fichier et le bloc de données annexé étant respectivement enregistrées dans des nœuds foliés d'un nouvel arbre B + ; et l'ajout à l'écriture d'un nouvel en-tête de fichier après le nouveau bloc d'index de façon à enregistrer des informations de métadonnées d'un nouveau fichier fusionné. Ainsi, lorsque deux fichiers sont fusionnés, une valeur d'un fichier doit simplement être directement annexée pour une écriture dans l'autre fichier, ce qui permet d'améliorer les performances d'écriture ; et des blocs d'index fusionnés constituent un nouvel arbre B +, des valeurs pouvant être aisément lues dans un grand fichier fusionné au moyen d'une recherche, ce qui permet d'améliorer les performances de lecture.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710040977.XA CN108319625B (zh) | 2017-01-17 | 2017-01-17 | 文件合并方法和装置 |
| CN201710031732.0 | 2017-01-17 | ||
| CN201710040977.X | 2017-01-17 | ||
| CN201710031732.0A CN108319602B (zh) | 2017-01-17 | 2017-01-17 | 数据库管理方法及数据库系统 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018133762A1 true WO2018133762A1 (fr) | 2018-07-26 |
Family
ID=62907812
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/072641 Ceased WO2018133762A1 (fr) | 2017-01-17 | 2018-01-15 | Procédé et appareil de fusion de fichiers |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2018133762A1 (fr) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109800208A (zh) * | 2019-01-18 | 2019-05-24 | 湖南友道信息技术有限公司 | 网络溯源系统及其数据处理方法、计算机存储介质 |
| CN111538702A (zh) * | 2020-04-20 | 2020-08-14 | 北京京安佳新技术有限公司 | 一种基于Hadoop的海量小文件处理方法和设备 |
| CN111984600A (zh) * | 2020-08-27 | 2020-11-24 | 苏州浪潮智能科技有限公司 | 一种文件聚合方法、装置、设备及可读存储介质 |
| CN112307016A (zh) * | 2019-07-29 | 2021-02-02 | 华为技术有限公司 | 一种数据单元的合并方法及装置 |
| EP3910489A1 (fr) * | 2020-05-15 | 2021-11-17 | Vail Systems, Inc. | Système de gestion de données utilisant des tranches de données attribuées |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030120602A1 (en) * | 2000-01-12 | 2003-06-26 | June-Kee Jung | Method of combinning multimedia files |
| CN103678491A (zh) * | 2013-11-14 | 2014-03-26 | 东南大学 | 一种基于Hadoop中小文件优化和倒排索引的方法 |
| CN104133867A (zh) * | 2014-07-18 | 2014-11-05 | 中国科学院计算技术研究所 | 分布式顺序表片内二级索引方法及系统 |
| CN105117415A (zh) * | 2015-07-30 | 2015-12-02 | 西安交通大学 | 一种优化的ssd数据更新方法 |
| CN105868286A (zh) * | 2016-03-23 | 2016-08-17 | 中国科学院计算技术研究所 | 基于分布式文件系统小文件合并的并行追加方法及系统 |
| CN106326292A (zh) * | 2015-06-29 | 2017-01-11 | 杭州海康威视数字技术股份有限公司 | 数据结构和文件聚合、读取方法及装置 |
-
2018
- 2018-01-15 WO PCT/CN2018/072641 patent/WO2018133762A1/fr not_active Ceased
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030120602A1 (en) * | 2000-01-12 | 2003-06-26 | June-Kee Jung | Method of combinning multimedia files |
| CN103678491A (zh) * | 2013-11-14 | 2014-03-26 | 东南大学 | 一种基于Hadoop中小文件优化和倒排索引的方法 |
| CN104133867A (zh) * | 2014-07-18 | 2014-11-05 | 中国科学院计算技术研究所 | 分布式顺序表片内二级索引方法及系统 |
| CN106326292A (zh) * | 2015-06-29 | 2017-01-11 | 杭州海康威视数字技术股份有限公司 | 数据结构和文件聚合、读取方法及装置 |
| CN105117415A (zh) * | 2015-07-30 | 2015-12-02 | 西安交通大学 | 一种优化的ssd数据更新方法 |
| CN105868286A (zh) * | 2016-03-23 | 2016-08-17 | 中国科学院计算技术研究所 | 基于分布式文件系统小文件合并的并行追加方法及系统 |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109800208A (zh) * | 2019-01-18 | 2019-05-24 | 湖南友道信息技术有限公司 | 网络溯源系统及其数据处理方法、计算机存储介质 |
| CN112307016A (zh) * | 2019-07-29 | 2021-02-02 | 华为技术有限公司 | 一种数据单元的合并方法及装置 |
| CN112307016B (zh) * | 2019-07-29 | 2022-08-26 | 华为技术有限公司 | 一种数据单元的合并方法及装置 |
| CN111538702A (zh) * | 2020-04-20 | 2020-08-14 | 北京京安佳新技术有限公司 | 一种基于Hadoop的海量小文件处理方法和设备 |
| EP3910489A1 (fr) * | 2020-05-15 | 2021-11-17 | Vail Systems, Inc. | Système de gestion de données utilisant des tranches de données attribuées |
| US12072869B2 (en) | 2020-05-15 | 2024-08-27 | Vail Systems, Inc. | Data management system using attributed data slices |
| US12455875B2 (en) | 2020-05-15 | 2025-10-28 | Vail Systems, Inc. | Data management system using attributed data slices |
| CN111984600A (zh) * | 2020-08-27 | 2020-11-24 | 苏州浪潮智能科技有限公司 | 一种文件聚合方法、装置、设备及可读存储介质 |
| CN111984600B (zh) * | 2020-08-27 | 2022-07-29 | 苏州浪潮智能科技有限公司 | 一种文件聚合方法、装置、设备及可读存储介质 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN108319602B (zh) | 数据库管理方法及数据库系统 | |
| US9166866B2 (en) | Hydration and dehydration with placeholders | |
| US8738572B2 (en) | System and method for storing data streams in a distributed environment | |
| US10489367B2 (en) | Generating an index for a table in a database background | |
| US8706710B2 (en) | Methods for storing data streams in a distributed environment | |
| US8868926B2 (en) | Cryptographic hash database | |
| US9830324B2 (en) | Content based organization of file systems | |
| US10089338B2 (en) | Method and apparatus for object storage | |
| US9594674B1 (en) | Method and system for garbage collection of data storage systems using live segment records | |
| WO2018133762A1 (fr) | Procédé et appareil de fusion de fichiers | |
| US9715505B1 (en) | Method and system for maintaining persistent live segment records for garbage collection | |
| US20160321294A1 (en) | Distributed, Scalable Key-Value Store | |
| US20210081388A1 (en) | Methods, apparatuses and computer program products for managing metadata of storage object | |
| CN103270499B (zh) | 日志存储方法及系统 | |
| CN103973810B (zh) | 基于互联网协议ip盘的数据处理方法和装置 | |
| WO2020211236A1 (fr) | Procédé et appareil de résolution de conflit de lecture-écriture utilisant un arbre b+, et support de stockage | |
| CN108319625B (zh) | 文件合并方法和装置 | |
| WO2017166815A1 (fr) | Procédé et dispositif de mise à jour de données pour un système de base de données distribué | |
| CN114281779A (zh) | 数据同步方法、装置、计算机设备及存储介质 | |
| US20190384825A1 (en) | Method and device for data protection and computer readable storage medium | |
| JPWO2015087509A1 (ja) | 状態保存復元装置、状態保存復元方法、および、プログラム | |
| CN104133970A (zh) | 一种数据空间管理方法及装置 | |
| US20200341866A1 (en) | Methods, devices and computer program products for data backup and restoration | |
| KR101693108B1 (ko) | 읽기 성능 개선을 위한 티-트리 인덱스를 이용한 데이터베이스 읽기 방법 및 그 장치 | |
| CN111290714A (zh) | 数据读取方法和装置 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18741671 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC , EPO FORM 1205A DATED 24.10.19. |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 18741671 Country of ref document: EP Kind code of ref document: A1 |