Disclosure of Invention
The present disclosure provides a data processing method, an electronic device, and a program product for block storage.
According to one aspect of the present disclosure, there is provided a data processing method for block storage, including:
constructing a mapping index tree, wherein the mapping index tree is used for uniformly managing the mapping relationship between logical block addresses and physical block addresses, and the mapping index tree is a multi-level cache architecture comprising a cache layer, a segmented memory table and a multi-layer ordered table which are sequentially arranged;
determining a storage priority of data to be written according to a received data writing request, wherein the storage priority is used for characterizing the importance degree of the data to be written;
writing the data to be written into the cache layer or the segmented memory table for storage according to the storage priority; and
when the storage data in the segmented memory table or the multi-layer ordered table meets a merge migration rule, merging and migrating the storage data to the storage space of the next level.
According to the technical scheme of this aspect, a mapping index tree is constructed to uniformly manage the mapping relationship between logical block addresses and physical block addresses, the mapping index tree being a multi-level cache architecture comprising a cache layer, a segmented memory table and a multi-layer ordered table which are sequentially arranged. A storage priority characterizing the importance degree of the data to be written is then determined according to the received data writing request; the data to be written is written into the cache layer or the segmented memory table for storage according to the storage priority; and when the storage data in the segmented memory table or the multi-layer ordered table meets the merge migration rule, the storage data is merged and migrated to the storage space of the next level.
Therefore, based on the mapping index tree, unified scheduling of cross-layer storage mapping can be realized, multi-layer redundancy management is reduced, and I/O access efficiency is further improved.
Moreover, writing the data to be written into either the cache layer or the segmented memory table according to its storage priority allows non-urgent writes to accumulate in the cache layer, avoiding frequent access to the memory table and further reducing write amplification.
In at least one embodiment of the present disclosure, writing the data to be written into the cache layer or the segmented memory table for storage according to the storage priority includes:
when the storage priority is a first level, writing the data to be written into the cache layer for storage; and
when the storage priority is a second level, writing the data to be written into the segmented memory table for storage, wherein the importance degree of the second level is higher than that of the first level.
According to the technical scheme of this embodiment, frequent access to the memory table can be avoided, and write amplification is reduced.
In at least one embodiment of the present disclosure, merging and migrating the storage data to the storage space of the next level when the storage data in the segmented memory table or the multi-layer ordered table meets the merge migration rule includes:
determining a first proportion of the storage data in the segmented memory table relative to the total storable capacity of the segmented memory table; and
merging and migrating the storage data in a segmented memory table whose first proportion is greater than or equal to a first capacity threshold into the multi-layer ordered table.
According to the technical scheme of this embodiment, the merge migration operation can be triggered independently when a single segmented memory table is full, while the other segmented memory tables continue to receive new write requests, so that blocking is avoided.
In at least one embodiment of the present disclosure, the method further comprises:
acquiring system load information;
and determining the first capacity threshold according to the system load information.
According to the technical scheme of the embodiment, the first capacity threshold value can be dynamically adjusted based on the real-time load of the system, so that write amplification is reduced, I/O oscillation is reduced, and system stability is improved.
In at least one embodiment of the present disclosure, merging and migrating the storage data to the storage space of the next level when the storage data in the segmented memory table or the multi-layer ordered table meets the merge migration rule further includes:
for the multi-layer ordered table, when the number of ordered tables in the Ln level reaches a number threshold and the second proportion of the storage data in the ordered tables of the Ln level relative to their total storable capacity is greater than or equal to a second capacity threshold, selecting at least some of the ordered tables in the Ln level as ordered tables to be merged, where n is an integer greater than or equal to 0;
merging and sorting the stored data of the Ln-level ordered tables to be merged with the stored data of the Ln+1-level ordered tables to be merged according to the order of logical block addresses to obtain a temporary ordered table, wherein an Ln+1-level ordered table to be merged is an ordered table whose key value range intersects the key value range of the Ln-level ordered tables to be merged; and
writing the temporary ordered table into the Ln+1 level and discarding the ordered tables to be merged in the Ln level and the Ln+1 level.
According to the technical scheme of this embodiment, merging ordered tables with intersecting key value ranges reduces duplicate data storage, thereby improving space utilization. Moreover, the merged ordered table is laid out contiguously in logical block address order, which improves query efficiency.
In at least one embodiment of the present disclosure, the method further comprises:
determining a data failure proportion corresponding to the ordered table according to the number of failed key value pairs in the ordered table and the total number of key value pairs; and
when the data failure proportion reaches a failure proportion threshold, merging and migrating the stored data in the ordered table to an ordered table of the next level.
According to the technical scheme of this embodiment, resources can be effectively reclaimed, and the cost of reading invalid records can be reduced. Partial merging ensures that the multi-layer ordered table is not occupied by large amounts of invalid data or outdated versions for long periods of time.
In at least one embodiment of the present disclosure, the mapping index tree is integrated in the kernel block layer to eliminate duplicate log records and mapping management.
According to the technical scheme of this embodiment, duplicate log recording and mapping management across the original layers can be effectively eliminated, avoiding the storage resource waste and write amplification caused by redundant indexes and frequent updates.
In at least one embodiment of the present disclosure, the method further comprises:
determining a logical block address to be queried according to a received data query request;
querying each storage space for the logical block address in the order of the cache layer, the segmented memory table and the multi-layer ordered table; and
feeding back the physical block address corresponding to the logical block address according to the query result.
According to the technical scheme of the embodiment, the data query is performed based on the mapping index tree, so that the query efficiency can be improved.
According to another aspect of the present disclosure, there is provided a data processing apparatus for block storage, comprising:
a building module, configured to build a mapping index tree, wherein the mapping index tree is used for uniformly managing the mapping relationship between logical block addresses and physical block addresses, and the mapping index tree is a multi-level cache architecture comprising a cache layer, a segmented memory table and a multi-layer ordered table which are sequentially arranged;
a determining module, configured to determine a storage priority of data to be written according to a received data writing request, wherein the storage priority is used for characterizing the importance degree of the data to be written;
a writing module, configured to write the data to be written into the cache layer or the segmented memory table for storage according to the storage priority; and
a processing module, configured to merge and migrate the storage data to the storage space of the next level when the storage data in the segmented memory table or the multi-layer ordered table meets the merge migration rule.
According to another aspect of the present disclosure, there is provided an electronic device including a memory storing execution instructions, and a processor executing the execution instructions stored by the memory, such that the processor performs the data processing method for block storage of any of the embodiments of the present disclosure.
According to yet another aspect of the present disclosure, there is provided a readable storage medium having stored therein execution instructions which, when executed by a processor, implement the data processing method for block storage of any of the embodiments of the present disclosure.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the data processing method for block storage of any of the embodiments of the present disclosure.
Detailed Description
The disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific examples described herein are intended to be illustrative of the relevant matter rather than limiting of the present disclosure. It should be further noted that, for convenience of description, only a portion relevant to the present disclosure is shown in the drawings.
In addition, embodiments of the present disclosure and features of the embodiments may be combined with each other without conflict. The technical aspects of the present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Block storage is a storage technology that manages and reads/writes data in units of fixed-size data blocks. Its core mechanism is to provide direct access to the storage medium through the mapping relationship between logical block addresses (Logical Block Address, LBA) and physical block addresses (Physical Block Address, PBA). By virtue of its efficient data management capability, good compatibility and flexible deployment, block storage is widely applied in fields such as high-performance computing (HPC), large-scale model training and virtualization platforms.
By performing data management and reading/writing in units of blocks, block storage can provide a unified storage interface between different systems and applications, thereby reducing the adaptation overhead of a file system or a distributed storage layer. In high-performance computing (HPC) scenarios, block storage often underpins parallel file systems (e.g., Lustre, GPFS), providing high-throughput data access for large-scale scientific computing. In large-scale deep learning model training, massive parameter updates and highly concurrent I/O requests place extreme demands on the storage system for low latency and high throughput. In virtualized and containerized environments, efficient management and isolation of block devices becomes particularly critical due to multi-tenancy requirements and elastic expansion of computing resources.
Meanwhile, multi-layer virtualization technologies (such as KVM and Docker) are widely applied in cloud computing and large-scale data centers, providing resource isolation, elastic expansion and efficient management capabilities. However, multi-layer virtualization introduces multiple functionally similar but independent data management layers, including the host operating system, the block device mapping mechanisms inside virtual machines or containers, and the Flash Translation Layer (FTL) of the underlying storage device (e.g., an SSD). Each of these layers maintains an independent LBA-to-PBA mapping, increasing redundancy in data management and causing problems such as write amplification, storage fragmentation and I/O performance degradation, thereby affecting the overall storage efficiency and lifetime of the system.
Therefore, the present disclosure proposes a technical solution in which unified scheduling of cross-layer storage mapping can be realized based on a mapping index tree, reducing multi-layer redundancy management and further improving I/O access efficiency. In addition, writing the data to be written into either the cache layer or the segmented memory table according to its storage priority allows non-urgent writes to accumulate in the cache layer, avoiding frequent access to the memory table and further reducing write amplification.
FIG. 1 shows a flow diagram of a data processing method for block storage according to one embodiment of the present disclosure. The method shown in fig. 1 at least includes steps S110 to S140, and is described in detail as follows:
In step S110, a mapping index tree is constructed, where the mapping index tree is used to uniformly manage the mapping relationship between logical block addresses and physical block addresses, and the mapping index tree is a multi-level cache architecture including a cache layer, a segmented memory table and a multi-layer ordered table that are sequentially arranged.
In this embodiment, the mapping index tree (Mapping Tree, M-Tree) may be a multi-level cache architecture integrated within the operating system for uniformly managing the mapping between logical block addresses and physical block addresses. Its core structure comprises a cache layer, a segmented memory table and a multi-layer ordered table which are sequentially arranged.
Specifically, as shown in fig. 2, the cache layer may be implemented as a lock-free ring buffer supporting multi-thread concurrent writing, which temporarily stores newly generated LBA-to-PBA mapping relationships or pending write requests. That is, when a write request is received, the LBA-to-PBA mapping may be cached in the cache layer, avoiding the frequent I/O operations caused by writing directly into the persistent structure. The cache layer thus aggregates small random write requests, reducing the direct impact on the segmented memory table, significantly lowering write latency and improving throughput in high-concurrency scenarios.
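To make the cache layer concrete, the following is a minimal userspace sketch of such a lock-free ring buffer. The structure names, slot count and the simplified full-buffer handling are illustrative assumptions rather than the disclosure's actual implementation; a production version would also handle wrap-around and consumer synchronization.

    #include <stdatomic.h>
    #include <stdint.h>

    #define RING_SLOTS 4096               /* illustrative capacity */

    struct map_entry {                    /* one pending LBA-to-PBA mapping */
        uint64_t lba;
        uint64_t pba;
    };

    struct ring_buffer {
        struct map_entry slots[RING_SLOTS];
        atomic_uint_fast64_t head;        /* next free slot; shared by all writers */
    };

    /* Publish one mapping; writers claim distinct slots with an atomic
     * fetch-and-add, so no global lock is taken on the hot write path. */
    static int ring_push(struct ring_buffer *rb, uint64_t lba, uint64_t pba)
    {
        uint64_t pos = atomic_fetch_add(&rb->head, 1);
        if (pos >= RING_SLOTS)            /* buffer full: caller triggers a flush */
            return -1;
        rb->slots[pos] = (struct map_entry){ .lba = lba, .pba = pba };
        return 0;
    }

Claiming slots with fetch-and-add is what permits multi-thread concurrent writing without lock contention on the hot path.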
The segmented memory table maps LBAs to a plurality of independent segments through hash sharding, and each segment may use a skip list structure to maintain an ordered LBA-to-PBA mapping. In an example, a hash operation can be performed on the LBA followed by a modulo operation to assign it to a specific segment; the segments of the segmented memory table do not interfere with each other, which improves concurrency, while hash sharding also balances load, enabling concurrent writes and queries without global lock contention.
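A sketch of this segment-routing step is given below; the hash function and segment count are illustrative assumptions, and any well-distributed hash followed by a modulo operation would fit.

    #include <stdint.h>

    #define NUM_SEGMENTS 16               /* illustrative segment count */

    /* Route an LBA to one of the independent memory-table segments:
     * hash the LBA, then take the result modulo the segment count. */
    static inline uint32_t segment_of(uint64_t lba)
    {
        uint64_t h = lba * 0x9E3779B97F4A7C15ULL;   /* Fibonacci hashing */
        return (uint32_t)(h % NUM_SEGMENTS);
    }

Each segment then maintains its own skip list keyed by LBA, so writes and queries to different segments never contend.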
The multi-layer ordered table consists of multiple levels of persistent structures (e.g., SSTables) sorted by LBA, each of which may carry a Bloom Filter (BF) to accelerate queries. Data is flushed from the segmented memory table into the highest level (i.e., the L0 level) of the multi-layer ordered table to form persistent storage. When the number of ordered tables in a level or its proportion of invalid data exceeds a threshold (described below), a merge (Compaction) operation can be triggered, i.e., adjacent levels are integrated, sorted and sunk to the next level, and invalid records are cleaned up. In this way, hierarchical storage and dynamic merging reduce redundant data writes (lowering write amplification) and optimize storage space utilization; the Bloom filters quickly rule out non-matching query paths, improving retrieval efficiency; and the partial merge strategy reduces I/O resource occupation, alleviating performance jitter.
In some embodiments of the present disclosure, the mapping index tree may be integrated in the kernel block layer of the operating system. As shown in fig. 3, the kernel block layer is an important middle layer for processing block device I/O requests in the operating system: read-write requests submitted by upper layers (such as the file system and user-mode applications) all enter the kernel block layer and, after queuing, scheduling and forwarding, the data access operation is finally completed. By providing a unified abstract interface and adaptation functions, the kernel block layer ensures stable and efficient access of upper-layer applications to the storage device.
Compared with placing the mapping index tree at a higher layer (the file system or user mode) or a lower layer (such as SSD firmware FTL), the kernel block layer sits at a key node in the data flow path and can therefore realize cross-layer unified mapping scheduling, covering the multi-layer data management requirements of hosts, virtual machines/containers and storage media. Placing the mapping index tree in the kernel block layer can thus effectively eliminate duplicate log recording and mapping management across the original layers, avoiding the storage resource waste and write amplification caused by redundant indexes and frequent updates.
Meanwhile, the unified mapping tree enables more efficient hot-spot data identification, cold data tiering and local merge optimization, realizing load-adaptive adjustment in high-concurrency scenarios and better balancing performance stability and device lifetime.
With continued reference to fig. 1, in step S120, a storage priority of the data to be written is determined according to the received data writing request, where the storage priority is used to characterize the importance level of the data to be written.
The storage priority may be used to characterize the importance or urgency of the data to be written. In an example, a person skilled in the art may assign storage priorities to different data in advance according to importance: e.g., large sequential writes, metadata updates, explicit synchronization requests and crash-consistency-guarantee operations (e.g., transaction log commits) may be determined as high priority, while small random writes (e.g., container temporary logs, edge computing intermediate results, etc.) are determined as low priority so that deferred writes can be aggregated into batch operations.
In this embodiment, after a data writing request is received, it may be parsed to determine the storage priority of the data to be written. In one example, the determination may be made based on metadata in the data write request (e.g., I/O type, data size, source application, etc.). For example, if the bi_opf field of the struct bio in the data write request carries REQ_META or REQ_SYNC, indicating that the write is a metadata, log or explicit synchronization request with higher urgency, it may be marked as high priority.
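As a sketch, the classification might look as follows. The flag constants are stand-ins so the snippet is self-contained; an in-kernel implementation would test bio->bi_opf against the real REQ_META and REQ_SYNC flags from <linux/blk_types.h>, and the large-write cut-off is an assumed example.

    #include <stddef.h>
    #include <stdint.h>

    #define REQ_META_FLAG (1u << 0)       /* stand-in for the kernel's REQ_META */
    #define REQ_SYNC_FLAG (1u << 1)       /* stand-in for the kernel's REQ_SYNC */

    enum storage_priority { PRIO_LOW, PRIO_HIGH };

    static enum storage_priority classify_write(uint32_t op_flags, size_t nbytes)
    {
        /* Metadata, log, or explicit synchronization writes are urgent. */
        if (op_flags & (REQ_META_FLAG | REQ_SYNC_FLAG))
            return PRIO_HIGH;
        /* Large sequential writes also bypass the cache layer
         * (the 1 MiB cut-off is an illustrative assumption). */
        if (nbytes >= (1u << 20))
            return PRIO_HIGH;
        return PRIO_LOW;  /* small random writes aggregate in the ring buffer */
    }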
In step S130, the data to be written is written into the cache layer or the segmented memory table for storage according to the storage priority.
In this embodiment, based on the determined storage priority, the low-priority data to be written may be temporarily stored in the cache layer, while the high-priority data to be written is directly written into the segmented memory table. Therefore, the writing path is dynamically selected according to the storage priority of the data to be written, and the efficient allocation of the I/O resources and the optimization of the system performance can be realized.
In step S140, when the storage data in the segmented memory table or the multi-layer ordered table meets the merge migration rule, the storage data is merged and migrated to the storage space of the next level.
The merge migration rule may be a decision rule, predetermined by those skilled in the art based on experience, for triggering data merging. In one example, the merge migration rule may use the system load and the amount of stored data as the basis for deciding whether to initiate data merging.
In this embodiment, merge migration of data may occur at two places: one is the merge migration of the segmented memory table into the highest level (i.e., the L0 level) of the multi-layer ordered table, and the other is the merge migration of an Ln-level ordered table into the Ln+1 level within the multi-layer ordered table.
In one example, the segmented memory table is flushed into the L0 level of the multi-layer ordered table to form ordered tables. The number of ordered tables in each level is limited; merging (Compaction) is triggered once the number in a level exceeds its threshold, and the new file generated by merging is placed in the next level, and so on, until the data has completely cooled or is rarely updated. In this way, newer or hot data tends to stay at higher levels (e.g., the L0 and L1 levels), while cold data sinks to deeper levels.
It should be noted that the purpose of merging the segmented memory table into the L0 level of the multi-layer ordered table is to quickly release cache space so that new data can promptly enter persistent storage. Therefore, when the capacity of a certain segmented memory table reaches a threshold, merge migration is triggered for that segment alone, and the other segments can continue to accept new requests, preventing the merge migration process from blocking overall write throughput. This segment-wise flushing avoids blocking all writes once a single large memory table reaches its upper limit, improving scalability and throughput in high-concurrency environments.
Merging from the Ln level into the Ln+1 level within the multi-layer ordered table mainly reclaims space and removes invalid data, and therefore requires little tuning. Based on the merge migration rule, frequently updated hot data can be kept at high levels (e.g., the L0 and L1 levels) while infrequently accessed cold data sinks to deep levels (e.g., the L2 to Ln levels), adapting to mixed-load scenarios.
Thus, based on the embodiment shown in fig. 1, a mapping index tree is constructed to uniformly manage the mapping relationship between logical block addresses and physical block addresses, where the mapping index tree is a multi-level cache architecture including a cache layer, a segmented memory table and a multi-layer ordered table that are sequentially arranged. A storage priority characterizing the importance of the data to be written is then determined according to the received data writing request; the data to be written is written into the cache layer or the segmented memory table for storage according to the storage priority; and when the storage data in the segmented memory table or the multi-layer ordered table meets the merge migration rule, the storage data is merged and migrated to the storage space of the next level. Therefore, based on the mapping index tree, unified scheduling of cross-layer storage mapping can be realized, multi-layer redundancy management is reduced, and I/O access efficiency is further improved. Moreover, routing data to the cache layer or the segmented memory table according to its storage priority allows non-urgent writes to accumulate in the cache layer, avoiding frequent access to the memory table and further reducing write amplification.
Fig. 4 shows a flow diagram of step S130 in a data processing method for block storage according to one embodiment of the present disclosure. As shown in fig. 4, the step S130 includes at least steps S131 to S132, which will be described in detail below.
In step S131, when the storage priority is the first level, the data to be written is written into the cache layer for storage.
In this embodiment, the first level may be a low priority level, which is applicable to write requests that allow for short delays but require high throughput, such as container temporary logs, edge computation intermediate results, etc. After determining that the storage priority of the data to be written is the first level, the data to be written can be directly distributed to the cache layer for storage. In one example, the cache layer may locally aggregate data in a logical block address range, merging multiple small random writes into a bulk operation, thereby reducing the number of single I/Os.
In step S132, when the storage priority is the second level, the data to be written is written into the segmented memory table for storage, where the importance of the second level is higher than that of the first level.
In this embodiment, the second level is of higher importance than the first level, i.e., the second level may be a high priority level. It should be appreciated that data whose storage priority is determined to be the second level has high importance or strong real-time requirements, and its prompt persistence should be guaranteed with priority. Therefore, data to be written with a second-level storage priority can be written directly into the segmented memory table for storage, ensuring the timeliness and reliability of high-importance data.
In one example, when writing data to the segmented memory table, a hash operation may be performed on the logical block address to determine the corresponding segment number (e.g., segment memory table 3). The LBA-to-PBA mapping to be written is then inserted in ascending order into the skip list structure of the segment with that number, ensuring query efficiency.
Therefore, first-level data is written into the cache layer while second-level data bypasses the cache layer and is written directly into the segmented memory table, allowing non-urgent writes to accumulate in the cache layer, avoiding frequent access to the memory table and further reducing write amplification.
Fig. 5 shows a flow diagram of step S140 in a data processing method for block storage according to one embodiment of the present disclosure. As shown in fig. 5, the step S140 includes at least steps S141 to S142, which will be described in detail below.
In step S141, a first proportion of the storage data in the segmented memory table to the total storable capacity of the segmented memory table is determined.
In this embodiment, for each segmented memory table, the amount of stored data and the corresponding total storable capacity may be obtained, and the first proportion is obtained by dividing the amount of stored data by the total storable capacity. For example, if the total storable capacity of a segmented memory table is 1000 mapping records and 800 mapping records are currently stored, the corresponding first proportion is 80%.
In an example, the total storable capacity of each segmented memory table may be a fixed allocation, or may be a proportional allocation (i.e., dynamic capacity) based on the total memory of the system, which is not particularly limited by the present disclosure.
In step S142, the stored data in a segmented memory table whose first proportion is greater than or equal to the first capacity threshold is merged and migrated into the multi-layer ordered table.
In this embodiment, the first capacity threshold may be a preset proportional threshold for triggering the merging migration of the stored data in the segmented memory table, for example, the first capacity threshold may be 70% or 80% or the like.
After the first proportion corresponding to a segmented memory table is determined, it may be compared with the first capacity threshold; if the first proportion is greater than or equal to the first capacity threshold, a merge migration operation on that segmented memory table may be triggered, i.e., the stored data in the segmented memory table is merged and migrated to the highest level (i.e., the L0 level) of the multi-layer ordered table.
Therefore, when the first proportion of a single segmented memory table reaches the first capacity threshold, merge migration of its stored data can be triggered while the other segmented memory tables continue to receive new write requests, preventing all writes from being blocked once a single large memory table reaches its upper limit and improving scalability and throughput in high-concurrency environments.
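A minimal sketch of this per-segment trigger follows, with illustrative field names; the threshold itself comes from the load-adaptive logic of the following embodiment.

    #include <stdbool.h>
    #include <stddef.h>

    struct segment_table {
        size_t used;       /* mapping records currently stored */
        size_t capacity;   /* total storable records */
    };

    /* Flush a segment into the L0 level once its fill ratio (the "first
     * proportion") reaches the current first capacity threshold. */
    static bool should_flush(const struct segment_table *seg, double threshold)
    {
        double first_proportion = (double)seg->used / (double)seg->capacity;
        return first_proportion >= threshold;   /* e.g. threshold = 0.70 */
    }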
Based on the foregoing embodiments, fig. 6 shows a schematic flow chart of determining a first capacity threshold value further included in the data processing method for block storage according to an embodiment of the present disclosure. As shown in fig. 6, determining the first capacity threshold includes at least steps S143 to S144, which will be described in detail below.
In step S143, system load information is acquired.
The system load information may be parameter information characterizing the system resource usage state, which may include, but is not limited to, at least one of CPU utilization (e.g., the percentage of time per unit interval that the CPU is not idle), I/O queue depth (the number of I/O requests pending at the storage device, measuring storage-layer congestion), and memory usage (the ratio of occupied memory to total memory capacity, indicating memory resource pressure).
In this embodiment, the system load information described above may be periodically collected for subsequent determination of the first capacity threshold.
In step S144, the first capacity threshold is determined according to the system load information.
In this embodiment, the first capacity threshold corresponding to the segmented memory table may be dynamically adjusted according to the collected system load information: the greater the system load pressure, the larger the first capacity threshold, which delays the merge migration process and guarantees foreground read-write priority; conversely, the lower the system load pressure, the smaller the first capacity threshold, so that memory is cleaned quickly and new data enters persistent storage sooner.
In an embodiment, based on the collected system load information, a system overall load level may be determined, e.g., a weighted calculation may be performed based on the collected system load information, thereby obtaining a corresponding load score. Then, the load score is compared with a preset load threshold value, and a corresponding first capacity threshold value is determined according to a comparison result.
For example, when the system load score is less than 30%, the first capacity threshold for triggering merge migration may be lowered to 60% to quickly clean up memory and accelerate new data entering persistent storage; when the system load score is greater than 80%, the first capacity threshold may be raised to 90%, delaying the merge migration process to guarantee foreground I/O priority.
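A sketch of this adaptive threshold is given below. The break-points and resulting thresholds follow the example above, while the weighting of the three load inputs is an assumption (the text only states that a weighted calculation yields a load score).

    /* Map a weighted load score onto the first capacity threshold.
     * Inputs are normalized to [0, 1]; the weights are illustrative. */
    static double first_capacity_threshold(double cpu_util,
                                           double io_queue_norm,
                                           double mem_util)
    {
        double load = 0.4 * cpu_util + 0.3 * io_queue_norm + 0.3 * mem_util;

        if (load < 0.30)
            return 0.60;   /* light load: flush early, drain memory quickly */
        if (load > 0.80)
            return 0.90;   /* heavy load: defer merging, protect foreground I/O */
        return 0.70;       /* moderate load */
    }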
Based on the foregoing embodiments, fig. 7 is a flowchart of triggering the merge migration operation of a segmented memory table in a data processing method for block storage according to an embodiment of the present disclosure. As shown in fig. 7, triggering the merge migration operation of the segmented memory table at least includes the following steps:
in step S710, system load information is periodically acquired.
That is, load information of the system is collected according to a set period, and basis is provided for subsequent threshold adjustment and merging trigger decision.
In step S720, the system load level is quantified according to the collected system load information, and it is determined whether the current system load is lower than 30%.
In step S730, if the system load is lower than 30%, the first capacity threshold is adjusted to 60%.
In step S740, if the system load is greater than or equal to 30%, it is further determined whether the system load is greater than 80%.
In step S750, if the system load is less than or equal to 80%, the first capacity threshold is adjusted to 70%.
In step S760, if the system load is greater than 80%, the first capacity threshold is adjusted to 90%.
In step S770, according to the determined first capacity threshold, it is checked whether the first proportion corresponding to a certain segmented memory table reaches the first capacity threshold.
In step S780, if the first proportion corresponding to the segmented memory table reaches the first capacity threshold (i.e., is greater than or equal to it), merge migration into the L0 level of the multi-layer ordered table is triggered.
In step S790, if the first proportion corresponding to the segmented memory table does not reach the first capacity threshold, the system waits for the next period and resumes periodically collecting system load information for subsequent judgment and operation.
It should be noted that the foregoing numbers are merely exemplary, and those skilled in the art may set the corresponding first capacity threshold for different load situations according to the specific implementation requirements, which is not particularly limited in this disclosure.
FIG. 8 illustrates a flow diagram of the merge migration operation of the multi-layer ordered table further included in step S140 in a data processing method for block storage according to one embodiment of the present disclosure. As shown in fig. 8, triggering the merge migration operation of the multi-layer ordered table at least includes steps S145 to S147, which are described in detail below.
In step S145, for the multi-layer ordered table, when the number of ordered tables in the Ln level reaches the number threshold and the second proportion of the stored data in the ordered tables of the Ln level relative to their total storable capacity is greater than or equal to the second capacity threshold, at least some of the ordered tables in the Ln level are selected as ordered tables to be merged, where n is an integer greater than or equal to 0.
The number threshold may be a preset upper limit on the number of ordered tables in the Ln level, used to determine whether to trigger merge migration. It should be understood that the number thresholds of different levels may differ, for example 1 for the L0 level, 2 for the L1 level, and so on.
The second capacity threshold is the lower limit on the second proportion at which an ordered table triggers the merge migration operation.
In this embodiment, the system may periodically count the number of ordered tables in the Ln level and, if the corresponding number threshold is reached (i.e., the count is greater than or equal to the number threshold), enter the capacity-proportion checking stage. In this stage, the system may traverse all ordered tables in the Ln level and calculate the second proportion of each, i.e., the ratio of the stored data in the ordered table to its own total storable capacity.
If the second proportions of all ordered tables in the Ln level are greater than or equal to the second capacity threshold (e.g., 70%), the merge migration operation of the ordered tables is triggered, and a portion of the Ln-level ordered tables is selected as the ordered tables to be merged; for example, 1 or 2 ordered tables may be randomly selected from the Ln level for merge migration.
In step S146, the stored data of the Ln-level ordered tables to be merged and the stored data of the Ln+1-level ordered tables to be merged are merge-sorted in logical block address order to obtain a temporary ordered table, where an Ln+1-level ordered table to be merged is an ordered table whose key value range intersects the key value range of the Ln-level ordered tables to be merged.
In this embodiment, an ordered table whose key range overlaps the key range of the Ln-level ordered tables to be merged may be selected in the Ln+1 level as an Ln+1-level ordered table to be merged; for example, the key range of the Ln-level ordered table to be merged covers LBA 0x1000-0x2000, while the key range of the Ln+1-level ordered table to be merged covers LBA 0x1800-0x2800.
After the ordered tables to be merged in the Ln and Ln+1 levels are determined, the stored data in the ordered tables to be merged of the two levels are merge-sorted in logical block address order to obtain the temporary ordered table.
In step S147, the temporary ordered table is written into the Ln+1 level, and the ordered tables to be merged in the Ln and Ln+1 levels are discarded.
In this embodiment, after the temporary ordered table is obtained, it may be written into the Ln+1 level, and the ordered tables of the Ln and Ln+1 levels involved in this merge migration operation (i.e., the ordered tables to be merged determined in the two levels described above) may be discarded.
Thus, merging ordered tables with intersecting key value ranges reduces duplicate data storage and improves space utilization. Moreover, the merged ordered table is laid out contiguously in logical block address order, which improves query efficiency.
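The merge-sort at the heart of steps S146 and S147 can be sketched as follows, reusing the map_entry type from the cache-layer sketch above. On a duplicate LBA the Ln-level entry wins, since it is the newer record; this is exactly how stale Ln+1 records are dropped during compaction.

    #include <stddef.h>

    /* Merge two LBA-sorted runs (one from the Ln tables to be merged, one
     * from the overlapping Ln+1 tables) into the temporary ordered table.
     * Returns the number of entries written to out, which the caller must
     * have sized to at least n_ln + n_ln1 entries. */
    static size_t merge_runs(const struct map_entry *ln,  size_t n_ln,
                             const struct map_entry *ln1, size_t n_ln1,
                             struct map_entry *out)
    {
        size_t i = 0, j = 0, k = 0;
        while (i < n_ln && j < n_ln1) {
            if (ln[i].lba < ln1[j].lba)
                out[k++] = ln[i++];
            else if (ln[i].lba > ln1[j].lba)
                out[k++] = ln1[j++];
            else {              /* same LBA: Ln entry is newer, drop the old one */
                out[k++] = ln[i++];
                j++;
            }
        }
        while (i < n_ln)  out[k++] = ln[i++];
        while (j < n_ln1) out[k++] = ln1[j++];
        return k;
    }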
Based on the foregoing embodiments, fig. 9 is a schematic flow chart of the partial merge migration operation of the multi-layer ordered table further included in step S140 in a data processing method for block storage according to an embodiment of the present disclosure. As shown in fig. 9, triggering the partial merge migration operation of the multi-layer ordered table at least includes steps S148 to S149, which are described in detail below.
In step S148, a data failure proportion corresponding to an ordered table is determined according to the number of failed key value pairs in the ordered table and the total number of key value pairs.
In this embodiment, each ordered table may maintain a data failure proportion (number of failed key value pairs / total number of key value pairs). The number of failed pairs is obtained during previous merges: each latest (LBA, PBA) written down from the Ln level is the newest data, so the key value pair for the same LBA already present in the Ln+1 level is recorded as invalid. The failure count of each ordered table is thus accumulated and, combined with the total number of key value pairs, the data failure proportion is calculated.
In step S149, when the data failure proportion reaches the failure proportion threshold, the stored data in the ordered table is merged and migrated to an ordered table of the next level.
In this embodiment, a person skilled in the art may preset a failure proportion threshold (e.g., 50%) based on experience as the trigger point for the partial merge migration operation. The system may periodically determine the data failure proportion of each ordered table and compare it with the failure proportion threshold; if the data failure proportion of an ordered table reaches the threshold (i.e., is greater than or equal to it), the stored data of that ordered table alone is merged down to the lower level, the merging steps being as described in the foregoing embodiments and not repeated here.
Therefore, by detecting and promptly compacting ordered tables with high failure proportions, resources can be effectively reclaimed and the cost of reading invalid records reduced. Partial merging ensures that the multi-layer ordered table is not occupied for long periods by large amounts of invalid data or outdated versions. It should be appreciated that in the multi-layer ordered table, the smaller the data capacity of an upper level, the "fresher" the data it stores and the lower its data failure proportion. This partial merge migration frees space in the middle levels of the multi-layer ordered table, avoiding the multi-level cascading merges otherwise triggered when an upper level merges downward through the middle levels, thereby maintaining overall system performance and storage efficiency in high-concurrency, frequent-update scenarios.
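The bookkeeping behind this check is small; a sketch with illustrative field names:

    #include <stdbool.h>
    #include <stddef.h>

    struct table_stats {
        size_t invalid_pairs;   /* key value pairs superseded during earlier merges */
        size_t total_pairs;     /* all key value pairs in this ordered table */
    };

    /* Trigger a solo downward merge once the data failure proportion of an
     * ordered table reaches the failure proportion threshold (50% here,
     * per the example in the text). */
    static bool needs_partial_merge(const struct table_stats *st)
    {
        double failure_proportion =
            (double)st->invalid_pairs / (double)st->total_pairs;
        return failure_proportion >= 0.50;
    }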
Specifically, for high-concurrency small random writes and mixed-load scenarios, the embodiments of the present disclosure provide an adaptive merging mechanism that dynamically adjusts the merging strategy and combines it with the partial merging mechanism, reducing write amplification, damping I/O oscillation and improving storage system stability.
FIG. 10 illustrates a flow diagram of a data query further included in a data processing method for block storage according to one embodiment of the present disclosure. As shown in fig. 10, the data query includes at least steps S150 to S170, which will be described in detail below.
In step S150, the logical block address to be queried is determined according to the received data query request.
In this embodiment, the data query request may be information for requesting to query a physical block address corresponding to a certain logical block address. After receiving the data query request, the data query request may be parsed to obtain a logical block address to be queried.
In step S160, according to the logical block address, the query is performed in each storage space according to the sequence of the cache layer, the segmented memory table and the multi-layer ordered table.
In this embodiment, the query proceeds through the storage spaces in the order of the cache layer, the segmented memory table and the multi-layer ordered table. That is, the cache layer is queried first; on a hit the result is fed back, and on a miss the segmented memory table is queried. Before querying the segmented memory table, the segment to which the logical block address to be queried belongs can be determined from that address, so that only the corresponding segment is queried, reducing unnecessary lookups.
If the segmented memory table hits, the result is fed back; on a miss, the multi-layer ordered table is queried level by level. When querying the multi-layer ordered table, the Bloom filter can be used to skip non-target tables, accelerating the query.
In step S170, a physical block address corresponding to the logical block address is fed back according to the query result.
In this embodiment, after the query is performed in the mapping index tree, the physical block address corresponding to the queried logical block address may be fed back to the user or the system based on the query result. Thus, the query efficiency can be improved, and the query delay can be reduced.
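Putting steps S150 to S170 together, the lookup path can be sketched as below. The per-layer lookup helpers and max_level are hypothetical placeholders for the structures described above; the point of the sketch is the fixed query order and the Bloom-filter short-circuit.

    #include <stdbool.h>
    #include <stdint.h>

    #define PBA_NOT_FOUND UINT64_MAX

    /* Hypothetical per-layer lookup helpers; each returns true on a hit. */
    bool cache_layer_lookup(uint64_t lba, uint64_t *pba);
    bool segment_table_lookup(uint32_t seg, uint64_t lba, uint64_t *pba);
    bool bloom_may_contain(int level, uint64_t lba);
    bool sorted_table_lookup(int level, uint64_t lba, uint64_t *pba);
    uint32_t segment_of(uint64_t lba);    /* hash routing, as sketched earlier */
    extern int max_level;                 /* deepest level of the ordered table */

    uint64_t mtree_lookup(uint64_t lba)
    {
        uint64_t pba;

        if (cache_layer_lookup(lba, &pba))              /* 1. ring-buffer cache */
            return pba;

        if (segment_table_lookup(segment_of(lba), lba, &pba))  /* 2. one segment */
            return pba;

        for (int level = 0; level <= max_level; level++) {     /* 3. L0 .. Ln */
            if (!bloom_may_contain(level, lba))         /* skip non-target tables */
                continue;
            if (sorted_table_lookup(level, lba, &pba))
                return pba;                             /* shallower hit = newest */
        }
        return PBA_NOT_FOUND;
    }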
Based on the technical solutions of the above embodiments, a specific application scenario of the embodiments of the present disclosure is described below:
The present disclosure proposes a high-efficiency block storage data organization method suitable for multi-layer virtualized environments, called ADAPTIDISK, whose core objective is to realize efficient mapping index management, dynamic data merge optimization, cache optimization and I/O resource scheduling at the block layer, so as to reduce storage management redundancy and improve the overall performance of the storage system. As shown in fig. 11, ADAPTIDISK mainly includes three parts, namely the M-Tree mapping index, the adaptive merge scheduler and the multi-level cache architecture, which are explained in detail below.
The user-mode/virtualization layer 1110 is the upper-layer application or virtualization management tool in the virtualized environment; it is responsible for initiating read and write requests and serves as the entry point of the entire block storage system.
The kernel block layer 1120 serves as the key middle layer for processing block device I/O requests in the operating system, receiving, queuing, scheduling and forwarding the read-write requests submitted by upper layers (such as the file system and user-mode applications) so that upper-layer applications can access the storage device stably and efficiently, and provides the foundation for the mapping index management, dynamic data merge optimization, cache optimization and I/O resource scheduling below it.
The M-Tree mapping index 1130 is integrated in the kernel block layer to manage the mapping from logical block addresses (LBA) to physical block addresses (PBA), realizing unified scheduling of cross-layer storage mapping so as to reduce multi-layer redundancy management and improve I/O access efficiency.
The cache layer (ring buffer) 1131 adopts a lock-free ring buffer supporting multi-thread concurrent writing. It receives the latest LBA-PBA mappings or write requests and temporarily aggregates large numbers of small random writes, reducing the number of direct writes into the segmented memory table and relieving the impact of foreground random writes; when the cache layer is full, its contents are written into the segmented memory table.
The segmented memory table (skip list) 1132 is implemented with skip lists and divided into multiple segments; a hash function and a modulo operation on the LBA determine which segment a mapping is placed in. The segments do not interfere with one another, improving concurrency, while hash sharding balances the load and facilitates range queries.
The multi-layer ordered table (SSTable) 1133 is organized as a hierarchy that persists the mapping relationships flushed from the segmented memory table; the data in each level is sorted by LBA, and each level's ordered tables carry corresponding Bloom filters to locate data quickly.
The adaptive merge scheduler 1140 dynamically adjusts the merging strategy for high-concurrency small random writes and mixed-load scenarios to reduce write amplification, damp I/O oscillation and improve storage system stability.
The adaptive trigger policy 1141 periodically evaluates memory utilization and real-time system load by collecting the CPU load rate, I/O queue depth and memory usage, and dynamically modifies the trigger threshold according to real-time conditions.
The partial merge policy 1142 applies when the Ln level merges into the Ln+1 level within the multi-layer ordered table, mainly to reclaim space and remove invalid data. It is triggered when a level's ordered tables are full or the invalid-data proportion is too high: each ordered table in the multi-layer ordered table tracks the failure proportion of its stored key value pairs, and if that proportion reaches 50%, the ordered table is merged downward on its own.
The multi-level cache structure 1150 comprises the hierarchy of the ring buffer, the segmented memory table and the multi-layer ordered table, combining cache optimization, intelligent write strategies and crash consistency management to improve the response speed and stability of the storage system.
Under the write strategy 1151, a write only needs to enter the M-Tree's ring buffer; a background work-queue thread periodically checks the buffer state and, when the system is idle or the buffer nears saturation, batch-transfers the aggregated data into the segmented memory table, reducing the cost of frequent small writes. As a special case, high-priority or urgent requests and large sequential writes bypass the ring buffer and are written directly into their memory table segment.
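A rough userspace sketch of that background drain policy follows, reusing the ring_buffer type from the cache-layer sketch above. All helper names and the 90% saturation point are assumptions; in the kernel this loop would instead be a periodically re-queued work item.

    #include <stdbool.h>

    /* Hypothetical helpers standing in for the mechanisms described above. */
    void   sleep_one_interval(void);                  /* periodic wake-up */
    double ring_fill_ratio(const struct ring_buffer *rb);
    bool   system_is_idle(void);
    void   drain_to_segments(struct ring_buffer *rb); /* batch insert into segments */

    /* Background flush: drain the ring buffer into the segmented memory
     * table when the system is idle or the buffer nears saturation. */
    void flush_worker(struct ring_buffer *rb)
    {
        for (;;) {
            sleep_one_interval();
            if (system_is_idle() || ring_fill_ratio(rb) > 0.90)
                drain_to_segments(rb);
        }
    }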
Based on the system architecture shown in fig. 11, fig. 12 shows a flow diagram of a data processing method for block storage according to another embodiment of the present disclosure. As shown in fig. 12, the data processing method at least includes steps S1210 to S1290, which are described in detail below.
In step S1210, it is determined whether the data is of high priority or not according to the received data writing request.
In step S1220, if the data is of high priority, the data is hashed to determine the number of the segmented memory table to be written.
In step S1230, the data is directly inserted into the segment memory table corresponding to the determined segment memory table sequence number.
In step S1240, if it is not high priority, the data is written into the ring buffer.
In step S1250, it is determined whether the data in the buffer is full.
In step S1260, if the data in the buffer is full, the data in the buffer is batch transferred to the segment memory table.
In step S1270, it is determined whether the data amount in the segment memory table has reached a preset threshold.
In step S1280, if the amount of data in the segmented memory table has reached the preset threshold, a flush operation is triggered to persist the data to the disk or the next-level storage structure.
In step S1290, if the data amount in the segmented memory table does not reach the preset threshold, the relevant lock resource is released, and the current writing process is completed.
Therefore, based on the embodiment shown in fig. 12, the multi-level cache architecture and the adaptive scheduling mechanism provided by the embodiment of the disclosure are fully utilized, so that efficient data writing and storage management are realized.
The following describes apparatus embodiments of the present disclosure that may be used to perform the data processing method for block storage in the above-described embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the data processing method for block storage described in the present disclosure.
FIG. 13 illustrates a block diagram of a data processing apparatus for block storage according to one embodiment of the present disclosure. For ease of illustration, certain steps of the above method will be described with respect to modules. It should be understood that the respective module performing one or several steps of the above-described method may be one or more hardware modules specifically configured to perform the respective steps, or be implemented by a processor configured to perform the respective steps, or be stored in a computer-readable medium for implementation by a processor, or be implemented by some combination.
Referring to fig. 13, a data processing apparatus for block storage according to one embodiment of the present disclosure includes a construction module 1310, a determination module 1320, a writing module 1330, and a processing module 1340.
The building module 1310 is configured to build a mapping index tree, where the mapping index tree is used to uniformly manage the mapping relationship between logical block addresses and physical block addresses, and the mapping index tree is a multi-level cache architecture including a cache layer, a segmented memory table and a multi-layer ordered table that are sequentially arranged;
the determining module 1320 is configured to determine, according to the received data writing request, a storage priority of the data to be written, where the storage priority is used to characterize an importance level of the data to be written;
the writing module 1330 is configured to write the data to be written into the cache layer or the segmented memory table for storage according to the storage priority; and
the processing module 1340 is configured to, when the storage data in the segmented memory table or the multi-layer ordered table meets the merge migration rule, merge and migrate the storage data to the storage space of the next level.
In some embodiments of the present disclosure, writing the data to be written into the cache layer or the segmented memory table for storage according to the storage priority includes: writing the data to be written into the cache layer for storage when the storage priority is the first level; and writing the data to be written into the segmented memory table for storage when the storage priority is the second level, where the importance degree of the second level is higher than that of the first level.
In some embodiments of the present disclosure, the processing module 1340 is configured to determine a first proportion of the storage data in the segmented memory table to the total storable capacity of the segmented memory table, and merge and migrate the storage data in the segmented memory table with the first proportion being greater than or equal to a first capacity threshold into the multi-layer ordered table.
In some embodiments of the present disclosure, the processing module 1340 is further configured to obtain system load information and determine the first capacity threshold according to the system load information.
In some embodiments of the present disclosure, the processing module 1340 is further configured to: for the multi-layer ordered table, when the number of ordered tables in the Ln level reaches the number threshold and the second proportion of the stored data in the ordered tables of the Ln level relative to their total storable capacity is greater than or equal to the second capacity threshold, select at least some of the Ln-level ordered tables as ordered tables to be merged, n being an integer greater than or equal to 0; merge-sort the stored data of the Ln-level ordered tables to be merged with the stored data of the Ln+1-level ordered tables to be merged in logical block address order to obtain a temporary ordered table, where an Ln+1-level ordered table to be merged is an ordered table whose key value range intersects the key value range of the Ln-level ordered tables to be merged; and write the temporary ordered table into the Ln+1 level and discard the ordered tables to be merged in the Ln and Ln+1 levels.
In some embodiments of the present disclosure, the processing module 1340 is further configured to determine a data failure ratio for an ordered table according to the number of failed key-value pairs and the total number of key-value pairs in the ordered table and, when the data failure ratio reaches a failure ratio threshold, merge and migrate the stored data in the ordered table into an ordered table of the next level.
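The failure-ratio trigger might be sketched as below. How a key-value pair is marked as failed (overwritten, trimmed, and so on) is not specified here, so a set of stale LBAs stands in for that bookkeeping; dropping the failed pairs during migration is likewise an assumption, consistent with reclaiming stale mappings.

```python
def data_failure_ratio(table: SortedTable, stale_lbas: set) -> float:
    """Failed key-value pairs divided by total pairs for one ordered table."""
    if not table.entries:
        return 0.0
    failed = sum(1 for lba, _ in table.entries if lba in stale_lbas)
    return failed / len(table.entries)

def maybe_demote_stale_table(tree: MappingIndexTree, n: int, table: SortedTable,
                             stale_lbas: set, failure_threshold: float) -> None:
    """Merge-migrate a mostly stale Ln table to Ln+1, keeping only live pairs."""
    if data_failure_ratio(table, stale_lbas) < failure_threshold:
        return
    live = SortedTable([(l, p) for l, p in table.entries if l not in stale_lbas])
    tree.levels[n].remove(table)
    while len(tree.levels) <= n + 1:
        tree.levels.append([])
    tree.levels[n + 1].append(live)
```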
In some embodiments of the present disclosure, the mapping index tree is integrated into the kernel block layer to eliminate duplicated log records and redundant mapping management.
In some embodiments of the present disclosure, the processing module 1340 is further configured to determine a logical block address to be queried according to a received data query request, query each storage space for the logical block address in the order of the cache layer, the segmented memory table, and the multi-layer ordered table, and feed back the physical block address corresponding to the logical block address according to the query result.
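Finally, the query path of this embodiment in the same sketch: the tiers are probed in the stated order, and within the ordered levels newer data shadows older data. Scanning newest-first inside each level is an assumption.

```python
from bisect import bisect_left

def lookup(tree: MappingIndexTree, lba: int):
    """Probe the cache layer, then the segmented memory table, then L0..Ln."""
    if lba in tree.cache_layer:
        return tree.cache_layer[lba]
    segment = tree.memtable_segments.get(lba // tree.segment_size, {})
    if lba in segment:
        return segment[lba]
    for level in tree.levels:          # shallower levels hold newer data
        for table in reversed(level):  # assumed: newest table first within a level
            i = bisect_left(table.entries, (lba,))
            if i < len(table.entries) and table.entries[i][0] == lba:
                return table.entries[i][1]
    return None  # no mapping found for this logical block address
```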
The disclosure also provides an electronic device. FIG. 14 shows a schematic diagram of a hardware implementation employing a processing system.
As shown in FIG. 14, the hardware structure of the electronic device 1000 may be implemented using a bus architecture. The bus architecture may include any number of interconnecting buses and bridges depending on the specific application of the hardware and the overall design constraints. Bus 1100 connects together various circuits including one or more processors 1200, memory 1300, and/or hardware modules. Bus 1100 may also connect various other circuits 1400, such as peripherals, voltage regulators, power management circuits, external antennas, and the like. Bus 1100 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one connection line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The present disclosure also provides a readable storage medium having stored therein a computer program which, when executed by a processor, implements the above method. A "readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or device. More specific examples of a readable storage medium include an electrical connection having one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM).
The present disclosure also provides a computer program product. The methods of the present disclosure may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed, the processes or functions of the present disclosure are performed in whole or in part.
The computer program or instructions may be stored in a readable storage medium or transmitted from one readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (such as floppy disks, hard disks, and magnetic tape), optical media (such as digital video discs), or semiconductor media (such as solid-state drives). The computer-readable storage medium may be a volatile or nonvolatile storage medium, or may include both volatile and nonvolatile types of storage media.
It will be apparent to those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices, and computer program products according to the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description of the present specification, reference to the terms "one embodiment/manner," "some embodiments/manners," "example," "specific example," or "some examples," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment/manner or example is included in at least one embodiment/manner or example of the present disclosure. In this specification, the schematic representations of these terms do not necessarily refer to the same embodiment/manner or example. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments/manners or examples. In addition, the various embodiments/manners or examples described in this specification, and the features thereof, can be combined by persons skilled in the art without contradiction.
It will be appreciated by those skilled in the art that the above-described embodiments are merely for clarity of illustration of the disclosure, and are not intended to limit the scope of the disclosure. Other variations or modifications will be apparent to persons skilled in the art from the foregoing disclosure, and such variations or modifications are intended to be within the scope of the present disclosure.