[go: up one dir, main page]

CN121166563A - A method and apparatus for hard drive waste recycling - Google Patents

A method and apparatus for hard drive waste recycling

Info

Publication number
CN121166563A
CN121166563A CN202511706885.1A CN202511706885A CN121166563A CN 121166563 A CN121166563 A CN 121166563A CN 202511706885 A CN202511706885 A CN 202511706885A CN 121166563 A CN121166563 A CN 121166563A
Authority
CN
China
Prior art keywords
storage object
interval
data
hash
object identifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202511706885.1A
Other languages
Chinese (zh)
Other versions
CN121166563B (en
Inventor
张镇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongdian Cloud Computing Technology Co ltd
Original Assignee
Zhongdian Cloud Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongdian Cloud Computing Technology Co ltd filed Critical Zhongdian Cloud Computing Technology Co ltd
Priority to CN202511706885.1A priority Critical patent/CN121166563B/en
Publication of CN121166563A publication Critical patent/CN121166563A/en
Application granted granted Critical
Publication of CN121166563B publication Critical patent/CN121166563B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0253Garbage collection, i.e. reclamation of unreferenced memory
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647Migration mechanisms
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本申请涉及一种硬盘垃圾回收方法及装置。本方法包括:构建索引容器哈希树,以对象标识、数据键、属性键三元组为哈希键,存储记录区间信息及对应存储对象标识,并采用LRU策略缓存高频数据;根据待回收存储对象的元数据生成哈希键并查找哈希树;基于记录区间信息与当前存储对象标识的关系,执行三类判定规则:若记录区间一致且对应存储对象标识较小则直接判定为垃圾,若较大则重新反查并更新缓存;若区间交叉或覆盖,按存储对象标识大小判定或重新反查;若区间不存在则直接反查;最终根据反查结果迁移有效数据并释放垃圾空间。本方法通过哈希缓存减少反查RPC请求,突破并发限制,加速垃圾数据判定,显著提升GC回收效率并保障存储集群容量稳定。

This application relates to a method and apparatus for hard disk garbage collection. The method includes: constructing an index container hash tree, using the triplet of object identifier, data key, and attribute key as the hash key, storing record interval information and corresponding storage object identifiers, and caching high-frequency data using an LRU strategy; generating hash keys based on the metadata of the storage object to be reclaimed and searching the hash tree; based on the relationship between the record interval information and the current storage object identifier, executing three types of judgment rules: if the record intervals are consistent and the corresponding storage object identifier is small, it is directly judged as garbage; if it is large, a reverse lookup is performed and the cache is updated; if the intervals overlap or cover, judgment is made based on the size of the storage object identifier or a reverse lookup is performed; if the interval does not exist, a reverse lookup is performed directly; finally, valid data is migrated and garbage space is released based on the reverse lookup results. This method reduces reverse lookup RPC requests through hash caching, overcomes concurrency limitations, accelerates garbage data judgment, significantly improves GC collection efficiency, and ensures the stability of storage cluster capacity.

Description

Hard disk garbage recycling method and device
Technical Field
The present application relates to the field of hard disk garbage disposal technologies, and in particular, to a method and an apparatus for recycling hard disk garbage, a computer readable storage medium, and an electronic device.
Background
In a storage system, in order to fully exert the writing performance of a hard disk, a Redirect-on-Write (ROW) technology is commonly adopted in the industry, namely, when a writing request for writing a hard disk pool object is received, the system does not directly cover original data, but redirects and aggregates new data into large-granularity input-output (IO) operation, and finally, the new data is sequentially written into the storage object distributed in the hard disk pool. The technology is widely applied in the field of distributed storage, and the data writing efficiency is remarkably improved due to the sequential writing characteristic. However, the ROW technique is based on the characteristic of an additional write mechanism, which causes the history data of the same storage location to become invalid data (i.e. "garbage data"), and the invalid space needs to be reclaimed by a garbage reclamation (Garbage Collection, GC) mechanism to reuse the resource.
For large object design of ROW technology (typically, the single object capacity is 128 MB), the existing GC recycling process generally adopts a mode of scanning the large object, and when detecting that the garbage data amount in a certain object reaches a preset threshold value, the GC moving operation is triggered, namely, the effective data in the object is identified and moved to a newly allocated storage object, and the ineffective space of the original object is released after the migration is completed. In the process, metadata review is a core link of judging whether the data in the object is valid data or not (if valid, migration is needed, and if invalid, direct release is needed) by reversely checking metadata information. However, concurrency control of metadata review faces the key challenges that if the concurrency of review is too high, a large number rpc (remote procedure call) requests are executed concurrently, network or server resource competition is caused, rpc is blocked, and if the concurrency is too low, system resources cannot be fully utilized, and GC recovery efficiency is low. Therefore, GC recovery efficiency of existing ROW techniques is largely limited by the concurrency capability of metadata review. In summary, it has become an urgent need in the industry to develop a more efficient, reliable and flexible method for recycling hard disk waste.
Disclosure of Invention
Aiming at the technical bottleneck, the application provides a novel hard disk garbage recycling method. More specifically, the application provides a GC recycling method based on a hash caching mechanism, which aims to improve the overall performance of garbage recycling by optimizing the concurrent processing efficiency of metadata reverse check.
In order to achieve the above object, the present application provides the following technical solutions:
The first aspect of the application provides a method for recycling hard disk garbage, which comprises the following steps:
s1, constructing an index container hash tree, wherein the hash tree takes a triplet of an object identifier, a data key and an attribute key as a hash key, and takes recording interval information and a corresponding storage object identifier as hash values, and the recording interval information comprises a recording interval range and a recording type;
s2, generating a hash key according to metadata information of the storage object to be recovered, and searching record interval information corresponding to the metadata information through the hash tree;
s3, judging the validity of the data based on the record interval information obtained by searching in the previous step and the storage object identification of the current storage object to be recovered;
And S4, migrating the effective data to the new storage object according to the judgment result of the previous step, and releasing the garbage space in the original storage object.
Optionally, in the method of the present application, the hash value in step S1 includes a lower limit, an upper limit, a record type and a corresponding storage object identifier of a record interval, where if the data is of an array type, the lower limit and the upper limit of the record interval are real boundaries of the array interval, and if the data is of a single value type, the lower limit and the upper limit of the record interval are both single-value values.
Optionally, in the method of the present application, the following determination rule is executed in step S3 to determine the validity of the data:
S31, if the recording section information exists and matches the recording section of the metadata information:
When the storage object identifier corresponding to the recording interval information is larger than the storage object identifier of the current storage object to be recovered, judging that the data corresponding to the recording interval information is junk data;
When the storage object identifier corresponding to the record interval information is smaller than or equal to the storage object identifier of the current storage object to be recovered, re-initiating the reverse check and updating the storage object identifier in the hash tree;
S32, if the recording section information exists but intersects or overlaps with the recording section of the metadata information:
When the storage object identifier corresponding to the recording interval information is larger than the storage object identifier of the current storage object to be recovered, judging that the data of the crossed or covered part is junk data;
when the storage object identifier corresponding to the record interval information is smaller than or equal to the storage object identifier of the current storage object to be recovered, reinitiating the rechecking to determine the data validity;
and S33, if the recorded interval information does not exist, directly initiating reverse check to determine the validity of the data.
Optionally, in the method of the present application, the step S31 of reinitiating the inverse search and updating the storage object identifier in the hash tree includes obtaining the latest storage object identifier through the inverse search, and if the latest storage object identifier is greater than the storage object identifier of the current storage object to be recovered, updating the storage object identifier corresponding to the record in the hash tree to the latest storage object identifier.
Optionally, the method of the present application further comprises:
and S5, sequencing the recorded interval information in the reverse checking result from small to large according to the interval lower limit so as to optimize the processing efficiency of the subsequent interval.
Optionally, the method of the present application further comprises:
and S6, when the crossed or covered record sections exist in the reverse checking result, splitting the record sections according to the principle that the sections with large storage object identifiers cover the sections with small storage object identifiers.
Optionally, the method of the present application further comprises:
and S7, when the continuous record interval with consistent storage object identification exists in the check result, merging the record interval into a larger record interval.
Optionally, in the method of the present application, the hash tree in step S1 is a cache hash tree based on a Least Recently Used (LRU) policy, and is used for preferentially retaining the record interval information of the high-frequency access, so as to reduce the number of the anti-check requests.
A second aspect of the present application provides a hard disk waste recycling device, the device comprising:
the hash tree construction module is used for constructing an index container hash tree, wherein the hash tree takes a triplet of an object identifier, a data key and an attribute key as a hash key, and takes recorded interval information and a corresponding storage object identifier as hash values, and the recorded interval information comprises a recorded interval range and a recorded type;
The hash searching module is used for generating a hash key according to the metadata information of the storage object to be recovered, and searching record interval information corresponding to the metadata information through the hash tree;
The data validity judging module is used for searching the obtained record interval information and the storage object identification of the current storage object to be recovered based on the hash searching module and judging the data validity according to a preset judging rule;
And the data processing module is used for migrating the effective data to the new storage object according to the judging result of the data effectiveness judging module and releasing the garbage space in the original storage object.
The device realizes the steps of the hard disk garbage recycling method during operation.
Further, the device of the application further comprises:
The recording interval sequencing module is used for sequencing the recording interval information in the reverse checking result from small interval lower limit to large interval lower limit so as to optimize the subsequent interval processing efficiency;
The recording interval splitting module is used for splitting the recording interval according to the principle that the interval with large storage object identification covers the interval with small storage object identification when the crossed or covered recording interval exists in the reverse checking result;
And the recording interval merging module is used for merging the continuous recording intervals with consistent storage object identifiers into larger recording intervals when the back check results exist.
A third aspect of the application provides an electronic device comprising a memory and a processor;
A memory for storing a computer program;
and the processor is used for executing the computer program to realize the steps of the hard disk garbage collection method.
A fourth aspect of the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the aforementioned hard disk garbage collection method.
In summary, the hard disk garbage recycling method based on hash cache provided by the application realizes performance optimization and stability improvement in a plurality of technical layers by introducing the reverse-check hash cache mechanism, and has the following specific advantages:
(1) The reverse check rpc request overhead is remarkably reduced, namely, the reverse check hash cache is used for pre-storing or dynamically caching the metadata key information, so that a large number of rpc reverse check requests are prevented from being repeatedly initiated, the processing load of network communication and a server side is effectively reduced, and the communication link time delay in the reverse check process is reduced.
(2) The limitation of the anti-checking concurrency capability is broken through, namely the anti-checking concurrency amount in the traditional scheme is limited by the processing capability and resource competition of rpc requests, the scheme localizes the metadata information of high-frequency anti-checking through the hash cache, the validity judgment of more data objects can be processed in batches by single anti-checking operation, the large-scale data verification can be completed without depending on high-concurrency rpc requests, and the concurrency processing efficiency is remarkably improved.
(3) The garbage data judging efficiency is accelerated, the response time of the data validity judgment is greatly shortened due to the quick query characteristic of the hash cache, the quick screening of the data in the target object can be completed in a short time, the valid data and the garbage data are clearly distinguished, and an efficient data basis is provided for the subsequent GC moving operation.
(4) The method has the advantages of improving the overall efficiency of GC recovery and guaranteeing the stability of cluster capacity, obviously reducing the time occupation ratio of the reverse check stage in the GC recovery process through the cooperative optimization of the links, effectively shortening the single GC cycle, improving the release rate of garbage space, simultaneously, avoiding the problem of shortage of cluster capacity caused by space recovery lag due to the rapid and efficient garbage recovery capability, and ensuring the long-term stable availability of the storage cluster capacity.
In conclusion, the scheme optimizes the whole link of the reverse checking process through the hash caching mechanism, achieves multi-dimensional improvement from communication overhead and concurrency capability to judging efficiency, and provides a more efficient and stable garbage recycling solution for a storage system based on the ROW technology.
Additional features and advantages of the application will be set forth in the description which follows, or may be learned by practice of the application as set forth hereinafter. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings referred to in the description of the embodiments will be briefly described below. It is noted that the drawings illustrate only some embodiments of the application. Other relevant drawings may be derived from these drawings by those skilled in the art without the inventive effort.
Fig. 1 is a topology diagram of a cluster according to an embodiment of the present application.
Fig. 2 is a flowchart of the overall implementation of the hard disk garbage collection method of the present application.
Fig. 3 is a flowchart of storing an object GC (adding the LRU hash cache determination) in the present application.
Fig. 4 is a distribution structure diagram of index metadata in the present application.
Fig. 5 is a schematic diagram of an index back-check flow in an embodiment of the present application (it is necessary to send rpc to each slice of metadata for querying, and if the results of 3 slices are consistent, it can be confirmed that the metadata is garbage or data).
Fig. 6 is a schematic diagram of a resolution and merging rule of a section recx in a hash cache in an embodiment of the present application, wherein an abscissa in the diagram is recx, an ordinate in the diagram is a storage object ID, if recx to be rechecked is between cache 1 and cache 2, an effective recx is resolved into 3 parts of a red mark according to a principle that ID is large and covered with small, wherein recx to be rechecked only needs to recheck the red mark part, the rechecked is completed, the hash cache is updated, thus a larger recx cache (formed by splicing all red lines) is formed, and whether a rechecked request falls in the section is determined to be garbage or not later.
Fig. 7 is a construction diagram of the hard disk garbage collection device according to the present application.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It should be understood that the embodiments described are merely some, but not all, embodiments of the application. All other embodiments, which can be made by a person skilled in the art without creative efforts, are included in the protection scope of the present application based on the embodiments of the present application.
The term "comprising" and any variations thereof (e.g., "comprising," "including," etc.) are intended to be interpreted as open-ended terms, meaning "including, but not limited to," i.e., the recited list is not an exhaustive list, but may also include other non-explicitly recited list. The term "based on" should be understood as "based at least in part on," i.e., the basis or condition referred to may not be the only factor, but may also relate to other relevant factors. The term "one embodiment" should be understood as "at least one embodiment," i.e., the described embodiment is not the only possible implementation, and other similar embodiments are possible.
In the present disclosure, the terms "a" and "an" and "the" are used to modify a related element or feature, which is presented by way of illustration and not limitation. Unless the context clearly indicates otherwise, "a" shall be understood as "at least one" and "a plurality" as "at least two". Those skilled in the art should reasonably interpret these terms in terms of the semantic and logical relationships of the context to ensure that they encompass the possibilities of "one or more".
Fig. 2 shows an overall implementation flow of the hard disk garbage collection method provided by the application, which comprises the following steps:
s1, constructing an index container hash tree, wherein the hash tree takes a triplet of an object identifier, a data key and an attribute key as a hash key, and takes recording interval information and a corresponding storage object identifier as hash values, and the recording interval information comprises a recording interval range and a recording type;
s2, generating a hash key according to metadata information of the storage object to be recovered, and searching record interval information corresponding to the metadata information through the hash tree;
s3, judging the validity of the data based on the record interval information obtained by searching in the previous step and the storage object identification of the current storage object to be recovered;
And S4, migrating the effective data to the new storage object according to the judgment result of the previous step, and releasing the garbage space in the original storage object.
In order to more clearly illustrate the technical solution of the present application, the following description will be further given by way of examples of specific scenarios.
Assuming that 4 disks are arranged under each node in A3-node cluster, the topological structure of the whole cluster is shown in fig. 1, an index pool shown in the figure is mainly used for storing metadata information corresponding to user data, and a data pool is used for storing actual data of a user; user data (including single value, sv, and array value arrayvalue types, wherein array values are generally described in units of fixed size, e.g., 512 bytes, e.g., [0,3] total of 4 groups of 512=2kbytes) is first mapped to an index object when written, mapped by the index object to a corresponding master node (one copy is selected as a master node from 3 copies for redundancy, and the other two copies are slave nodes), e.g., index_ objA1/B1/C1 master node is node0, index_ objA2/B2/C2 master node is node1, index_ objA3/B3/C13 master node is node2, then aggregated writing is applied to the master node, the respective storage objects are assumed to be 3 copies redundancy, the storage objects of each application are sequentially obj0 (tgt 0, tgt4, tgt 8), obj1 (tgt 1, tgt5, tgt 9), obj 2/C1/C2 are aggregated, and the requests are written together by 3/C1/C3/C2/C1 are aggregated together, i.e., the requests are written together by 3/C1/C3/C2. After the aggregation is completed, metadata information (such as index container cont, index obj object/dkey/akey/recx, back end storage address) of the index object is recorded and stored in metadata space of the index pool, meanwhile, the metadata information is recorded in obj0 to be used for locating a storage position of the index object in obj0, after the object repeatedly overwrites a new obj storage object, data in an old obj storage object becomes garbage data, when the garbage amount of the storage object reaches a GC recycling threshold (such as static threshold 80%), recycling of the garbage space can be completed through a GC garbage recycling algorithm, specifically, index metadata information recorded in the obj storage object is read and used for inverse searching (under a distributed scene, the index object is assumed to be a 3-copy object, namely, to be searched on 3 copy fragments of the object, 3 times of rpc read requests are sent, as shown in fig. 5), then, an inverse searching result (such as index container index/dkey/494) is the metadata information of the index object is changed into garbage data, and if the garbage recycling result is more effective than the data of the index object is more than the index container/dkey, and if the index object is more effective, and the index object is more effective than the storage data is required to be stored, and the index is more effective than the index data is effectively stored.
In summary, because the communication time and concurrency of the anti-checking request rpc are limited, the bottleneck of GC recovery is limited by the anti-checking step, and the method for improving GC recovery efficiency based on the anti-checking cache is provided by combining the write-in feature of ROW (the ID of the storage object of the later application), and fig. 3 shows the flow of storing the GC of the object (the judgment of LRU hash cache is added), the specific implementation steps of the method are as follows:
1. Constructing an index container hash tree, namely creating a corresponding LRU hash table according to the index container, taking obj, dkey, akey triples as keys of the hash, wherein the value comprises recx [ low, high ] and a storage object ID, if the data represented by the metadata are of an array type, the low and the high are real interval boundaries of recx, and if the data are sv data, the low and the high are values of sv, and the constructed hash tree is shown in figure 4.
2. Hash tree searching, namely generating a Hash key (obj, dkey, akey) according to index metadata information (index container, obj, dkey, akey, recx), and searching recx intervals on the index container Hash tree.
3. Section consistency judging rules, namely if recx of the reverse check of the storage object exists in the hash table and the sections are consistent, judging rules are as follows:
3.1 If the object ID of the check recx is smaller than the object ID of the hash cache, it can be directly determined as garbage, if the obj with the object ID of 100 is GC, and if one piece recx of metadata is recorded on the hash tree after the check, and the object ID of the metadata is 150, the recx is necessarily garbage data on the object 100;
3.2 If the storage object ID is greater than or equal to the ID of the storage object in the cache, the inverse checking is restarted, and the hash cache is updated, if the obj with the storage object ID of 100 is GC, and if the storage object ID of one segment recx in metadata is 90 after the inverse checking record on the hash tree, whether recx is garbage or data on the storage object 100 cannot be judged, and at the moment, the inverse checking needs to be initiated to confirm, if the storage object with the inverse checking result is 120, the storage object is updated to the hash cache by 120.
4. Section cross coverage judgment rules are that if recx of storage object check and recx in cache are inconsistent or cross exists, judgment is carried out according to the following rules:
4.1 The intersecting or covering part judges that garbage is generated when the storage object ID is smaller than the cache storage object ID, if the obj with the storage object ID of 100 is doing GC, one section recx [2,5] in metadata searches the corresponding cache { recx [1,8], objId =101 } through a hash table, and the storage object ObjId in the cache is larger, so that recx [2,5] on the 100 objects can be judged to be garbage;
4.2 When the storage object ID is larger than the cache storage object ID, the cross or overlay part needs to check whether the storage object ID is garbage or data, for example, if obj with the storage object ID of 100 is doing GC, a segment recx [2,5] in metadata searches the corresponding cache { recx [1,8], objId =90 ] through the hash table, and because the storage object ObjId =100 doing GC is larger, the check is needed to check whether the storage object ID is garbage or data.
5. If recx of the storage object check is not in the cache, directly initiating check confirmation.
6. Recx section ordering rules, namely the inverse checking result is ordered from small to large according to recx, the section coverage part is split, and if the storage values of two recx sections under one akey on the hash tree are { recx [2,5], objId =100 }, { recx [6,9], objId =90 }, the recx on the hash tree is { recx [2,5], objId =100 }, { recx [6,9], objId =90 }, respectively.
7. And recx, according to the rule that the storage object ID is large and small in coverage, as shown in figure 6, if the storage values of two recx intervals under one akey on the hash tree are { recx [2,5], objId =100 }, { recx [3,9], objId =90 }, respectively, recx on the hash tree is split into { recx [2,5], objId =100 }, { recx [6,9], objId =90 }, according to the rule that the storage object ID is large and small in coverage.
8. If the storage object IDs are identical when the recx section merging rules recx are continuous, the storage object IDs can be merged into one larger recx section, and if the storage values of the two recx sections on the hash tree under one akey are { recx [2,5], objId =100 }, { recx [6,9], objId =100 }, the storage object IDs can be merged into one recx, namely { recx [2,9], objId =100 }.
Fig. 7 shows a hard disk garbage recycling device according to the present application, where the device includes:
the hash tree construction module is used for constructing an index container hash tree, wherein the hash tree takes a triplet of an object identifier, a data key and an attribute key as a hash key, and takes recorded interval information and a corresponding storage object identifier as hash values, and the recorded interval information comprises a recorded interval range and a recorded type;
The hash searching module is used for generating a hash key according to the metadata information of the storage object to be recovered, and searching record interval information corresponding to the metadata information through the hash tree;
The data validity judging module is used for searching the obtained record interval information and the storage object identification of the current storage object to be recovered based on the hash searching module and judging the data validity according to a preset judging rule;
And the data processing module is used for migrating the effective data to the new storage object according to the judging result of the data effectiveness judging module and releasing the garbage space in the original storage object.
The method for recycling the hard disk garbage comprises the steps of realizing the method for recycling the hard disk garbage disclosed by the application when the device is operated.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In the figures, each block may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It will be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As shown in FIG. 8, an embodiment of the application also discloses an electronic device comprising a processor 310, a communication interface 320, a memory 330 for storing a processor executable computer program and a communication bus 340. Wherein the processor 310, the communication interface 320 and the memory 330 perform communication with each other through the communication bus 340. The processor 310 implements the steps of the hard disk garbage collection method described above by running the executable computer program.
It will be appreciated that the electronic device may contain, in addition to a memory and a processor, input devices (e.g., a keyboard), output devices (e.g., a display), and other communication modules. These input devices, output devices, and other communication modules communicate with the processor through an I/O interface (i.e., an input/output interface).
The operations of the present application may be implemented by writing computer program code using one or more programming languages, or a combination thereof. The programming languages include, but are not limited to, the following types:
object-oriented programming languages, such as Java, smalltalk, C ++, etc.;
Conventional procedural programming languages, such as the "C" language or similar programming languages.
The execution of the program code includes, but is not limited to:
executing entirely on the user's computer;
Partially executing on the user computer and partially executing on the remote computer;
executing as a stand-alone software package;
Executing entirely on a remote computer or server.
In a scenario involving a remote computer, the remote computer may be connected to the user computer through any type of network, including, but not limited to, a Local Area Network (LAN) or a Wide Area Network (WAN). In addition, the remote computer may also be connected to an external computer through an Internet service provider, for example, using the Internet.
Further, the application also discloses a computer readable storage medium, which when the instructions in the computer readable storage medium are executed by a processor of the electronic equipment, the electronic equipment can execute the steps of the hard disk garbage collection method disclosed by the application.
In the context of the present application, a computer-readable storage medium refers to a tangible medium capable of storing computer program code and related data. Specific examples include, but are not limited to, the following:
(1) Portable computer magnetic disks-removable magnetic storage media such as floppy disks.
(2) The hard disk comprises fixed storage devices such as a mechanical hard disk, a solid state hard disk and the like.
(3) Random Access Memory (RAM) is a volatile storage medium used for temporarily storing data and program code.
(4) Read Only Memory (ROM), a nonvolatile storage medium for storing fixed programs and data.
(5) Erasable programmable read-only Memory (EPROM) or Flash Memory (Flash Memory), a non-volatile storage medium that supports multiple erases and programming.
(6) Optical fiber storage devices-storage media based on optical fiber technology.
(7) Portable compact disc read-only memory (CD-ROM), a read-only medium that stores data in the form of an optical disc.
(8) Optical storage devices such as DVD, blu-ray disc, etc. based on optical principles.
(9) Magnetic storage devices such as magnetic tape, magnetic disk, and the like based on magnetic principles.
(10) Any suitable combination of the above, for example, combining multiple storage media to meet different storage requirements.
These computer readable storage media may be used to store program code and related data described in the present application to support program execution and persistent storage of data.
In particular, the processes described in the flowcharts may be implemented as computer software programs according to embodiments of the present application. For example, embodiments of the present application relate to a computer program product comprising a computer program loaded on a non-transitory computer readable medium. The computer program contains program code for performing the hard disk garbage collection method disclosed herein. The above-described functions defined in the embodiments of the present application can be achieved when the computer program is executed by a processing apparatus.
While the above discussion contains several specific implementation details, these should not be construed as limiting the scope of the present application. The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure to which the present application relates is not limited to the solutions formed by the specific combinations of features described above. Meanwhile, the application also covers other technical schemes formed by any combination of the technical features or the equivalent features without departing from the conception disclosed above.
It will be further understood by those skilled in the art that various changes may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for elements thereof, without departing from the spirit and scope of the technical solutions of the embodiments of the present application. Such modifications and substitutions do not depart from the essence of the corresponding technical solutions from the core spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method for recycling hard disk garbage, the method comprising the steps of:
s1, constructing an index container hash tree, wherein the hash tree takes a triplet of an object identifier, a data key and an attribute key as a hash key, and takes recording interval information and a corresponding storage object identifier as hash values, and the recording interval information comprises a recording interval range and a recording type;
s2, generating a hash key according to metadata information of the storage object to be recovered, and searching record interval information corresponding to the metadata information through the hash tree;
s3, judging the validity of the data based on the record interval information obtained by searching in the previous step and the storage object identification of the current storage object to be recovered;
And S4, migrating the effective data to the new storage object according to the judgment result of the previous step, and releasing the garbage space in the original storage object.
2. The method of claim 1, wherein the hash value in step S1 includes a lower limit, an upper limit, a record type, and a corresponding storage object identifier of a record interval, wherein the lower limit and the upper limit of the record interval are real boundaries of a plurality of sets of intervals if the data is of the plurality of sets of types, and the lower limit and the upper limit of the record interval are both single-value values if the data is of the single-value type.
3. The method according to claim 1, wherein the following decision rule is executed in step S3 to decide the validity of the data:
S31, if the recording section information exists and matches the recording section of the metadata information:
When the storage object identifier corresponding to the recording interval information is larger than the storage object identifier of the current storage object to be recovered, judging that the data corresponding to the recording interval information is junk data;
When the storage object identifier corresponding to the record interval information is smaller than or equal to the storage object identifier of the current storage object to be recovered, re-initiating the reverse check and updating the storage object identifier in the hash tree;
S32, if the recording section information exists but intersects or overlaps with the recording section of the metadata information:
When the storage object identifier corresponding to the recording interval information is larger than the storage object identifier of the current storage object to be recovered, judging that the data of the crossed or covered part is junk data;
when the storage object identifier corresponding to the record interval information is smaller than or equal to the storage object identifier of the current storage object to be recovered, reinitiating the rechecking to determine the data validity;
and S33, if the recorded interval information does not exist, directly initiating reverse check to determine the validity of the data.
4. The method of claim 3, wherein the re-initiating the inverse search and updating the storage object identifier in the hash tree in step S31 includes obtaining a latest storage object identifier through the inverse search, and updating the storage object identifier corresponding to the record in the hash tree to the latest storage object identifier if the latest storage object identifier is greater than the storage object identifier of the current storage object to be reclaimed.
5. A method according to claim 3, characterized in that the method further comprises:
and S5, sequencing the recorded interval information in the reverse checking result from small to large according to the interval lower limit so as to optimize the processing efficiency of the subsequent interval.
6. A method according to claim 3, characterized in that the method further comprises:
and S6, when the crossed or covered record sections exist in the reverse checking result, splitting the record sections according to the principle that the sections with large storage object identifiers cover the sections with small storage object identifiers.
7. A method according to claim 3, characterized in that the method further comprises:
and S7, when the continuous record interval with consistent storage object identification exists in the check result, merging the record interval into a larger record interval.
8. The method according to claim 1, wherein the hash tree in step S1 is a least recently used policy-based cache hash tree for preferentially retaining record interval information of high-frequency access, and reducing the number of back-check requests.
9. A hard disk waste recycling device, the device comprising:
the hash tree construction module is used for constructing an index container hash tree, wherein the hash tree takes a triplet of an object identifier, a data key and an attribute key as a hash key, and takes recorded interval information and a corresponding storage object identifier as hash values, and the recorded interval information comprises a recorded interval range and a recorded type;
The hash searching module is used for generating a hash key according to the metadata information of the storage object to be recovered, and searching record interval information corresponding to the metadata information through the hash tree;
The data validity judging module is used for searching the obtained record interval information and the storage object identification of the current storage object to be recovered based on the hash searching module and judging the data validity according to a preset judging rule;
And the data processing module is used for migrating the effective data to the new storage object according to the judging result of the data effectiveness judging module and releasing the garbage space in the original storage object.
10. The apparatus of claim 9, wherein the apparatus further comprises:
The recording interval sequencing module is used for sequencing the recording interval information in the reverse checking result from small interval lower limit to large interval lower limit so as to optimize the subsequent interval processing efficiency;
The recording interval splitting module is used for splitting the recording interval according to the principle that the interval with large storage object identification covers the interval with small storage object identification when the crossed or covered recording interval exists in the reverse checking result;
And the recording interval merging module is used for merging the continuous recording intervals with consistent storage object identifiers into larger recording intervals when the back check results exist.
CN202511706885.1A 2025-11-20 2025-11-20 Hard disk garbage recycling method and device Active CN121166563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511706885.1A CN121166563B (en) 2025-11-20 2025-11-20 Hard disk garbage recycling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202511706885.1A CN121166563B (en) 2025-11-20 2025-11-20 Hard disk garbage recycling method and device

Publications (2)

Publication Number Publication Date
CN121166563A true CN121166563A (en) 2025-12-19
CN121166563B CN121166563B (en) 2026-01-30

Family

ID=98040155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511706885.1A Active CN121166563B (en) 2025-11-20 2025-11-20 Hard disk garbage recycling method and device

Country Status (1)

Country Link
CN (1) CN121166563B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100106734A1 (en) * 2008-10-24 2010-04-29 Microsoft Corporation Blob manipulation in an integrated structured storage system
CN109033278A (en) * 2018-07-11 2018-12-18 江苏通付盾科技有限公司 Data processing method, device, electronic equipment and computer storage medium
CN111240588A (en) * 2019-12-31 2020-06-05 清华大学 Persistent memory object storage system
US20220121532A1 (en) * 2020-10-16 2022-04-21 Vmware, Inc. Fault-tolerant uploading of data to a distributed storage system
CN118170315A (en) * 2024-03-13 2024-06-11 武汉汇迪森信息技术有限公司 A method for accelerating read operations of Key-Value storage systems using learning indexes
CN119493519A (en) * 2023-08-17 2025-02-21 华为技术有限公司 Data management method and device
CN120653205A (en) * 2025-06-16 2025-09-16 北京金山云网络技术有限公司 Garbage collection testing method and device, electronic equipment and storage medium
CN120687043A (en) * 2025-08-22 2025-09-23 中电云计算技术有限公司 A method and device for improving IO read performance based on object affinity

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100106734A1 (en) * 2008-10-24 2010-04-29 Microsoft Corporation Blob manipulation in an integrated structured storage system
CN109033278A (en) * 2018-07-11 2018-12-18 江苏通付盾科技有限公司 Data processing method, device, electronic equipment and computer storage medium
CN111240588A (en) * 2019-12-31 2020-06-05 清华大学 Persistent memory object storage system
US20220121532A1 (en) * 2020-10-16 2022-04-21 Vmware, Inc. Fault-tolerant uploading of data to a distributed storage system
CN119493519A (en) * 2023-08-17 2025-02-21 华为技术有限公司 Data management method and device
CN118170315A (en) * 2024-03-13 2024-06-11 武汉汇迪森信息技术有限公司 A method for accelerating read operations of Key-Value storage systems using learning indexes
CN120653205A (en) * 2025-06-16 2025-09-16 北京金山云网络技术有限公司 Garbage collection testing method and device, electronic equipment and storage medium
CN120687043A (en) * 2025-08-22 2025-09-23 中电云计算技术有限公司 A method and device for improving IO read performance based on object affinity

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JAVAJAYV: "Redis缓存图解记忆", 《HTTPS://WWW.CNBLOGS.COM/JAYV/P/14553794.HTML》, 18 March 2021 (2021-03-18), pages 1 - 29 *

Also Published As

Publication number Publication date
CN121166563B (en) 2026-01-30

Similar Documents

Publication Publication Date Title
CN114253908A (en) Data management method and device of key value storage system
CN108810041A (en) A kind of data write-in of distributed cache system and expansion method, device
CN103581331B (en) The online moving method of virtual machine and system
CN110888837B (en) Object storage small file merging method and device
US11500856B2 (en) RDMA-enabled key-value store
CN110321301A (en) A kind of method and device of data processing
CN114020707B (en) Storage space recovery method, storage medium, and program product
CN113867627A (en) A method and system for optimizing the performance of a storage system
US20240264773A1 (en) Data Prefetching Method, Computing Node, and Storage System
US11397706B2 (en) System and method for reducing read amplification of archival storage using proactive consolidation
CN114968845A (en) A method, system, device and storage medium for cache processing
CN112835511A (en) Data writing method, apparatus, device and medium of distributed storage cluster
CN106202485A (en) Data manipulation method and system
CN110162395B (en) Memory allocation method and device
WO2020215580A1 (en) Distributed global data deduplication method and device
CN111459913A (en) Capacity expansion method and device of distributed database and electronic equipment
CN116107514B (en) Data processing methods and apparatus for object storage
CN120687043B (en) Method and device for improving IO read performance based on object affinity
CN120632897B (en) Simplified duplicate removal and security enhancement system and method for AI mirror image
CN121166563B (en) Hard disk garbage recycling method and device
CN111581157B (en) Object storage platform, object operation method, device and server
US20240126750A1 (en) Accelerating query execution by optimizing data transfer between storage nodes and database nodes
CN110134662A (en) SDN distributed storage system, data processing method and storage medium
CN113986471A (en) Method, device, equipment and storage medium for safely deleting mirror image file of virtual machine
CN109814891B (en) A data update method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant