[go: up one dir, main page]

CN120188138A - Recovering data from sequential storage media - Google Patents

Recovering data from sequential storage media Download PDF

Info

Publication number
CN120188138A
CN120188138A CN202280101808.1A CN202280101808A CN120188138A CN 120188138 A CN120188138 A CN 120188138A CN 202280101808 A CN202280101808 A CN 202280101808A CN 120188138 A CN120188138 A CN 120188138A
Authority
CN
China
Prior art keywords
segments
segment
medium
data
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280101808.1A
Other languages
Chinese (zh)
Inventor
亚伊尔·托弗
阿萨夫·纳塔逊
阿维夫·库温特
伊丹·扎克
迈克尔·斯特恩伯格
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN120188138A publication Critical patent/CN120188138A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0611Improving I/O performance in relation to response time
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • G06F3/0641De-duplication techniques
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0682Tape device
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0686Libraries, e.g. tape libraries, jukebox

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

提供了一种用于管理多个顺序访问存储设备的顺序存储介质以用于在所述顺序存储介质上对目标数据进行重复数据删除存储的系统。处理器标识介质上的目标数据的片段的重复片段。处理器确定预测总恢复时间小于阈值的组合。该组合包括标识的重复片段、重写片段和新的数据片段中的一个或多个。重写片段是被指定用于作为新数据重写到介质上的重复片段。新的数据片段包括当前未存储在介质上的重复片段中的数据,即,非重复片段。新的数据片段被指定用于作为新数据首次被写入到介质上。处理器根据组合指示写入,即,将重写片段和/或新的数据片段写入标识的介质上。

A system for managing sequential storage media of multiple sequential access storage devices for deduplicating target data on the sequential storage media is provided. A processor identifies duplicate segments of segments of target data on the medium. The processor determines a combination for which the predicted total recovery time is less than a threshold. The combination includes one or more of the identified duplicate segments, overwritten segments, and new data segments. The overwritten segments are duplicate segments designated to be overwritten on the medium as new data. The new data segments include data that is not currently stored in the duplicate segments on the medium, i.e., non-duplicate segments. The new data segments are designated to be written to the medium for the first time as new data. The processor writes according to the combination instruction, i.e., writes the overwritten segments and/or the new data segments to the identified medium.

Description

Recovering data from sequential storage media
Background
In some embodiments of the invention, the invention relates to storage systems, and more particularly, but not exclusively, to management of deduplication-based storage systems that store data on sequential storage media.
The deduplication (deduplication/dedup) based storage system eliminates redundant storage of the same data block by storing references to previously stored data blocks, rather than storing another duplicate data block of the same data block. But even with deduplication, storage requirements are such that the disk is not large enough to meet the requirements. The emphasis on auxiliary storage and archiving is moving from disks with deduplication capabilities to sequential storage media (e.g., tape technology) due to lower tape prices and/or longer data retention times. Seek times in magnetic tape are relatively long, on the order of tens of seconds, which is the time required for a magnetic tape drive to roll the magnetic tape to a position where data is read. According to the linear tape Open (LINEAR TAPE Open, LTO) 9 standard, the length of the tape (i.e., the maximum seek time) exceeds 1 km. Most applications use tapes without any optimization other than built-in compression, since seek times in the tape are relatively long.
Disclosure of Invention
It is an object of the present invention to provide a computing device, system, computer program product, and method for managing sequential storage media accessed by multiple sequential access storage devices for deduplication storage of target data.
The above and other objects are achieved by the features of the independent claims. Other implementations are apparent from the dependent claims, the description and the drawings.
According to a first aspect, an apparatus for deduplication storage of data includes a processor for identifying at least one medium by accessing a global index mapping between each of a plurality of media of a plurality of sequential access storage devices and a data segment stored on the medium, the at least one medium storing a repeating segment for at least one segment of target data stored on the plurality of media, identifying, for each respective identified medium, a repeating segment of the segment of target data stored on the medium by accessing a medium-specific deduplication mechanism, determining a combination of a predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one repeating segment, (ii) at least one rewriting segment, and (iii) at least one new data segment, wherein the at least one rewriting segment comprises at least one repeating segment for being rewritten, the at least one first data segment comprises a non-repeating segment for being written for a time, and (iii) at least one of the at least one new data segment identified on the medium.
The predicted total recovery time for reconstructing the target data is calculated based on segment locations, optionally defined by the global index map and/or a medium-specific deduplication mechanism, to select which segments are to be written as new data, and which existing duplicate segments can be used. A selection is made to ensure that the data is written in such a way that the predicted total recovery time is less than the threshold while maximizing the utilization of the repeated segments.
Since repeated segments may be scattered over different locations of the medium of the sequential access storage device, the greater the number of non-sequential areas in which the repeated segments used to reconstruct the target data are located, the longer the total recovery time to read the repeated segments. Reusing the repeated segments to reconstruct the target data improves utilization of existing media resources, but at the cost of longer overall recovery time. In the other extreme, the total recovery time for reading the segments is the shortest in the case where no duplicate segments are used and all segments of the target data are newly written in sequence. The cost of a shorter total recovery time is to use a larger amount of media resources to store the target data. Embodiments described herein balance these two conflicting requirements by selecting a combination of segments of the target data that are predicted to have the total recovery time less than a selected threshold while maximizing utilization of existing duplicate segments such that the predicted total recovery time remains below the threshold.
Rewriting some of the identified repeated segments as new data (i.e., rewritten segments) along with the non-repeated segments (e.g., buffered in other storage media such as an HDD or SSD prior to dumping to the tape) may further improve the performance of the tape drive, e.g., prevent "shoe-cleaning" (back and forth tape movement), and reduce repeated stopping and starting of tape movement, which increases the life of the tape and/or increases the durability of the tape.
According to a second aspect, an apparatus for recovering data includes a processor for receiving a request to read target data stored on at least one of a plurality of media of a sequential access storage device storing the target data as a plurality of segments, reading metadata to obtain a location of the plurality of segments of the target data on the at least one media, wherein the segments of the target data are stored as a combination selected during a write phase to obtain a predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one repeating segment, (ii) at least one rewriting segment, and (iii) at least one new data segment, wherein the at least one rewriting segment comprises at least one repeating segment that is rewritten, the at least one new data segment comprises a non-repeating segment that is first written, reading the segment from the location obtained from the metadata, and recovering the target data from the read segment.
When the writing is completed according to the selected combination, it is expected that data recovery is performed below the threshold.
According to a third aspect, a method for deduplication storage of data includes identifying at least one medium storing a repeating segment for at least one segment of target data stored on a plurality of media by accessing a global index mapping between each of the plurality of media and the data segment stored on the media of a plurality of sequential access storage devices, identifying, for each respective identified medium, a repeating segment of the target data stored on the medium by accessing a medium specific deduplication mechanism, determining a combination of a predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one repeating segment, (ii) at least one overwriting segment, and (iii) at least one new data segment, wherein the at least one overwriting segment comprises at least one repeating segment for being overwritten, the at least one new data segment comprising a non-first repeating segment for being written, and (ii) at least one of the at least one overwriting segment and the at least one new data segment being written on the identified medium.
According to a fourth aspect, a non-transitory medium storing program instructions for deduplication storage of data, the program instructions, when executed by at least one processor, cause the at least one processor to identify at least one medium by accessing a global index mapping between each of a plurality of media of a plurality of sequential access storage devices and a data segment stored on the medium, the at least one medium storing a duplicate of at least one segment of target data for storage on the plurality of media, identify, for each respective identified medium, a duplicate of a segment of the target data stored on the medium by accessing a medium-specific deduplication mechanism, determine a combination of predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one duplicate segment, (ii) at least one overwrite segment, and (iii) at least one new data segment, wherein the at least one overwrite segment comprises at least one duplicate segment for being overwritten, the at least one first-time segment comprising at least one new data segment for being overwritten, and (iii) at least one new data segment for being overwritten on the at least one media.
According to a fifth aspect, a method for recovering data includes receiving a request to read target data stored on at least one of a plurality of media of a sequential access storage device storing the target data as a plurality of segments, reading metadata to obtain locations of the plurality of segments of the target data on the at least one media, wherein the segments of the target data are stored as a combination selected during a write phase to obtain a predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one repeating segment, (ii) at least one overwriting segment, and (iii) at least one new data segment, wherein the at least one overwriting segment comprises at least one repeating segment that is overwritten, the at least one new data segment comprises a non-repeating segment that is first written, reading the segment according to the locations obtained from the metadata, and recovering the target data from the read segment.
According to a sixth aspect, a non-transitory medium storing program instructions for retrieving data, which when executed by at least one processor, cause the at least one processor to receive a request to read target data stored on at least one of a plurality of media of a sequential access storage device storing the target data as a plurality of segments, read metadata to obtain locations of the plurality of segments of the target data on the at least one media, wherein the segments of the target data are stored as a combination selected during a write phase to obtain a predicted total retrieval time less than a threshold, the combination comprising at least one of (i) at least one duplicate segment, (ii) at least one overwrite segment, wherein the at least one overwrite segment comprises at least one duplicate segment that is overwritten, the at least one new data segment comprising a first-time written non-duplicate segment, and (iii) retrieve the target data from the read segment from the locations according to the metadata.
In a further implementation form of the first aspect, the second aspect and the third aspect, the processor is further configured to write metadata indicating a location of the combination on the medium for reading and obtaining the predicted total recovery time less than the threshold.
The metadata provides the positions of the segments of the combination so as to obtain the predicted total recovery time less than the threshold during a read phase.
In a further implementation form of the first aspect, the second aspect and the third aspect, the processor is further configured to read the metadata to determine a location of the combined segments to read to obtain the predicted total recovery time that is less than the threshold.
In a further implementation form of the first aspect, the second aspect and the third aspect, the at least one re-written segment and the at least one new data segment are written sequentially onto the available space of the identified medium.
Sequentially writing the repeated segments and/or the new data provides a faster sequential data read time than reading the repeated segments, which results in a longer read time due to the medium and/or read head moving to different positions of the repeated segments.
In a further implementation form of the first aspect, the second aspect and the third aspect, the processor is further configured to calculate a plurality of combinations, each combination comprising a different combination of a repeat section and a overwrite section, calculate the predicted total recovery time for each combination, and select a combination having the predicted total recovery time less than the threshold and having a maximum repetition rate.
Calculating the different combinations enables the selection of the best combination, providing the highest repetition rate, while keeping the predicted total recovery time below the threshold.
In a further implementation form of the first aspect, the second aspect and the third aspect, each combination comprises a different subset of the repeated segments, wherein identified repeated segments excluded from a subset are classified as re-written segments.
Excluding some identified repeating segments, but writing them sequentially with non-repeating segments, reduces the total recovery time, keeping the predicted total recovery time below the threshold.
In a further implementation form of the first, second and third aspects, the processor is further configured to calculate a baseline restoration time for reading the plurality of segments of the target data using the first set of identified repeated segments, calculate a difference between the threshold and the baseline restoration time, iteratively delete at least one of the identified repeated segments from the first set until the predicted total restoration time is less than the threshold, wherein identified repeated segments deleted from the first set are added as overwrite segments to a second set.
A set of repeated segments may be added/deleted based on the prediction result to obtain the optimal segment combination. Different combinations may include different numbers of repeating segments. The remaining fragments included in the combination are re-written repeated fragments and sequentially written non-repeated fragments.
In a further implementation form of the first aspect, the second aspect and the third aspect, the threshold defines a maximum time from receiving a resume request to reconstructing the target data from the read fragment and ready.
The threshold is defined in terms of time associated from a user perspective.
In a further implementation form of the first aspect, the second aspect and the third aspect, the plurality of sequential access storage devices comprises a tape drive, the plurality of media comprises a tape cartridge, and the predicted total recovery time is calculated by determining an order of the segments for reading the target data and based on a maximum seek time for a first segment, based on seek times between segments for the order of reading, read throughput, read head wrap (wrap) variation and tape library loading time.
These parameters depend on the connection type (e.g., cFC, SAS), tape drive/tape library type, inserted tape cartridge type (LTO 9, LTO 8.) and drive compression on/off, providing a custom selection of which duplicate segments to use and which duplicate segments to overwrite as new data.
In a further implementation form of the first, second and third aspects, the order for reading is calculated based on a nearest neighbor method that takes into account the positions of the segments on different reels that are physically close to each other and minimizes the total distance traversed between the segments by the read head.
Moving the head of the tape drive across multiple wraps may be faster than moving the head of the tape drive across the length of one wrap. The predicted total recovery time may be further reduced by selecting different movements of the head across the wrap tape to change the order of reading. In addition, the head of the tape drive is more efficient across the tape than running the tape back and forth (also referred to as "shoe-cleaning") and greatly increases the life of the tape. In addition, moving the head of the tape drive across the tape may reduce the number of stops and restarts of tape movements, improve the efficiency of tape movements and increase the durability of the tape.
In a further implementation form of the first, second and third aspects, the threshold is defined according to a service level agreement (SERVICE LEVEL AGREEMENT, SLA).
The selection of the combination may be in accordance with the SLA associated with the target data. For each target data written to the sequential storage system, the client may select the SLA that defines the longest time. The threshold may be customized, with different values selected by different users.
In a further implementation form of the first, second and third aspects, the processor is further configured to update at least one medium-specific deduplication mechanism of the at least one identified medium to indicate the identified duplicate segments, the at least one overwrite segment and a non-duplicate segment included in the selected combination of the target data, update the global index to indicate a mapping between the segments of the target data used during the accessing, update a global file map to store identifiers of the at least one segment of the target data, and update an identification table to store metadata for each identifier.
In a further implementation form of the first aspect, the second aspect and the third aspect, when at least two media are identified, a respective combination is selected for each media for which the respective predicted total recovery time is less than the threshold.
In the context of deduplicated data stored on multiple media read in parallel, the total reconstruction time remains below the threshold.
In a further implementation form of the first aspect, the second aspect and the third aspect, the processor is further configured to aggregate repeating segments on the identified medium that are physically close to each other, wherein the combination comprises at least one cluster of repeating segments, and the new data is included in at least one overwriting cluster.
Aggregation reduces the number of elements in the combination, thereby reducing the total number of possible combinations, optionally to a smaller number. The small number of combinations enables efficient selection of one combination.
In a further implementation form of the first aspect, the second aspect and the third aspect, the predicted total recovery time is calculated from seek times between clusters.
When reading data within a cluster in sequence, the total read and seek time of the cluster is shorter than the non-aggregated segments. This is because when fragments in a cluster are close to each other, reading the entire data of the cluster sequentially (including the uncorrelated data between the fragments) is faster than stopping at the end of the fragments, skipping the uncorrelated data between the fragments, and seeking to the beginning of the next fragment—the latter is much slower.
In a further implementation of the second aspect, the order in which the segments are read is according to an order selected during writing to obtain the predicted total recovery time that is less than the threshold.
Unless defined otherwise, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, these materials, methods, and examples are illustrative only and not necessarily limiting.
Drawings
Some embodiments of the invention are described herein, by way of example only, with reference to the accompanying drawings. Referring now in specific detail to the drawings, it is emphasized that the details shown are merely illustrative and for purposes of illustrative discussion of embodiments of the invention. In this regard, it will be apparent to those skilled in the art how embodiments of the invention may be practiced in conjunction with the description of the drawings.
In the drawings:
FIG. 1 is a flow diagram of a method for computing a combination of fragments of target data with a predicted total recovery time less than a threshold to deduplicate components of a system storing target data on sequential storage media accessed by a sequential access storage device, according to some embodiments;
FIG. 2 is a block diagram of components of a system for computing a combination of segments of target data with a predicted total recovery time less than a threshold for deduplication storage on a sequential storage medium (e.g., tape), according to some embodiments;
FIG. 3A is a flow chart of a method of deduplication storage on a sequential storage medium according to some embodiments;
FIG. 3B is a flow chart of a method of computing a combination of fragments of target data for deduplication storage on a sequential storage medium, according to some embodiments;
FIG. 4 is a method of reading a combination of fragments of target data written to have a predicted total recovery time less than a threshold, according to some embodiments;
one schematic of FIG. 5 helps explain why the linear loops (recall) are very inefficient, another schematic helps explain why the ordered loops are more efficient, and
FIG. 6 includes a schematic diagram that helps explain the calculation of predicted seek times for different combinations of identified repeating segments, according to some embodiments described herein.
Detailed Description
In some embodiments of the invention, the invention relates to storage systems, and more particularly, but not exclusively, to management of deduplication-based storage systems that store data on sequential storage media.
Aspects of some embodiments relate to systems, methods, computing devices and/or apparatuses and/or computer program products (storing code instructions executable by one or more processors) for managing sequential storage media (also referred to herein as "media") of a plurality of sequential access storage devices for deduplication storage of target data on the media. Sequential access storage devices are not random access, and may be, for example, tape libraries having hundreds of tape drives and thousands of cartridges. The processor divides the target data into a plurality of data segments (also referred to herein as "segments"). The processor identifies repeated segments of one or more segments of target data on one or more media. The processor determines a combination of the predicted total recovery time being less than a threshold (e.g., defined by a service level agreement (SERVICE LEVEL AGREEMENT, SLA)). The combination includes at least one of (i) one or more identified repeating segments, (ii) one or more overwriting segments, (iii) one or more new data segments. The overwrite segments have the same data as the identified repetition segments, wherein the overwrite segments are designated for being overwritten onto the medium as new data, e.g. over available space (e.g. appended to the end of existing written data). The new data segment comprises data that is not currently stored in the repeated segment on the medium, i.e. the new data segment is a non-repeated segment. The new data fragment is designated for being written on the medium for the first time as new data. The processor writes, i.e. writes the re-written fragment and/or the new data fragment on the identified medium according to the combination indication.
An aspect of some embodiments relates to systems, methods, computing devices and/or apparatuses and/or computer program products (storing code instructions executable by one or more processors) for recovering data stored on one or more media. The processor receives a request to read target data. The processor reads the metadata to obtain the location of the segment of target data on one or more media. Fragments of the target data are stored as combinations selected during the write phase to obtain a predicted total recovery time less than a threshold. The processor instructs to read the fragment according to the location obtained from the metadata. It should be noted that, as indicated by the metadata, even if some segments are duplicates, it is possible to read over-write segments instead of duplicate segments in order to obtain a predicted total recovery time that is less than the threshold. The processor recovers the target data from the read fragment.
The predicted total recovery time for reconstructing the target data is calculated based on segment locations, optionally defined by the global index map and/or a medium-specific deduplication mechanism, to select which segments are to be written as new data, and which existing duplicate segments can be used. A selection is made to ensure that the data is written in such a way that the predicted total recovery time is less than the threshold while maximizing the utilization of the repeated segments.
Since repeated segments may be scattered over different locations of the medium of the sequential access storage device, the greater the number of non-sequential areas in which the repeated segments used to reconstruct the target data are located, the longer the total recovery time to read the repeated segments. Reusing the repeated segments to reconstruct the target data improves utilization of existing media resources, but at the cost of longer overall recovery time. In the other extreme, the total recovery time for reading the segments is the shortest in the case where no duplicate segments are used and all segments of the target data are newly written in sequence. The cost of a shorter total recovery time is to use a larger amount of media resources to store the target data. Embodiments described herein balance these two conflicting requirements by selecting a combination of segments of the target data that are predicted to have the total recovery time less than a selected threshold while maximizing utilization of existing duplicate segments such that the predicted total recovery time remains below the threshold.
Writing some of the identified repeated segments as new data (i.e., re-written segments) along with the non-repeated segments (e.g., buffered in other storage media such as an HDD or SSD prior to dumping to the tape) may further improve the performance of the tape drive, e.g., prevent "shoe-cleaning" (back and forth tape movement), and reduce repeated stopping and starting of tape movement, which increases the life of the tape and/or increases the durability of the tape.
At least some embodiments described herein address the technical problem of balancing the conflicting requirements of reducing seek time over deduplicated segments dispersed across one or more different sequential storage devices (e.g., cartridges) and increasing the duplicate data deletion of data to reduce the amount of storage media used. At least some embodiments described herein select a minimum seek time to recover data with possible de-duplication (the best possible de-duplication that is optional or near the best possible de-duplication) that is less than a defined maximum seek time threshold, e.g., as defined by a service level agreement (SERVICE LEVEL AGREEMENT, SLA).
Although different deduplication systems exist, this solution is designed for disk-based storage systems (i.e., random access data storage devices such as hard disks and/or Solid State Disks (SSDs)). Deduplication designed for disk-based storage cannot be easily migrated to tape storage because the amount of data stored by tape storage is large (e.g., at least about 100 times relative to disk storage) and/or the tape differs from the method of operation of disk storage.
There are existing methods of managing data stored on magnetic tape. In the most common approach, no deduplication is used. When data is written to tape, the data will be written in its original (i.e., non-deduplicated) form. Even if the data is moved according to a deduplication format (e.g., on disk), the data will first expand to the original non-deduplication format and be stored in a non-deduplication format. In another example method, data is sent to the tape in such a way that the data sent together is only deduplicated with the other parts of the group. For such operations, the data is processed to find matches within the group, which are saved as a unit. For example, one such method focuses on how to save the set of data in a manner that reduces recovery time. In another example, a full backup and a series of incremental data are created and saved together as a unit (referred to as a SILO). The limitation of this approach is that a closed set is required. Such closed groups require extensive computational preprocessing and significantly reduce the deduplication rate. For example, a SILO method with monthly full data (assuming that 2% of the daily variation falls partially over previous variations) will achieve a deduplication rate of approximately 1:1 because the incremental data has a lower common chunk with the full data (i.e., will not be included in the incremental data if they are not altered). The primary benefit is to delete the duplicate data from each other from the full backup. Month full data and daily delta data for a year will achieve a ratio of about 1:12, as full data is also much larger (in the conventional case).
An exemplary physical structure of tape storage is now described. The magnetic tape includes tracks on which data is written and from which data is read. The tracks follow the length of the tape from the beginning of the tape to the end of the tape. A magnetic head for a magnetic tape drive is used for reading and writing, the magnetic head comprising a plurality of read and write elements to facilitate reading/writing a plurality of adjacent tracks. Such a collection of adjacent tracks is called wrap. In addition, the magnetic tape is divided into a plurality of zones. The tape drive head covers the width of a single zone. Each zone is made up of an even number of wraps. Data is recorded on the tape in a linear serpentine fashion, with a first wrap written from the beginning to the end, then a second wrap written from the end to the beginning, then a third wrap written from the beginning to the end, and so on. The recording and reading directions are the directions of the associated wrap. The wraps 0, 2, 4 were made from the beginning of the tape to the end of the tape, while the wraps 1,3, 5 were made from the end of the tape to the beginning of the tape. The tape zone consists of an equal number of beginning to end wraps and end to beginning wraps. In LTO 9 tape (tape constructed according to release 9 of the linear tape open specification), each wrap has 32 tracks, each zone has 52 wraps (26 head-to-tail wraps and 26 end-to-head wraps) and 4 zones. In modern magnetic tapes, the length of the tape (i.e., the length of each wrap) may exceed 1 km (1035 meters in LTO 9).
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of components and/or methods set forth in the following description and/or illustrated in the drawings and/or examples. The invention is capable of other embodiments or of being practiced or of being carried out in various ways.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions that cause a processor to perform aspects of the invention.
A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a corresponding computing/processing device or to an external computer or external storage device over a network such as the internet, a local area network, a wide area network, and/or a wireless network.
The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (local area network, LAN) or a wide area network (wide area network, WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, including programmable logic circuitry, field-programmable gate array (FPGA) GATE ARRAY, programmable logic array (programmable logic array, PLA), or the like, may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to customize the electronic circuitry to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may be performed out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Referring now to FIG. 1, FIG. 1 is a flow diagram of a method for computing a combination of fragments of target data with a predicted total recovery time less than a threshold to deduplicate components of a system 100 storing target data 112A on a sequential storage medium 160 accessed by a sequential access storage device 150, according to some embodiments. Referring also to FIG. 2, FIG. 2 is a block diagram of components of a system 200 for computing a combination of segments of target data with a predicted total recovery time less than a threshold for deduplication storage on a sequential storage medium (e.g., tape) 252, according to some embodiments. Referring also to fig. 3A, fig. 3A is a flow chart of a method of deduplication storage on a sequential storage medium according to some embodiments. Referring also to fig. 3B, fig. 3B is a flow chart of a method of computing a combination of fragments of target data for deduplication storage on a sequential storage medium, according to some embodiments. Referring also to fig. 4, fig. 4 is a method of reading a combination of fragments of target data written to have a predicted total recovery time less than a threshold, in accordance with some embodiments. Referring also to fig. 5, the schematic diagram 502 of fig. 5 helps explain why the linear recall is very inefficient, and the other schematic diagram 504 helps explain why the ordered recall is more efficient. Referring also to fig. 6, fig. 6 includes a schematic diagram that helps explain the calculation of predicted seek times for different combinations of identified repeating segments according to some embodiments described herein.
The components of the systems 100 and/or 200 may implement the actions of the methods described with reference to fig. 3A and 3B and fig. 4-6, e.g., code instructions (e.g., code 106A) stored in memory 106 are executed by one or more processors 102 of computing device 104.
The computing device 104 obtains the target data 112A for deduplication storage on the sequential access storage device 150, optionally on a medium 160 accessed by the sequential access storage device 150.
The sequential access storage device 150 and/or the medium 160 may be referred to herein as sequential access memory and/or non-random access memory, such as tape layers.
The medium 160 may be referred to herein as a sequential access storage medium and/or a non-random access storage medium. The sequential access storage device sequentially reads and/or writes data, i.e., does not provide random access (e.g., in random access memory) that reads and/or writes data in any order. Media 160 may comprise a magnetic storage medium of a cartridge type, such as a magnetic tape, a sequential access memory (sequential access memory, SAM), and/or an optical storage medium. Sequential access storage devices may include tape drives and/or optical disk drives for optical storage media.
For example, target data 112A may be obtained from client 112 and/or server 110. The computing device 104 processes the target data 112A, for example, performs computations to segment, hash, and/or maintain the different data structures accessed, as described herein. Global index 116A is accessed to identify which of media 160 may store a repeating segment of target data 112A, as described herein. For each identified medium, a corresponding target-specific deduplication mechanism 150A is accessed to validate the duplicate data segments of the target data 112A. The selected repeated segments, new writes of the repeated segments, and non-repeated segments are stored on the identified media 160. The global index 116A, global file map 116B, associated media-specific deduplication mechanism 150A, and associated identification table 150B (e.g., metadata) are updated to indicate the presence of duplicate segments and/or newly stored non-duplicate segments to enable reading of segments of target data below a time threshold.
For example, the computing device 104 may be implemented as one or more of a computing cloud, a cloud network, a computer network, one or more virtual machines (e.g., hypervisors, virtual servers), network nodes (e.g., switches, virtual networks, routers, virtual routers), a single computing device (e.g., client terminal), a set of computing devices arranged in parallel, a network server, a web server, a storage server, a local server, a remote server, a client terminal, a mobile device, and a fixed device.
Different architectures of system 100 may be implemented. For example, computing device 104 may act as a server providing centralized storage for target data 112A obtained from different client terminals 112 and/or servers 110. In another example, computing device 104 may act as a local manager of medium 160 for local storage of target data 112A.
Computing device 104 includes one or more processors 102, e.g., implemented as one or more central processing units (central processing unit, CPUs), one or more graphics processing units (graphics processing unit, GPUs), one or more field programmable gate arrays (field programmable GATE ARRAY, FPGAs), one or more digital signal processors (DIGITAL SIGNAL processors, DSPs), one or more application-specific integrated circuits (ASICs), one or more custom circuits, processors coupled to other units, and/or special-purpose hardware accelerators. One or more processors 102 may be implemented as a single processor, a multi-core processor, and/or a processor cluster for parallel processing (which may include homogeneous and/or heterogeneous processor architectures). It is noted that one or more processors 102 may be designed to implement in hardware one or more features stored as code instructions 106A.
Memory 106 stores code instructions executable by one or more processors 202, such as random access memory (random access memory, RAM), dynamic random access memory (dynamic random access memory, DRAM), and/or storage class memory (storage class memory, SCM), non-volatile memory, magnetic media, semiconductor memory devices, hard drives, removable memory, and optical media (e.g., DVD or CD-ROM). For example, the memory 106 stores one or more of the code 106A, which when executed by the processor 102, implements one or more features and/or acts of the methods described with reference to fig. 3A and 3B and fig. 4-6. A sufficiently large memory may be selected to store the global index 116A.
Computing device 104 can include a data storage device 116 for storing data, e.g., a queue, a global index 116A, a global file map 116B, and/or one or more media-specific deduplication mechanisms 150A that accumulate target data 112A for storage on media 160, each of which can be associated with an identification table 150B, as described herein. For example, the data storage device 116 may be implemented as a Solid State Disk (SSD) STATE DRIVE, a fast storage layer of memory, local hard drives, virtual memory, solid state devices, storage devices, etc., and/or as a remote server and/or computing cloud (e.g., accessed using a network connection).
The computing device 104 communicates with one or more sequential access storage devices 150 that are accessed (i.e., read and/or written) on one or more media 160. For example, sequential access storage device 150 may be implemented as a tape drive. Sequential access medium 160 may be accessed, for example, as a tape library, a cartridge (e.g., a cartridge in a drive and/or a cartridge in a slot), such as an LTO-9 cartridge storing 18 terabytes (terabyte, TB). Sequential access storage device 150 may include built-in compression, e.g., 1:2.5 average as in the LTO-9 specification.
Computing device 104 may include one or more of a network interface 118 for connecting to network 114, such as a network interface card, a wireless interface to a wireless network, a physical interface for connecting to a cable for a network connection, a virtual interface implemented in software, network communication software providing higher-level network connections, and/or other implementations.
For example, the network 114 may be implemented as the Internet, a local area network, a virtual private network, a wireless network, a cellular network, a local bus, a point-to-point link (e.g., wired), and/or combinations thereof.
For example, computing device 104 may communicate with sequential-access storage device 150 (e.g., for reading data from medium 160 and/or storing data on medium 160) via network interface 118 over network 114 and/or another interface, such as a virtual and/or physical interface, using a wireless and/or physical cable link.
The computing device 104 may include one or more physical user interfaces 108 and/or be in communication with one or more physical user interfaces 108, the one or more physical user interfaces 108 including mechanisms that facilitate user input of data and/or viewing of data. For example, the exemplary user interface 108 includes one or more of a touch screen, a display, a keyboard, a mouse, voice activated software using a speaker and microphone.
Referring back now to fig. 2, the components described with reference to system 200 may correspond to, combine, include, and/or may represent an exemplary implementation of the components of system 100 described with reference to fig. 1. The system 200 may include a flash storage layer 202, the flash storage layer 202 including one or more flash data storage devices 204, such as SSDs. For example, the flash storage layer 202 may store target data that is being processed for storage on the tape storage layer 250. It should be noted that magnetic tape represents an exemplary implementation of a sequential access storage device and/or sequential storage medium, and that magnetic tape drives and/or magnetic tape cartridges are not necessarily limiting of other implementations of sequential access storage devices and/or sequential storage media. The system 200 includes a tape storage layer 250, the tape storage layer 250 including a global index 216A (e.g., a global sparse index) as described herein. The global sparse index 216A may be stored on a fast data storage device (e.g., SSD 206). The tape storage layer 250 includes a plurality of magnetic tape cartridges 252, wherein each magnetic tape cartridge 252 is read and/or written to by a tape drive (not shown). Each cartridge 252 is associated with a tape specific deduplication mechanism, as described herein. For example, each tape-specific deduplication mechanism may include a sparse index 254 and a full index 256, as described herein. It should be noted that other deduplication mechanisms described herein may be used, with sparse and full indexes representing one example that is not necessarily limiting. Each set of sparse index 254 and full index 256 may be stored, for example, on fast data storage 206.
Referring back now to FIG. 3A, in 302, target data is obtained for storage on a medium of a sequential access storage device. The target data may be, for example, data for archiving.
In 304, the target data may be processed by dividing the target data into segments, computing fingerprints of the segments, and selecting a representation of the segments.
The target data may be divided into smaller portions, referred to herein as segments, i.e., the target data is divided into data segments (target segments).
For example, the target data may be divided into constant size data segments and/or variable size data segments. A constant size segment is a process that divides data into fragments of exactly the same size (e.g., 16KB or 32KB or other sizes). It should be noted that in a deduplication based system, constant segmentation is not widely used, because a small number of insertions or deletions of several bytes will change all fragments from the change point, and no deduplication will occur. Variable-sized segmentation is the process of dividing data into segments according to a window of data. For example, a rolling hash function is used for rolling windows of 64 to 256 bytes. When the hash value of the rolling hash satisfies a certain condition, the fragment is cut, and a new fragment is started. The condition may be selected such that the minimum/average/maximum size of the fragments is a desired value.
A representation of the target data may be extracted. Optionally, samples of the data segments are identified and a representation is calculated for the sampled data segments.
As used herein, the term fingerprint (or data fingerprint) refers to a representation extracted from target data and/or from data stored on a sequential access medium as described herein. For example, the fingerprint may be extracted by feeding a data fragment sampled from the target data into a strong hash function (e.g., SHA1 outputting a 20 byte cryptographic hash, or a newer version of the secure hash algorithm (secure hash algorithm, SHA) outputting a longer hash value). The fingerprints are used to quickly compare target data to fingerprints of one or more indexes of a global index and/or media-specific deduplication mechanism, i.e., rather than comparing actual target data on a byte-by-byte basis. Note that the chance of false positive comparison may be selected. For example, when using a strong hash, the chance of false positives is very low (for 1PB storage, fragments are about 16 Kilobytes (KB), yielding 2 36 fragments, with SHA1 signed 20 bytes = 2 160, which gives a 1:2 124 chance of false positives).
It should be noted that the target data may be segmented and/or hashed using the same segmentation and/or hashing used by the medium-specific deduplication mechanism of the medium. For example, when disk deduplication and media deduplication use the same value, the segmentation and/or hashing process may be skipped, but in general, the segmentation of disk deduplication may be smaller.
Incoming target data may be written to a fast layer (with or without deduplication), such as a fast data storage device. The target data may be accumulated on the fast data storage device, for example in a buffer. For example, the target data may be moved to the medium according to a threshold and/or data retention policy defined for the fast data storage device. An accumulated amount of target data may be moved into the medium. For example, the target data may be moved in blocks of hundreds of GB at a time. For example, a data storage device storing target data is 10TB in size, and the threshold rule is to move 2TB of target data into the medium every time the data passes the 8TB mark. Or the target data may be written directly to the medium. The fast layer (or HDD within the tape layer, for example) may be used as a scratch pad until sufficient target data is written.
In 306, a global index may be accessed. In some embodiments, the method continues to feature 308 without using a global index.
The global index maps between the representative fingerprint and the medium storing the data piece corresponding to the representative fingerprint. The global index is used for efficient searching to match fingerprints, which indicates the medium with the highest number of possible matching fingerprints for further evaluation to determine the actual match. One or more target segments (e.g., fingerprints thereof) of target data are searched (e.g., looked up) in a global index to determine whether repeated segments of the respective target segments are likely to be stored on one or more media, i.e., match candidates for the one or more target fingerprints. The fingerprints of the target data that may be searched include all fingerprints, a subset of fingerprints selected using the same method for selecting a sparse index entry for a global index entry and/or medium-specific deduplication mechanism, or a combination thereof, e.g., if the number of matches found is insufficient.
The global index enables quick searching of target data set segments to quickly identify which media require more careful inspection to verify the storage of duplicate data segments. For example, an operation without a global index would require access to all indexes of all deduplication mechanisms for all media, which is computationally intensive and/or inefficient in finding duplicate segments of a large number of data segments and/or on a large number of media storing a large amount of data.
Search results are obtained that indicate a subset of media that is likely to store matching data segments that are duplicates of the target data segment. For example, a list of media is obtained, where each media indicates how many target data segments are likely to have duplicate segments stored on the media. The best matching medium may be selected, for example, a predefined number of media having the highest number of matches (indicating the highest number of data segments that repeat with the target segment), such as the best matching 4 or 5 (or other number) media. Other searches of the medium-specific deduplication mechanism may be limited to best matching mediums to avoid lengthy search times and/or to increase the efficiency with which the processor performs the search.
For example, as described herein, the global index may be stored on a high performance storage device (e.g., flash memory, e.g., RAM, SSD, etc.) in communication with a processor accessing the global index. The global index may be stored on a fast memory to increase memory utilization and/or system performance by enabling a fast lookup to quickly identify which media are relevant for further analysis to verify the storage of duplicate segments of the corresponding target segment.
Alternatively, the global index is implemented as a sparse index. The sparse index includes a sparse obtained data segment (e.g., its fingerprint) selected from each medium, i.e., a subset of the fingerprints of the data segments stored on the medium. For example, 2 fingerprint entries are selected every 4 megabytes (megabyte, MB) of data or other value. The sparse design of the global index supports storing samples of data segments from all data segments stored on all media, which can be very large, for quickly identifying which media may store duplicate segments.
The global index (optionally, sparse index) may be implemented as a key-value database (KVDB) and/or other searchable data structure that includes entries for all media storing related data entries. For example, the search key may be a fingerprint (or portion thereof), or a weak hash created from a fingerprint. For each key there may be multiple values indicating one or more media where it is possible to store a repeated segment of the data segment represented by the key, i.e. the same data segment may be stored on one or more media.
The sparse index may map a weak hash of the target data segment's fingerprint onto one or more media, with some probability of false positive results that require verification to determine whether the target data segment is actually stored on the mapped media. The weak hash may be implemented as part of a strong hash (e.g., bytes 10 through 17 in a 20-byte full hash). Weak hashing may be used to detect areas of data stored on a medium that have a high probability of containing similar data. It should be noted that weak hashes are not necessarily used to determine the actual identity of the duplicate data, as the false positive opportunities are not necessarily low enough to confirm the presence of the duplicate data. But since the number of entries in the global index is large enough (e.g., in the range of 10s x G), even if a 64-bit hash is used, the collision probability is low enough to be sent to the correct storage area.
The validation of the storage of the target fragment on the medium is provided by a medium specific deduplication mechanism mapping the medium, as described herein. The relatively small weak hash design supports a medium that quickly determines the most likely repeated segment of the stored target data. The use of a relatively small weak hash may increase the utilization of memory storing the global index.
The data segments represented in the global index are sampled from data segments represented in a plurality of medium-specific deduplication mechanisms of a plurality of media. Sampling the data segments represented in the plurality of media-specific deduplication mechanisms for inclusion in the global index enables efficient coverage of large amounts of data stored in the plurality of media while improving utilization of the memory storing the global index.
The global index provides compatibility for different implementations of the deduplication mechanism for different media independent of the implementation of the deduplication mechanism for each medium.
The value of the segment size and/or resolution of the global index may be configurable.
An exemplary calculation using the global index as a sparse index is now provided. The global index includes a subset of index entries of a medium-specific deduplication mechanism of multiple mediums. There are M entries per 10GB, for example 10GB/M >1GB/N. Each entry of the global index may include the following fields [ weak hash (8 to 16 bytes), media ID (4 bytes) ].
In 308, a media-specific deduplication mechanism is accessed for each respective media identified by the global index as potentially storing one or more duplicate segments of the target data segment thereon.
Each respective media-specific deduplication mechanism may be stored on a high-performance storage device in communication with the specific media to improve memory utilization and/or system performance by enabling a quick lookup to quickly ascertain which media actually stores the duplicate segments of the target data segment. For example, each media-specific deduplication mechanism may be stored on its own independent high-performance data storage device, which may enable parallel access to multiple media-specific deduplication mechanisms. In another example, multiple media-specific deduplication mechanisms may be stored on a common high-performance data storage device.
An exemplary method for searching for a media-specific deduplication mechanism is now described. A sparse index of media-specific deduplication mechanisms is searched for a representation (e.g., a fingerprint) of the target data. For example, searching for all fingerprints, searching for a subset of fingerprints that may be stored by the medium as indicated by the global index, searching for a subset of fingerprints that are selected using the same method used to select the fingerprints for inclusion in the sparse index, or a combination thereof, e.g., if an insufficient number of matches are found. The search results indicate which areas of the medium are likely to store data segments that are duplicative of the target data segment. The entries of the full index corresponding to the matching entries of the sparse index are read and compared to the target data segment (e.g., its fingerprint) to determine if a matching entry exists. The matching item indicates a repeated segment of the target data segment stored on the medium.
An exemplary implementation of a searchable media-specific deduplication mechanism is now described. It should be noted that different implementations of different media-specific deduplication mechanisms for different media are possible. The media-specific deduplication mechanism may include a full index that includes all representations (e.g., fingerprints) of all data segments stored on the media. For all data segments stored on the medium, the full index may map a strong hash (e.g., fingerprint) of the data segments stored on the medium to an offset storage location on the medium. The full index may be stored on a fast data storage device. An exemplary calculation of the full index size is now provided. Number of entries per complete media=45 TB/fragment size (45 KB) =about 1G entries. Each entry includes the following fields [ ID (8 bytes), offset in the medium (8 bytes), size (4 bytes), hash (20 bytes SHA 1) ]. Without optimization, each entry is 40 bytes. The total size of the complete media (e.g., cartridge) is 40GB.
Since a full index is typically too large for efficient and/or fast searching, one or more of the following data structures (associated with the full index) may be accessed before the full index is accessed:
* A sparse index includes samples that point to representations of locations in the full index that may include one or more fingerprints of target data. The full index is then accessed to compare the full fingerprints to confirm whether duplicate segments of the target data segment exist. An exemplary calculation of sparse index size is now provided. The sparse index of the medium is a ratio of N hashes (e.g., partial fingerprints of strong hashes) per GB of data. For example, the hash may be selected based on a local minimum or maximum (this ratio may be adjusted as a trade-off between accuracy and size). The number of entries in the sparse index is 45tb×n/1 gb=45 k×n. Each entry includes the following fields [ weak hash (8 to 16 bytes), block ID (4 bytes) in tape ]. The total size of the sparse index for each complete medium is <1mb×n.
The number of data segments represented in the sparse index may be a defined percentage of the number of data segments represented in the full index, e.g., every 100 data segments of the full index, approximately 1 data segment in the sparse index, or other value. The number of data segments represented in the global index may be a defined percentage of the number of data segments represented in the sparse index of the medium, e.g., every 100 to 1000 data segments of the sparse index of the medium, approximately 1 data segment in the global index.
For example, the media sparse index may reside as an array in a flash disk and/or as a separate table in a global sparse index DB (e.g., using a flash data storage device).
* A weak hash table includes a mapping from a weak hash of a data segment (e.g., a portion of a full fingerprint) to a segment ID of the data segment in the full index. When a weak hash is found in the weak hash table, a series of IDs around the found ID may be loaded into the cache and the fingerprint may be found in the cache. If a weak hash is not found, the process may return to the weak hash table until all the non-found fingerprints are searched in the weak hash table.
* Bloom filters indicate whether a fingerprint is likely to be stored in the full index. Bloom filters may give false positive answers.
In 310, one or more media are selected for deduplication storage of target data based on the analysis of the matching results.
Alternatively, a single identified medium is selected for deduplication storage of target data. A single medium may be selected that has the largest number of stored data segments that match (i.e., are duplicate segments of) the target data segment. A single tape may efficiently store target data using a deduplication machine.
Or a combination of two or more identified media is selected for deduplication storage of target data. Parallel write access to a medium may be achieved using multiple media, which may be faster than multiple sequential accesses to the same single medium. The number of media in the combination may be limited (e.g., by a maximum value) to avoid too much media being involved in the case of data recovery, which may be inefficient.
The use of multiple media may result in higher deduplication rates than a single media without necessarily increasing the time to perform recovery, for example, when multiple media match more target segments than a single media. Furthermore, using multiple media may increase memory utilization and/or reduce memory consumption by reducing instances of multiple media storing the same data segment in a global sparse index.
The combination of identified media may be selected to achieve a highest target deduplication rate for the target data under defined constraints. Constraints for selecting combinations of identified media may be, for example, dividing the target data into predefined units (e.g., files), and/or writing divisions of the same units to the same media in order to recover the predefined units from a single media. Ensuring that partitions of the same data unit (e.g., file) are stored on the same medium allows the unit to be efficiently recovered from a single medium. For example, it is more efficient to ensure that the same file is recreated from a single cartridge and that the recreated file can be accessed faster than multiple cartridges are accessed to recreate the file, in contrast to being less efficient to restore the file across multiple different cartridges.
At 312, the processor determines a combination of segments that predicts a total recovery time less than a threshold. The combination includes one or more of one or more identified repeating segments, one or more overwriting segments, one or more new data segments. The overwrite segments have the same data as the identified repetition segments, wherein the overwrite segments are designated for being overwritten onto the medium as new data, e.g. over available space (e.g. appended to the end of existing written data). The cost of overwriting the segments is to use additional storage space on the medium, which has the advantage of reducing seek time so that a predicted total recovery time less than a threshold is obtained compared to seek time of existing copy segments. The new data segment (i.e., non-duplicate segment) includes data in duplicate segments that are not currently stored on the medium, i.e., the non-duplicate target segment is the portion of the target medium where duplicate segments (i.e., no matching data segments) are not found on the selected medium by searching the global index and/or the medium-specific deduplication mechanism. The new data fragment is designated for being written on the medium for the first time as new data.
Referring back now to FIG. 3B, an exemplary method for determining segment combinations is described.
In 350, a threshold is received. The threshold value may be obtained by, for example, manual input by a user, reading from memory, and/or automatic calculation. The threshold may be defined for each target data or may be defined globally for all target data. The threshold may be defined for each client or globally for a plurality of clients.
The threshold may be defined in accordance with a service level agreement (SERVICE LEVEL AGREEMENT, SLA). The SLA may be used to customize the threshold for each user. For example, different users select different values.
The threshold may define a maximum time from receipt of a recovery request to reconstruction of target data from the read fragment and ready. The threshold may represent a time of relevance from a user perspective, e.g., how long the user needs to wait from requesting the target data to the target data being available to the user.
In implementations where the sequential access storage device is a tape drive and the medium is a tape cartridge, the predicted total recovery time may be calculated by determining the order in which the segments of the target data are read and based on one or more (optionally all) of the parameters of maximum seek time for the first segment, seek time between segments based on the order for reading, read throughput, read head wrap change, and tape library loading time. These parameters depend on the connection type (e.g., cFC, SAS), tape drive/tape library type, inserted tape cartridge type (LTO 9, LTO 8.) and drive compression on/off, providing a custom selection of which duplicate segments to use and which duplicate segments to overwrite as new data.
The order of reading may be calculated based on a nearest neighbor method that takes into account the locations of the segments on different wraps that are physically close to each other and minimizes the total distance traversed between the segments by the read head. Moving the head of the tape drive across multiple wraps may be faster than moving the head of the tape drive across the length of one wrap. The predicted total recovery time may be further reduced by selecting different movements of the head across the wrap tape to change the order of reading. In addition, moving the head of the tape drive across the tape is more efficient than running the tape back and forth (also referred to as "shoe-cleaning") and greatly increases the life of the tape. In addition, moving the head of the tape drive across the tape reduces the number of stops and restarts of tape movements, improves the efficiency of tape movements and increases the durability of the tape.
Exemplary methods for calculating the optimal read order of a tape include a Time-based access order system (Time-based Access Order System, TAOS) and a recommended access order (Recommended Access Order, RAO).
Fig. 5 explains how the read time of a fragment based on the nearest neighbor method is lower than the linear read.
At 352, the processor may aggregate the repeating segments that are physically close to each other on the identified medium. The intervals between the repeated segments including data not related to the target data are included in the cluster. The combinations described herein are calculated on a cluster basis rather than a single segment.
As used herein, the term repeating segment may refer to a cluster of repeating segments. For example, the combinations described herein may include clusters of repeating segments, rather than individual repeating segments.
Aggregation reduces the number of elements in the combination, thereby reducing the total number of possible combinations, optionally to a smaller number. The small number of combinations enables efficient selection of one combination. For example, when 100 repeating segments are found and clustered into 4 clusters, it is computationally more efficient to find the best combination using 4 elements than to use 100 elements.
The predicted total recovery time is calculated from seek times between clusters. When reading data within a cluster in sequence, the total read and seek time of the cluster is shorter than the non-aggregated segments. This is because when fragments in a cluster are close to each other, reading the entire data of the cluster sequentially (including the non-relevant data between the fragments) is faster than stopping at the end of the fragments, skipping the non-relevant data between the fragments, and seeking to the beginning of the next fragment—the latter is much slower. For example, for a tape that reads 32MB to 64MB of data in advance after reaching a relocation, the time to read a single segment of a cluster of 32MB is the same.
At 354, the processor may calculate one or more candidate combinations. A predicted total recovery time for each candidate combination is calculated.
Calculating the different combinations enables the selection of the best combination, providing the highest repetition rate, while keeping the predicted total recovery time below the threshold.
Different methods may be used to calculate the candidate combinations.
In one approach, multiple candidate combinations are computed sequentially and/or in parallel, where each combination includes a different repeating segment and a different overwriting segment. Each combination includes a different subset of the repeating segments (e.g., different numbers, different data), wherein identified repeating segments that are excluded from the subset are classified as overwriting segments. Excluding some identified repeating segments, but writing them sequentially with non-repeating segments, reduces the total recovery time, keeping the predicted total recovery time below the threshold.
When aggregation is performed and the number of clusters is small, all cluster combinations are optionally calculated. When the number of elements (e.g., clusters, fragments) is so large that evaluation of all combinations is computationally prohibitive (e.g., the time of computation is too long), a small number of candidate combinations may be evaluated, e.g., randomly selected and/or using heuristics. For each candidate combination, a predicted total recovery time is calculated.
In another method, a baseline recovery time is calculated for reading segments of the target data using a baseline set of identified repeated segments (optionally, all identified repeated segments). The baseline recovery time represents the recovery time at maximum repetition. When the baseline restoration time is less than the threshold, the identified repeated segments (e.g., all segments) are used. When the baseline restoration time is greater than the threshold, a difference between the threshold and the baseline restoration time is calculated. One or more identified repeating segments are iteratively deleted from the baseline set to obtain a candidate set. Identified duplicate segments deleted from the baseline/candidate set will be added as overwrite segments to the other set. The predicted total recovery time is calculated for a combination of the candidate set of repeating segments and the other set of overwriting segments. The iteration is continued until the predicted total recovery time of the candidate set is less than a threshold. One or more of the deleted repeated segments may be added back to the candidate set in an attempt to minimize the gap between the total recovery time of the candidate set and the threshold.
At 356, a candidate combination is selected that predicts a total recovery time less than a threshold and has a maximum repetition rate.
Assuming that the reading of multiple media is done in parallel, when two or more media are identified, a respective combination is selected for each media for which the respective predicted total recovery time is less than a threshold. In an environment of repeated data stored on multiple media read in parallel, the total reconstruction time remains below the threshold.
Referring back to FIG. 3A, in 314, the combined overwrite segments and/or new data segments (i.e., non-duplicate target segments) are written onto the identified media, optionally sequentially written over the available space of the identified media, which segments may be appended to the end of the existing write data.
Sequentially writing the repeated segments and/or the new data provides a faster sequential data read time than reading the repeated segments, which results in a longer read time due to the medium and/or read head moving to different positions of the repeated segments.
In the case of a selected combination of media, the storage on the different media may be performed in parallel.
At 316, the process instructs to write metadata indicating metadata of the locations of the segment combinations on the medium for reading and obtaining a predicted total recovery time less than a threshold. The metadata provides the positions of the segments of the combination to enable the predicted total recovery time of less than the threshold to be obtained in a read phase.
The processor is further configured to instruct to read the metadata for determining a location of the combined segments to read to obtain a predicted total recovery time less than a threshold.
In 318, a media-specific deduplication mechanism of the selected media is updated to indicate the identified duplicative segments of the targets stored on the selected media, as well as to indicate the newly stored overwrite segments and non-duplicative segments of the target data.
In the case of a combination of selected media, the updating of the different media-specific deduplication mechanisms may be performed in parallel.
The complete index and associated data structures (e.g., sparse index, weak hash table, bloom filter) of each medium-specific deduplication mechanism associated with the selected single medium or combination of media are updated.
To make the data readable again, the updating includes adding metadata that indicates how to construct the target data from different segments on a single medium or combination of media, including newly added non-duplicate segments, overwritten segments, and pre-existing duplicate data segments. The metadata may be a mapping of data segments stored on the medium that are combined to reconstruct the target data. Metadata may be stored on the flash layer.
The identification table of the media specific deduplication mechanism is updated to store the metadata for each new segment identifier. Each metadata identifying each data segment stored in the table may include, for example, one or more of an indication of the medium storing the data segment, an offset of the medium storing the data segment, a size of the data segment, a fingerprint. The identifier of the medium may be included in the metadata and/or calculated using the fingerprint. The fingerprint included in the metadata is a fingerprint used to determine the medium, e.g., as described herein.
At 320, the global index is updated to indicate the mapping between segments of the target data.
At 312, a global file map for storing identifiers of segments of the target data is updated. The identifier may be a code, such as a number, letter, symbol, and/or combinations thereof, that uniquely identifies each data segment.
Referring back to FIG. 4, in 402, the processor receives a request to read target data stored as fragments on one or more media of a sequential access storage device.
In 404, the processor reads metadata (which was written during the write phase, as described herein) to obtain the location of the fragment of the target data on the medium.
Fragments of the target data are stored as combinations selected during the write phase to obtain a predicted total recovery time less than a threshold.
Other relevant data may be read, such as identifiers of data fragments of the target data obtained from the global file map. The metadata for each data fragment may be obtained, for example, from an identification table.
In 406, the processor instructs to read the fragment according to the location obtained from the metadata.
It should be noted that metadata indicates that the re-written segments written during the write phase are read, rather than repeated segments comprising the same data, in order to obtain a predicted total recovery time that is less than a threshold, as described herein.
Optionally, the order in which the segments are read is according to an order selected during writing to obtain a predicted total recovery time that is less than a threshold. The order may be based on a nearest neighbor method determined during the write phase, as described herein.
At 408, the processor recovers the target data from the read fragment. When the writing is completed according to the selected combination, it is expected that data recovery is performed below the threshold.
Referring back now to fig. 5, schematic 502 helps explain why linear recall is very inefficient and schematic 504 helps explain why ordered recall is more efficient. Schematic 502 depicts reading from a tape in a wrap-around order based on the original order of reading of files/clips. This causes the tape to be moved back and forth, which is inefficient and wears the tape, as described herein. Schematic 504 depicts an optimal read sequence based on the nearest neighbor approach, considering that sometimes the head of a tape drive is moved between wraps faster than the head is moved over the length of a wrap. It is visually apparent that the total length traversed by the read head at 504 is much shorter than at 502.
Referring back now to fig. 6, schematic diagrams 602, 604, 606, and 608 that help explain the calculation of predicted seek times for different combinations of identified repeating segments according to some embodiments described herein. Each combination includes an increased number of used repeated segments and a reduced number of repeated segments written sequentially as new data. These combinations increase the deduplication rate, but at the cost of increased seek time. A combination having a maximum seek time less than a threshold and a maximum data de-duplication rate is selected.
It should be appreciated that read throughput and seek time depend on the tape drive and cartridge type. The numbers here are just examples of calculations. The cartridge loading time in the library is omitted for simplicity.
Four combinations of identified repeating and non-repeating segments of target data are depicted in schematics 602, 604, 606, and 608.
Schematic 602 depicts a first combination in which identified repeating segments are not used. All incoming target data 650 is written as a new segment at the end of wrap 2 652, which requires almost a full seek 654 time of about 90 seconds. The data read time was calculated as 10 Gigabytes (GB)/300 Megabytes (MB)/second (S) =33 seconds. The total predicted seek time is 123 seconds.
Schematic 604 depicts a second combination in which approximately 25% de-duplication is achieved. Fragments 135-136 660 are repeated. The first seek 662 to segment 135 increases by about 75 seconds. The second seek 664 from segment 136 to segment 241 increases by about 70 seconds. The data read time was 33 seconds. The predicted total seek time is 178 seconds.
Schematic 606 depicts a third combination in which approximately 40% de-duplication is achieved. Fragments 1-5 670 are repeated. The first seek 672 to segment 1 takes 100 seconds. The second seek 674 from segment 5 to segment 241 increases by about 80 seconds. The data read time was 33 seconds. The predicted total seek time is 213 seconds.
Schematic 608 depicts a fourth combination in which approximately 63% de-duplication is achieved. Fragments 1-5 and fragments 135-136 are repeated. The first seek 682 to segment 1 increases by 100 seconds. The second seek 684 from segment 5 to segment 135 increases by 25 seconds. The third seek 686 from segment 135 to segment 241 increases by 70 seconds. The data read time was 33 seconds. The predicted total seek time is 228 seconds.
The selected combination is the combination that has the highest percentage of deduplication and still meets the threshold (e.g., defined by the SLA). If the threshold is greater than 228 seconds, a fourth combination is selected. If the threshold is between 213 and 228 seconds, a third combination is selected. If the threshold is between 178 and 213 seconds, a second combination is selected. Otherwise, the first combination will be selected.
Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
The description of the various embodiments of the present invention is intended for purposes of illustration only and is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen in order to better explain the principles of the embodiments, the practical application, or technical improvements over the technology found in the marketplace, or in order to enable others skilled in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent expiring in this application many relevant deduplication mechanisms and mediums will be developed, the scope of the term deduplication mechanism and medium being intended to include all such new technologies a priori.
The term "about" as used herein means ± 10%.
The terms "including", "having" and their cognate terms mean "including but not limited to". This term includes the term "by. The composition" and "consisting essentially of.
The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, provided that the additional ingredients and/or steps do not substantially alter the basic and novel characteristics of the composition or method as required.
As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. For example, the term "a complex" or "at least one complex" may include a plurality of complexes, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration. Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude features from other embodiments.
The word "optionally" as used herein means "provided in some embodiments and not provided in other embodiments. Any particular embodiment of the invention may include a plurality of "optional" features unless the features contradict each other.
In the present application, various embodiments of the application may be presented in a range format. It should be understood that the description of the range format is merely for convenience and brevity and should not be construed as a fixed limitation on the scope of the present application. Accordingly, the description of a range should be considered to have specifically disclosed all possible sub-ranges as well as individual values within the range. For example, a description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., as well as individual numbers within that range, such as 1, 2,3, 4, 5, and 6. This applies regardless of the breadth of the range.
When a range of numbers is referred to herein, it is intended to encompass any of the recited numbers (fractional or integer) within the range indicated. The phrases "a range between a first indicated number and a second indicated number" and "a range from a first indicated number to a second indicated number" are used interchangeably herein to mean that all fractions and integers between the first indicated number and the second indicated number are included.
It is appreciated that certain features of the invention, which are, for clarity of description, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as any suitable other embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments unless the embodiments are not described as being without these elements.
The inventors' intent is that all publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. Furthermore, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present application. With respect to the use of section titles, the section titles should not be construed as necessarily limiting. Further, one or more of any priority documents of the present application are incorporated by reference herein in their entirety.

Claims (21)

1.一种用于对数据进行重复数据删除存储的设备(104),其特征在于,包括:1. A device (104) for deduplicating data, comprising: 处理器(102),用于:A processor (102) configured to: 通过访问在多个顺序访问存储设备(150)的多个介质中的每个介质与存储在所述介质上的数据片段之间进行映射的全局索引(116A)来标识至少一个介质(160),所述至少一个介质(160)存储用于存储在所述多个介质上的目标数据(112A)的至少一个片段的重复片段;Identifying at least one medium (160) storing duplicate segments of at least one segment of target data (112A) stored on the plurality of media by accessing a global index (116A) mapping between each of a plurality of media of a plurality of sequential access storage devices (150) and data segments stored on the media; 对于每个相应标识的介质,通过访问介质特定的重复数据删除机制(150A)来标识存储在所述介质上的所述目标数据的片段的重复片段;For each correspondingly identified medium, identifying duplicate segments of segments of the target data stored on the medium by accessing a medium-specific deduplication mechanism (150A); 确定预测总恢复时间小于阈值的组合,所述组合包括以下中的至少一项:(i)至少一个重复片段,(ii)至少一个重写片段,以及(iii)至少一个新的数据片段,其中,所述至少一个重写片段包括用于被重写的至少一个重复片段,所述至少一个新的数据片段包括用于被首次写入的非重复片段;以及Determining a combination for which the predicted total recovery time is less than a threshold, the combination comprising at least one of: (i) at least one duplicate segment, (ii) at least one rewritten segment, and (iii) at least one new data segment, wherein the at least one rewritten segment comprises at least one duplicate segment for being rewritten, and the at least one new data segment comprises a non-duplicate segment for being written for the first time; and 将所述(ii)至少一个重写片段和所述(iii)至少一个新的数据片段中的至少一个写入所述标识的介质上。At least one of the (ii) at least one rewritten segment and the (iii) at least one new data segment is written onto the identified medium. 2.根据权利要求1所述的设备,其特征在于,所述处理器还用于写入元数据,所述元数据指示所述组合在所述介质上的位置,以用于进行读取并获得小于所述阈值的所述预测总恢复时间。2. The device according to claim 1 is characterized in that the processor is also used to write metadata, which indicates the location of the combination on the medium for reading and obtaining the predicted total recovery time less than the threshold. 3.根据权利要求2所述的设备,其特征在于,所述处理器还用于读取所述元数据,以确定所述组合的所述片段的位置,从而进行读取以获得小于所述阈值的所述预测总恢复时间。3. The device according to claim 2 is characterized in that the processor is further used to read the metadata to determine the position of the combined fragments, so as to read to obtain the predicted total recovery time less than the threshold. 4.根据权利要求1所述的设备,其特征在于,所述至少一个重写片段和所述至少一个新的数据片段被顺序写入所述标识的介质的可用空间上。4. The device according to claim 1, characterized in that the at least one overwritten segment and the at least one new data segment are sequentially written to the available space of the identified medium. 5.根据上述权利要求中任一项所述的设备,其特征在于,所述处理器还用于:5. The device according to any one of the preceding claims, characterized in that the processor is further configured to: 计算多个组合,每个组合包括重复片段和重写片段的不同组合;Calculate multiple combinations, each combination including a different combination of repeated segments and rewritten segments; 对于每个组合,计算所述预测总恢复时间;以及For each combination, calculating the predicted total recovery time; and 选择所述预测总恢复时间小于所述阈值且具有最大重复率的组合。The combination whose predicted total recovery time is less than the threshold and has the largest repetition rate is selected. 6.根据权利要求5所述的设备,其特征在于,每个组合包括所述重复片段的不同子集,其中,从子集中排除的标识的重复片段被分类为重写片段。6. The apparatus of claim 5, wherein each combination comprises a different subset of the repeated segments, wherein the identified repeated segments excluded from the subset are classified as overwritten segments. 7.根据上述权利要求中任一项所述的设备,其特征在于,所述处理器还用于:7. The device according to any one of the preceding claims, characterized in that the processor is further configured to: 计算用于使用所述标识的重复片段的第一集合来读取所述目标数据的多个片段的基线恢复时间;calculating a baseline recovery time for reading a plurality of segments of the target data using the identified first set of repeated segments; 计算所述阈值与所述基线恢复时间之间的差;以及calculating a difference between the threshold and the baseline recovery time; and 迭代地从所述第一集合中删除所述标识的重复片段中的至少一个标识的重复片段,直到所述预测总恢复时间小于所述阈值,iteratively deleting at least one of the identified repeated segments from the first set until the predicted total recovery time is less than the threshold, 其中,从所述第一集合中删除的所述标识的重复片段作为重写片段被添加到第二集合中。The identified repeated segments deleted from the first set are added to the second set as rewritten segments. 8.根据上述权利要求中任一项所述的设备,其特征在于,所述阈值限定了从接收到恢复请求到从读取的片段重建所述目标数据并就绪的最大时间。8. The device according to any one of the preceding claims, characterized in that the threshold defines a maximum time from receiving a recovery request to rebuilding the target data from the read fragments and being ready. 9.根据上述权利要求中任一项所述的设备,其特征在于,所述多个顺序访问存储设备包括磁带驱动器,所述多个介质包括盒式磁带(252),所述预测总恢复时间通过确定用于读取所述目标数据的所述片段的顺序以及根据对第一片段的最大寻道时间、根据用于读取的顺序的在片段之间的寻道时间、读取吞吐量、读取磁头绕带(wrap)变化和磁带库加载时间来计算。9. A device according to any of the above claims, characterized in that the multiple sequential access storage devices include tape drives, the multiple media include cassette tapes (252), and the predicted total recovery time is calculated by determining the order of the fragments used to read the target data and based on the maximum seek time to the first fragment, the seek time between fragments based on the order used for reading, the read throughput, the read head wrap variation and the tape library load time. 10.根据权利要求9所述的设备,其特征在于,所述用于读取的顺序基于最近邻方法计算,所述最近邻方法考虑了在物理上彼此接近的不同绕带上的片段的位置,并使通过读取磁头在所述片段之间遍历的总距离最小化。10. The apparatus according to claim 9, characterized in that the order for reading is calculated based on a nearest neighbor method, which takes into account the positions of fragments on different windings that are physically close to each other and minimizes the total distance traversed by the read head between the fragments. 11.根据上述权利要求中任一项所述的设备,其特征在于,所述阈值是根据服务水平协议(SLA)限定的。11. The device according to any of the preceding claims, characterized in that the threshold value is defined according to a service level agreement (SLA). 12.根据上述权利要求中任一项所述的设备,其特征在于,所述处理器还用于:12. The device according to any one of the preceding claims, wherein the processor is further configured to: 更新至少一个标识的介质的至少一个介质特定的重复数据删除机制,以指示所述目标数据的选定组合中包括的所述标识的重复片段、所述至少一个重写片段和所述非重复片段;updating at least one medium-specific deduplication mechanism of at least one identified medium to indicate that the identified duplicate segments, the at least one overwritten segment, and the non-duplicate segments are included in the selected combination of the target data; 更新所述全局索引,以指示在所述访问期间使用的所述目标数据的所述片段之间的映射;updating the global index to indicate a mapping between the segments of the target data used during the access; 更新全局文件映射(116B),以存储所述目标数据的所述至少一个片段的标识符;以及updating a global file map (116B) to store an identifier of the at least one segment of the target data; and 更新标识表(150B),以存储每个标识符的元数据。The identification table (150B) is updated to store metadata for each identifier. 13.根据上述权利要求中任一项所述的设备,其特征在于,当标识出至少两个介质时,为每个介质选择相应预测总恢复时间小于所述阈值的相应组合。13. The device according to any one of the preceding claims, characterized in that when at least two media are identified, a corresponding combination whose corresponding predicted total recovery time is less than the threshold is selected for each medium. 14.根据上述权利要求中任一项所述的设备,其特征在于,所述处理器还用于对在所述标识的介质上的物理上彼此接近的重复片段进行聚集,其中,所述组合包括重复片段的至少一个集群,所述新的数据包括在至少一个重写集群中。14. A device according to any of the above claims, characterized in that the processor is also used to cluster repeated fragments that are physically close to each other on the identified medium, wherein the combination includes at least one cluster of repeated fragments and the new data is included in at least one overwritten cluster. 15.根据权利要求14所述的设备,其特征在于,所述预测总恢复时间是根据集群之间的寻道时间计算的。15. The device according to claim 14, wherein the predicted total recovery time is calculated based on the seek time between clusters. 16.一种用于恢复数据的设备(104),其特征在于,包括:16. A device (104) for recovering data, comprising: 处理器(102),用于:A processor (102) configured to: 接收用于读取目标数据的请求,所述目标数据存储在将所述目标数据存储为多个片段的顺序访问存储设备的多个介质中的至少一个介质上;receiving a request for reading target data, the target data being stored on at least one medium of a plurality of media of a sequential access storage device storing the target data as a plurality of fragments; 读取元数据以获得所述目标数据的所述多个片段在所述至少一个介质上的位置;reading metadata to obtain locations of the plurality of segments of the target data on the at least one medium; 其中,所述目标数据的所述片段被存储为在写入阶段期间为获得小于阈值的预测总恢复时间而选择的组合,所述组合包括以下中的至少一项:(i)至少一个重复片段,(ii)至少一个重写片段,以及(iii)至少一个新的数据片段,其中,所述至少一个重写片段包括被重写的至少一个重复片段,所述至少一个新的数据片段包括首次被写入的非重复片段;wherein the segments of the target data are stored as a combination selected during a write phase to obtain a predicted total recovery time less than a threshold, the combination comprising at least one of the following: (i) at least one duplicate segment, (ii) at least one overwritten segment, and (iii) at least one new data segment, wherein the at least one overwritten segment comprises at least one duplicate segment that is overwritten, and the at least one new data segment comprises a non-duplicate segment that is written for the first time; 根据从所述元数据中获得的所述位置读取所述片段;以及reading the segment according to the position obtained from the metadata; and 从读取的片段恢复所述目标数据。The target data is restored from the read segments. 17.根据权利要求16所述的设备,其特征在于,读取片段的顺序根据在写入期间为获得小于所述阈值的所述预测总恢复时间而选择的顺序。17. The apparatus of claim 16, wherein the order in which the segments are read is based on an order selected during writing to obtain the predicted total recovery time less than the threshold. 18.一种用于对数据进行重复数据删除存储的方法,其特征在于,包括:18. A method for deduplicating data for storage, comprising: 通过访问在多个顺序访问存储设备的多个介质中的每个介质与存储在所述介质上的数据片段之间进行映射的全局索引来标识至少一个介质,所述至少一个介质存储用于存储在所述多个介质上的目标数据的至少一个片段的重复片段(306);identifying at least one medium storing duplicate segments of at least one segment of target data stored on the plurality of media by accessing a global index mapping between each of a plurality of media of a plurality of sequential access storage devices and data segments stored on the media (306); 对于每个相应标识的介质,通过访问介质特定的重复数据删除机制来标识存储在所述介质上的所述目标数据的片段的重复片段;for each correspondingly identified medium, identifying duplicate segments of segments of the target data stored on the medium by accessing a medium-specific deduplication mechanism; 确定预测总恢复时间小于阈值的组合,所述组合包括以下中的至少一项:(i)至少一个重复片段,(ii)至少一个重写片段,以及(iii)至少一个新的数据片段(312),其中,所述至少一个重写片段包括用于被重写的至少一个重复片段,所述至少一个新的数据片段包括用于被首次写入的非重复片段;以及Determining a combination for which the predicted total recovery time is less than a threshold, the combination comprising at least one of: (i) at least one duplicate segment, (ii) at least one rewritten segment, and (iii) at least one new data segment (312), wherein the at least one rewritten segment comprises at least one duplicate segment for being rewritten and the at least one new data segment comprises a non-duplicate segment for being written for the first time; and 将所述(ii)至少一个重写片段和所述(iii)至少一个新的数据片段中的至少一个写入所述标识的介质上(314)。At least one of the (ii) at least one rewritten segment and the (iii) at least one new data segment is written to the identified medium (314). 19.一种存储用于对数据进行重复数据删除存储的程序指令(106A)的非暂态介质(106),其特征在于,当所述程序指令由至少一个处理器(102)执行时,使得所述至少一个处理器进行以下操作:19. A non-transitory medium (106) storing program instructions (106A) for deduplicating data, wherein when the program instructions are executed by at least one processor (102), the at least one processor performs the following operations: 通过访问在多个顺序访问存储设备的多个介质中的每个介质与存储在所述介质上的数据片段之间进行映射的全局索引来标识至少一个介质,所述至少一个介质存储用于存储在所述多个介质上的目标数据的至少一个片段的重复片段;identifying at least one medium storing duplicate segments of at least one segment of target data stored on the plurality of media by accessing a global index mapping between each of a plurality of media of a plurality of sequential access storage devices and segments of data stored on the media; 对于每个相应标识的介质,通过访问介质特定的重复数据删除机制来标识存储在所述介质上的所述目标数据的片段的重复片段;for each correspondingly identified medium, identifying duplicate segments of segments of the target data stored on the medium by accessing a medium-specific deduplication mechanism; 确定预测总恢复时间小于阈值的组合,所述组合包括以下中的至少一项:(i)至少一个重复片段,(ii)至少一个重写片段,以及(iii)至少一个新的数据片段,其中,所述至少一个重写片段包括用于被重写的至少一个重复片段,所述至少一个新的数据片段包括用于被首次写入的非重复片段;以及Determining a combination for which the predicted total recovery time is less than a threshold, the combination comprising at least one of: (i) at least one duplicate segment, (ii) at least one rewritten segment, and (iii) at least one new data segment, wherein the at least one rewritten segment comprises at least one duplicate segment for being rewritten, and the at least one new data segment comprises a non-duplicate segment for being written for the first time; and 将所述(ii)至少一个重写片段和所述(iii)至少一个新的数据片段中的至少一个写入所述标识的介质上。At least one of the (ii) at least one rewritten segment and the (iii) at least one new data segment is written onto the identified medium. 20.一种用于恢复数据的方法,其特征在于,包括:20. A method for recovering data, comprising: 接收用于读取目标数据的请求,所述目标数据存储在将所述目标数据存储为多个片段的顺序访问存储设备的多个介质中的至少一个介质上(402);receiving a request for reading target data, the target data being stored on at least one medium of a plurality of media of a sequential access storage device storing the target data as a plurality of fragments (402); 读取元数据以获得所述目标数据的所述多个片段在所述至少一个介质上的位置(404);reading metadata to obtain locations of the plurality of segments of the target data on the at least one medium (404); 其中,所述目标数据的所述片段被存储为在写入阶段期间为获得小于阈值的预测总恢复时间而选择的组合,所述组合包括以下中的至少一项:(i)至少一个重复片段,(ii)至少一个重写片段,以及(iii)至少一个新的数据片段,其中,所述至少一个重写片段包括被重写的至少一个重复片段,所述至少一个新的数据片段包括首次被写入的非重复片段;wherein the segments of the target data are stored as a combination selected during a write phase to obtain a predicted total recovery time less than a threshold, the combination comprising at least one of the following: (i) at least one duplicate segment, (ii) at least one overwritten segment, and (iii) at least one new data segment, wherein the at least one overwritten segment comprises at least one duplicate segment that is overwritten, and the at least one new data segment comprises a non-duplicate segment that is written for the first time; 根据从所述元数据中获得的所述位置读取所述片段(406);以及reading the segment according to the position obtained from the metadata (406); and 从读取的片段恢复所述目标数据(408)。The target data is restored from the read segments (408). 21.一种存储用于恢复数据的程序指令(106A)的非暂态介质(106),其特征在于,当所述程序指令由至少一个处理器(102)执行时,使得所述至少一个处理器进行以下操作:21. A non-transitory medium (106) storing program instructions (106A) for recovering data, characterized in that when the program instructions are executed by at least one processor (102), the at least one processor is caused to perform the following operations: 接收用于读取目标数据的请求,所述目标数据存储在将所述目标数据存储为多个片段的顺序访问存储设备的多个介质中的至少一个介质上;receiving a request for reading target data, the target data being stored on at least one medium of a plurality of media of a sequential access storage device storing the target data as a plurality of fragments; 读取元数据以获得所述目标数据的所述多个片段在所述至少一个介质上的位置;reading metadata to obtain locations of the plurality of segments of the target data on the at least one medium; 其中,所述目标数据的所述片段被存储为在写入阶段期间为获得小于阈值的预测总恢复时间而选择的组合,所述组合包括以下中的至少一项:(i)至少一个重复片段,(ii)至少一个重写片段,以及(iii)至少一个新的数据片段,其中,所述至少一个重写片段包括被重写的至少一个重复片段,所述至少一个新的数据片段包括首次被写入的非重复片段;wherein the segments of the target data are stored as a combination selected during a write phase to obtain a predicted total recovery time less than a threshold, the combination comprising at least one of the following: (i) at least one duplicate segment, (ii) at least one overwritten segment, and (iii) at least one new data segment, wherein the at least one overwritten segment comprises at least one duplicate segment that is overwritten, and the at least one new data segment comprises a non-duplicate segment that is written for the first time; 根据从所述元数据中获得的所述位置读取所述片段;以及reading the segment according to the position obtained from the metadata; and 从读取的片段恢复所述目标数据。The target data is restored from the read segments.
CN202280101808.1A 2022-12-15 2022-12-15 Recovering data from sequential storage media Pending CN120188138A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/086169 WO2024125801A1 (en) 2022-12-15 2022-12-15 Restoration of data from sequential storage media

Publications (1)

Publication Number Publication Date
CN120188138A true CN120188138A (en) 2025-06-20

Family

ID=84799592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280101808.1A Pending CN120188138A (en) 2022-12-15 2022-12-15 Recovering data from sequential storage media

Country Status (2)

Country Link
CN (1) CN120188138A (en)
WO (1) WO2024125801A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9208820B2 (en) * 2012-06-29 2015-12-08 International Business Machines Corporation Optimized data placement for individual file accesses on deduplication-enabled sequential storage systems
GB2514555A (en) * 2013-05-28 2014-12-03 Ibm Deduplication for a storage system
US9798629B1 (en) * 2013-12-16 2017-10-24 EMC IP Holding Company LLC Predicting backup failures due to exceeding the backup window

Also Published As

Publication number Publication date
WO2024125801A1 (en) 2024-06-20

Similar Documents

Publication Publication Date Title
US10621142B2 (en) Deduplicating input backup data with data of a synthetic backup previously constructed by a deduplication storage system
US10303797B1 (en) Clustering files in deduplication systems
USRE49148E1 (en) Reclaiming space occupied by duplicated data in a storage system
US9880746B1 (en) Method to increase random I/O performance with low memory overheads
US8943032B1 (en) System and method for data migration using hybrid modes
US8949208B1 (en) System and method for bulk data movement between storage tiers
US9715434B1 (en) System and method for estimating storage space needed to store data migrated from a source storage to a target storage
US8639669B1 (en) Method and apparatus for determining optimal chunk sizes of a deduplicated storage system
US9367448B1 (en) Method and system for determining data integrity for garbage collection of data storage systems
US10838923B1 (en) Poor deduplication identification
US10503697B1 (en) Small file storage system
US10459648B1 (en) Change rate estimation
US9996426B1 (en) Sparse segment trees for high metadata churn workloads
US10223377B1 (en) Efficiently seeding small files with certain localities
US12045203B2 (en) Systems and methods for physical capacity estimation of logical space units
US20240037034A1 (en) Data intake buffers for deduplication storage system
US10845994B1 (en) Performing reconciliation on a segmented de-duplication index
US20240143213A1 (en) Fingerprint tracking structure for storage system
CN120188138A (en) Recovering data from sequential storage media
CN118159936A (en) Parallel data deduplication mechanism on sequential storage media
CN119678127A (en) Selecting a set of sequential storage media in a deduplication storage system
WO2023241771A1 (en) Deduplication mechanism on sequential storage media

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination