Disclosure of Invention
It is an object of the present invention to provide a computing device, system, computer program product, and method for managing sequential storage media accessed by multiple sequential access storage devices for deduplication storage of target data.
The above and other objects are achieved by the features of the independent claims. Other implementations are apparent from the dependent claims, the description and the drawings.
According to a first aspect, an apparatus for deduplication storage of data includes a processor for identifying at least one medium storing a repeating segment of at least one segment of target data to be stored on a plurality of media, by accessing a global index mapping between each of the plurality of media of a plurality of sequential access storage devices and the data segments stored on the medium, identifying, for each respective identified medium, the repeating segment of the segment of target data stored on the medium by accessing a medium-specific deduplication mechanism, determining a combination having a predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one repeating segment, (ii) at least one rewriting segment, and (iii) at least one new data segment, wherein the at least one rewriting segment comprises at least one repeating segment to be rewritten and the at least one new data segment comprises a non-repeating segment to be written for the first time, and writing at least one of the at least one rewriting segment and the at least one new data segment on the identified medium.
The predicted total recovery time for reconstructing the target data is calculated based on segment locations, optionally defined by the global index map and/or a medium-specific deduplication mechanism, to select which segments are to be written as new data, and which existing duplicate segments can be used. A selection is made to ensure that the data is written in such a way that the predicted total recovery time is less than the threshold while maximizing the utilization of the repeated segments.
Since repeated segments may be scattered over different locations of the media of the sequential access storage devices, the greater the number of non-sequential areas in which the repeated segments used to reconstruct the target data are located, the longer the total recovery time to read the repeated segments. Reusing the repeated segments to reconstruct the target data improves utilization of existing media resources, but at the cost of a longer overall recovery time. At the other extreme, the total recovery time for reading the segments is shortest when no duplicate segments are used and all segments of the target data are newly written in sequence. The cost of this shorter total recovery time is using a larger amount of media resources to store the target data. Embodiments described herein balance these two conflicting requirements by selecting a combination of segments of the target data whose predicted total recovery time is less than a selected threshold while maximizing utilization of existing duplicate segments.
Rewriting some of the identified repeated segments as new data (i.e., rewritten segments) along with the non-repeated segments (e.g., buffered in other storage media such as an HDD or SSD prior to dumping to the tape) may further improve the performance of the tape drive, e.g., prevent "shoe-shining" (back and forth tape movement) and reduce repeated stopping and starting of tape movement, which increases the life and/or durability of the tape.
According to a second aspect, an apparatus for recovering data includes a processor for receiving a request to read target data stored on at least one of a plurality of media of a sequential access storage device storing the target data as a plurality of segments, reading metadata to obtain locations of the plurality of segments of the target data on the at least one medium, wherein the segments of the target data are stored as a combination selected during a write phase to obtain a predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one repeating segment, (ii) at least one rewriting segment, and (iii) at least one new data segment, wherein the at least one rewriting segment comprises at least one repeating segment that is rewritten and the at least one new data segment comprises a non-repeating segment that is written for the first time, reading the segments from the locations obtained from the metadata, and recovering the target data from the read segments.
When the writing is completed according to the selected combination, the subsequent data recovery is expected to be performed in a time below the threshold.
According to a third aspect, a method for deduplication storage of data includes identifying at least one medium storing a repeating segment of at least one segment of target data to be stored on a plurality of media, by accessing a global index mapping between each of the plurality of media of a plurality of sequential access storage devices and the data segments stored on the media, identifying, for each respective identified medium, the repeating segment of the target data stored on the medium by accessing a medium-specific deduplication mechanism, determining a combination having a predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one repeating segment, (ii) at least one overwriting segment, and (iii) at least one new data segment, wherein the at least one overwriting segment comprises at least one repeating segment to be overwritten and the at least one new data segment comprises a non-repeating segment to be written for the first time, and writing at least one of the at least one overwriting segment and the at least one new data segment on the identified medium.
According to a fourth aspect, a non-transitory medium stores program instructions for deduplication storage of data, which, when executed by at least one processor, cause the at least one processor to identify at least one medium storing a duplicate of at least one segment of target data to be stored on a plurality of media, by accessing a global index mapping between each of the plurality of media of a plurality of sequential access storage devices and the data segments stored on the medium, identify, for each respective identified medium, the duplicate segments of the target data stored on the medium by accessing a medium-specific deduplication mechanism, determine a combination having a predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one duplicate segment, (ii) at least one overwrite segment, and (iii) at least one new data segment, wherein the at least one overwrite segment comprises at least one duplicate segment to be overwritten and the at least one new data segment comprises a non-duplicate segment to be written for the first time, and write at least one of the at least one overwrite segment and the at least one new data segment on the identified medium.
According to a fifth aspect, a method for recovering data includes receiving a request to read target data stored on at least one of a plurality of media of a sequential access storage device storing the target data as a plurality of segments, reading metadata to obtain locations of the plurality of segments of the target data on the at least one medium, wherein the segments of the target data are stored as a combination selected during a write phase to obtain a predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one repeating segment, (ii) at least one overwriting segment, and (iii) at least one new data segment, wherein the at least one overwriting segment comprises at least one repeating segment that is overwritten and the at least one new data segment comprises a non-repeating segment that is written for the first time, reading the segments according to the locations obtained from the metadata, and recovering the target data from the read segments.
According to a sixth aspect, a non-transitory medium stores program instructions for recovering data, which, when executed by at least one processor, cause the at least one processor to receive a request to read target data stored on at least one of a plurality of media of a sequential access storage device storing the target data as a plurality of segments, read metadata to obtain locations of the plurality of segments of the target data on the at least one medium, wherein the segments of the target data are stored as a combination selected during a write phase to obtain a predicted total recovery time less than a threshold, the combination comprising at least one of (i) at least one duplicate segment, (ii) at least one overwrite segment, and (iii) at least one new data segment, wherein the at least one overwrite segment comprises at least one duplicate segment that is overwritten and the at least one new data segment comprises a non-duplicate segment that is written for the first time, read the segments from the locations obtained from the metadata, and recover the target data from the read segments.
In a further implementation form of the first aspect, the second aspect and the third aspect, the processor is further configured to write metadata indicating the locations of the segments of the combination on the medium, for reading the segments so as to obtain the predicted total recovery time less than the threshold.
The metadata provides the positions of the segments of the combination so as to obtain the predicted total recovery time less than the threshold during a read phase.
In a further implementation form of the first aspect, the second aspect and the third aspect, the processor is further configured to read the metadata to determine the locations of the segments of the combination to be read so as to obtain the predicted total recovery time that is less than the threshold.
In a further implementation form of the first aspect, the second aspect and the third aspect, the at least one re-written segment and the at least one new data segment are written sequentially onto the available space of the identified medium.
Sequentially writing the re-written segments and/or the new data provides a faster sequential read than reading the repeated segments in place, which takes longer because the medium and/or read head must move to the different positions of the repeated segments.
In a further implementation form of the first aspect, the second aspect and the third aspect, the processor is further configured to calculate a plurality of combinations, each combination comprising a different combination of repeated segments and re-written segments, calculate the predicted total recovery time for each combination, and select a combination having the predicted total recovery time less than the threshold and the highest deduplication rate.
Calculating the different combinations enables selecting the best combination, i.e., the one providing the highest deduplication rate while keeping the predicted total recovery time below the threshold.
In a further implementation form of the first aspect, the second aspect and the third aspect, each combination comprises a different subset of the repeated segments, wherein identified repeated segments excluded from a subset are classified as re-written segments.
Excluding some identified repeated segments from reuse and instead writing them sequentially together with the non-repeated segments reduces the total recovery time, keeping the predicted total recovery time below the threshold.
In a further implementation form of the first, second and third aspects, the processor is further configured to calculate a baseline recovery time for reading the plurality of segments of the target data using a first set of the identified repeated segments, calculate a difference between the threshold and the baseline recovery time, and iteratively remove at least one of the identified repeated segments from the first set until the predicted total recovery time is less than the threshold, wherein identified repeated segments removed from the first set are added as re-written segments to a second set.
Repeated segments may be added to or removed from the set based on the prediction results to obtain the optimal segment combination, as illustrated in the sketch below. Different combinations may include different numbers of repeated segments. The remaining segments included in the combination are the re-written repeated segments and the sequentially written non-repeated segments.
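By way of illustration only, the iterative selection described above may be sketched as follows (Python). The cost function predict_recovery_time and the segment objects are assumptions made for this sketch and are not defined by the embodiments; the sketch merely demonstrates demoting duplicate segments to re-written segments until the predicted total recovery time falls below the threshold.

    def select_combination(duplicates, new_segments, predict_recovery_time, threshold):
        """Start from full reuse of the identified repeated segments; demote repeated
        segments to re-written segments until the predicted total recovery time is
        below the threshold. predict_recovery_time(reused, sequential) is an assumed
        cost model returning seconds."""
        reused = list(duplicates)       # identified repeated segments kept in place
        rewritten = []                  # repeated segments demoted to sequential re-writes
        while reused and predict_recovery_time(reused, rewritten + new_segments) >= threshold:
            def saving(seg):
                trial = [s for s in reused if s is not seg]
                return (predict_recovery_time(reused, rewritten + new_segments)
                        - predict_recovery_time(trial, rewritten + new_segments + [seg]))
            # Demote the repeated segment whose removal saves the most predicted time.
            worst = max(reused, key=saving)
            reused.remove(worst)
            rewritten.append(worst)
        return reused, rewritten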
In a further implementation form of the first aspect, the second aspect and the third aspect, the threshold defines a maximum time from receiving a recovery request until the target data is reconstructed from the read segments and ready.
The threshold is defined in terms of time as experienced from a user perspective.
In a further implementation form of the first aspect, the second aspect and the third aspect, the plurality of sequential access storage devices comprises tape drives, the plurality of media comprises tape cartridges, and the predicted total recovery time is calculated by determining an order in which the segments are read to recover the target data, and based on a maximum seek time to a first segment, seek times between segments for the reading order, read throughput, read-head wrap changes and tape library loading time.
These parameters depend on the connection type (e.g., FC, SAS), the tape drive/tape library type, the inserted tape cartridge type (e.g., LTO 9, LTO 8) and whether drive compression is on or off, providing a customized selection of which duplicate segments to reuse and which duplicate segments to re-write as new data.
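By way of illustration only, a simplified cost model of the kind described above may be sketched as follows (Python). The parameter values (throughput, wrap-change time, load time) are placeholders and not values defined by the embodiments; in practice they depend on the connection type, drive/library type, cartridge generation and compression setting.

    def predict_total_recovery_time(ordered_segments, seek_time_between,
                                    read_throughput_mb_s=400.0,
                                    max_first_seek_s=60.0,
                                    wrap_change_s=2.0,
                                    library_load_s=15.0):
        """ordered_segments: list of (size_mb, wrap) tuples in reading order.
        seek_time_between(a, b) is an assumed function returning the seek time in
        seconds between two segment locations."""
        total = library_load_s + max_first_seek_s   # load the cartridge, seek to the first segment
        prev = None
        for seg in ordered_segments:
            if prev is not None:
                total += seek_time_between(prev, seg)
                if seg[1] != prev[1]:               # read head moves to a different wrap
                    total += wrap_change_s
            total += seg[0] / read_throughput_mb_s  # transfer time for the segment
            prev = seg
        return total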
In a further implementation form of the first, second and third aspects, the order for reading is calculated based on a nearest-neighbor method that takes into account the positions of segments that are physically close to each other on different wraps and minimizes the total distance traversed by the read head between the segments.
Moving the head of the tape drive across multiple wraps may be faster than moving the tape along the length of one wrap. The predicted total recovery time may be further reduced by selecting different movements of the head across wraps to change the order of reading. In addition, moving the head of the tape drive across wraps is more efficient than running the tape back and forth (also referred to as "shoe-shining") and greatly increases the life of the tape. In addition, moving the head of the tape drive across wraps may reduce the number of stops and restarts of tape movement, improving the efficiency of tape movement and increasing the durability of the tape.
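By way of illustration only, the nearest-neighbor ordering may be sketched as follows (Python); the distance() metric is an assumption standing in for the physical travel of the read head and tape between segment locations.

    def nearest_neighbor_order(segments, distance):
        """Greedily read the closest unread segment next, approximating a minimal
        total travel path for the read head across wraps and along the tape."""
        if not segments:
            return []
        remaining = list(segments)
        order = [remaining.pop(0)]          # start from the first listed segment
        while remaining:
            nxt = min(remaining, key=lambda s: distance(order[-1], s))
            remaining.remove(nxt)
            order.append(nxt)
        return order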
In a further implementation form of the first, second and third aspects, the threshold is defined according to a service level agreement (SLA).
The combination may be selected in accordance with the SLA associated with the target data. For each item of target data written to the sequential storage system, the client may select an SLA that defines the longest acceptable recovery time. The threshold may be customized, with different values selected by different users.
In a further implementation form of the first, second and third aspects, the processor is further configured to update at least one medium-specific deduplication mechanism of the at least one identified medium to indicate the identified duplicate segments, the at least one overwrite segment and the non-duplicate segments included in the selected combination of the target data, update the global index to indicate the mapping between the segments of the target data and the at least one identified medium used during the accessing, update a global file map to store identifiers of the at least one segment of the target data, and update an identification table to store metadata for each identifier.
In a further implementation form of the first aspect, the second aspect and the third aspect, when at least two media are identified, a respective combination is selected for each media for which the respective predicted total recovery time is less than the threshold.
In the context of deduplicated data stored on multiple media read in parallel, the total reconstruction time remains below the threshold.
In a further implementation form of the first aspect, the second aspect and the third aspect, the processor is further configured to aggregate repeating segments on the identified medium that are physically close to each other, wherein the combination comprises at least one cluster of repeating segments, and the new data is included in at least one overwriting cluster.
Aggregation reduces the number of elements in the combination, thereby reducing the total number of possible combinations, optionally to a small number. The smaller number of combinations enables efficient selection of a combination.
In a further implementation form of the first aspect, the second aspect and the third aspect, the predicted total recovery time is calculated from seek times between clusters.
When the data within a cluster is read sequentially, the total read and seek time of the cluster is shorter than for non-aggregated segments. This is because, when the segments in a cluster are close to each other, reading the entire data of the cluster sequentially (including the unrelated data between the segments) is faster than stopping at the end of each segment, skipping the unrelated data between the segments, and seeking to the beginning of the next segment, which is much slower.
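By way of illustration only, the trade-off may be sketched as follows (Python); the throughput and seek values are assumptions, not values defined by the embodiments. For example, with ten 50 MB segments separated by 10 MB gaps, reading the whole ~590 MB cluster span sequentially takes roughly 1.5 seconds at the assumed throughput, whereas paying nine 1.5-second seeks plus the transfers takes roughly 15 seconds.

    def cluster_read_time_sequential(segment_sizes_mb, gap_sizes_mb, throughput_mb_s=400.0):
        """Read the whole cluster span, including the unrelated gaps between segments."""
        return (sum(segment_sizes_mb) + sum(gap_sizes_mb)) / throughput_mb_s

    def cluster_read_time_with_seeks(segment_sizes_mb, seek_time_s=1.5, throughput_mb_s=400.0):
        """Read each segment separately, paying a stop/seek penalty between segments."""
        seeks = max(len(segment_sizes_mb) - 1, 0)
        return sum(segment_sizes_mb) / throughput_mb_s + seeks * seek_time_s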
In a further implementation of the second aspect, the order in which the segments are read is according to an order selected during writing to obtain the predicted total recovery time that is less than the threshold.
Unless defined otherwise, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, these materials, methods, and examples are illustrative only and not necessarily limiting.
Detailed Description
The present invention, in some embodiments thereof, relates to storage systems, and more particularly, but not exclusively, to management of deduplication-based storage systems that store data on sequential storage media.
Aspects of some embodiments relate to systems, methods, computing devices and/or apparatuses and/or computer program products (storing code instructions executable by one or more processors) for managing sequential storage media (also referred to herein as "media") of a plurality of sequential access storage devices for deduplication storage of target data on the media. Sequential access storage devices are not random access and may be, for example, tape libraries having hundreds of tape drives and thousands of cartridges. The processor divides the target data into a plurality of data segments (also referred to herein as "segments"). The processor identifies repeated segments of one or more segments of the target data on one or more media. The processor determines a combination having a predicted total recovery time less than a threshold (e.g., defined by a service level agreement (SLA)). The combination includes at least one of (i) one or more identified repeated segments, (ii) one or more re-written segments, and (iii) one or more new data segments. The re-written segments have the same data as identified repeated segments, wherein the re-written segments are designated for being written onto the medium as new data, e.g., onto available space (e.g., appended to the end of existing written data). The new data segments comprise data that is not currently stored in repeated segments on the medium, i.e., the new data segments are non-repeated segments. The new data segments are designated for being written on the medium for the first time as new data. The processor writes the re-written segments and/or the new data segments on the identified medium as indicated by the combination.
An aspect of some embodiments relates to systems, methods, computing devices and/or apparatuses and/or computer program products (storing code instructions executable by one or more processors) for recovering data stored on one or more media. The processor receives a request to read the target data. The processor reads metadata to obtain the locations of the segments of the target data on the one or more media. The segments of the target data are stored as a combination selected during the write phase to obtain a predicted total recovery time less than a threshold. The processor instructs reading of the segments according to the locations obtained from the metadata. It should be noted that, as indicated by the metadata, even when some segments have existing duplicates on the medium, the re-written segments may be read instead of the duplicate segments in order to obtain the predicted total recovery time that is less than the threshold. The processor recovers the target data from the read segments.
The predicted total recovery time for reconstructing the target data is calculated based on segment locations, optionally defined by the global index map and/or a medium-specific deduplication mechanism, to select which segments are to be written as new data, and which existing duplicate segments can be used. A selection is made to ensure that the data is written in such a way that the predicted total recovery time is less than the threshold while maximizing the utilization of the repeated segments.
Since repeated segments may be scattered over different locations of the media of the sequential access storage devices, the greater the number of non-sequential areas in which the repeated segments used to reconstruct the target data are located, the longer the total recovery time to read the repeated segments. Reusing the repeated segments to reconstruct the target data improves utilization of existing media resources, but at the cost of a longer overall recovery time. At the other extreme, the total recovery time for reading the segments is shortest when no duplicate segments are used and all segments of the target data are newly written in sequence. The cost of this shorter total recovery time is using a larger amount of media resources to store the target data. Embodiments described herein balance these two conflicting requirements by selecting a combination of segments of the target data whose predicted total recovery time is less than a selected threshold while maximizing utilization of existing duplicate segments.
Writing some of the identified repeated segments as new data (i.e., re-written segments) along with the non-repeated segments (e.g., buffered in other storage media such as an HDD or SSD prior to dumping to the tape) may further improve the performance of the tape drive, e.g., prevent "shoe-shining" (back and forth tape movement) and reduce repeated stopping and starting of tape movement, which increases the life and/or durability of the tape.
At least some embodiments described herein address the technical problem of balancing the conflicting requirements of reducing seek time over deduplicated segments dispersed across one or more different sequential storage devices (e.g., cartridges) and increasing deduplication of the data to reduce the amount of storage media used. At least some embodiments described herein select a recovery with a minimum seek time that is less than a defined maximum seek-time threshold, e.g., as defined by a service level agreement (SLA), while achieving the best possible deduplication (or, optionally, near the best possible deduplication).
Although different deduplication systems exist, such solutions are designed for disk-based storage systems (i.e., random access data storage devices such as hard disks and/or solid state drives (SSDs)). Deduplication designed for disk-based storage cannot be easily migrated to tape storage because the amount of data stored on tape is large (e.g., at least about 100 times that of disk storage) and/or because the method of operation of tape differs from that of disk storage.
There are existing methods of managing data stored on magnetic tape. In the most common approach, no deduplication is used. When data is written to tape, the data is written in its original (i.e., non-deduplicated) form. Even if the data arrives in a deduplicated format (e.g., on disk), the data is first expanded to the original non-deduplicated format and stored in that non-deduplicated form. In another example method, data is sent to the tape in such a way that the data sent together is deduplicated only against the other parts of the group. For such operations, the data is processed to find matches within the group, which is saved as a unit. For example, one such method focuses on how to save the set of data in a manner that reduces recovery time. In another example, a full backup and a series of incremental backups are created and saved together as a unit (referred to as a SILO). The limitation of this approach is that a closed group is required. Such closed groups require extensive computational preprocessing and significantly reduce the deduplication rate. For example, a SILO method with monthly full backups (assuming that 2% of the data changes daily and partially overlaps previous changes) will achieve a deduplication rate of approximately 1:1 for the incremental data, because the incremental data has few chunks in common with the full data (unaltered data is not included in the incremental data). The primary benefit is deduplicating the full backups against each other. Monthly full backups and daily incremental data over a year will achieve a ratio of about 1:12, as the full data is also much larger (in the conventional case).
An exemplary physical structure of tape storage is now described. The magnetic tape includes tracks on which data is written and from which data is read. The tracks follow the length of the tape from the beginning of the tape to the end of the tape. The magnetic head of a tape drive is used for reading and writing, and comprises a plurality of read and write elements to facilitate reading/writing a plurality of adjacent tracks. Such a collection of adjacent tracks is called a wrap. In addition, the magnetic tape is divided into a plurality of zones. The tape drive head covers the width of a single zone. Each zone is made up of an even number of wraps. Data is recorded on the tape in a linear serpentine fashion, with a first wrap written from the beginning to the end, then a second wrap written from the end to the beginning, then a third wrap written from the beginning to the end, and so on. The recording and reading direction is the direction of the associated wrap. Wraps 0, 2, 4, ... run from the beginning of the tape to the end of the tape, while wraps 1, 3, 5, ... run from the end of the tape to the beginning of the tape. Each tape zone consists of an equal number of beginning-to-end wraps and end-to-beginning wraps. In LTO 9 tape (tape constructed according to release 9 of the Linear Tape-Open specification), each wrap has 32 tracks, each zone has 52 wraps (26 beginning-to-end wraps and 26 end-to-beginning wraps), and there are 4 zones. In modern magnetic tapes, the length of the tape (i.e., the length of each wrap) may exceed 1 km (1035 meters in LTO 9).
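By way of illustration only, the serpentine layout may be expressed as follows (Python); the wrap and zone counts are taken from the LTO 9 example above.

    WRAPS_PER_ZONE = 52    # 26 wraps in each direction, per the LTO 9 example above
    ZONES = 4

    def wrap_direction(wrap_index: int) -> str:
        """Even-numbered wraps run from the beginning of the tape to the end;
        odd-numbered wraps run from the end back to the beginning."""
        return "beginning-to-end" if wrap_index % 2 == 0 else "end-to-beginning"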
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of components and/or methods set forth in the following description and/or illustrated in the drawings and/or examples. The invention is capable of other embodiments or of being practiced or of being carried out in various ways.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions that cause a processor to perform aspects of the invention.
A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a corresponding computing/processing device or to an external computer or external storage device over a network such as the internet, a local area network, a wide area network, and/or a wireless network.
The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to customize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may be performed out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Referring now to FIG. 1, FIG. 1 is a flow diagram of a method for computing a combination of segments of target data with a predicted total recovery time less than a threshold, for deduplication storage by components of a system 100 storing target data 112A on a sequential storage medium 160 accessed by a sequential access storage device 150, according to some embodiments. Referring also to FIG. 2, FIG. 2 is a block diagram of components of a system 200 for computing a combination of segments of target data with a predicted total recovery time less than a threshold for deduplication storage on a sequential storage medium (e.g., tape) 252, according to some embodiments. Referring also to FIG. 3A, FIG. 3A is a flowchart of a method of deduplication storage on a sequential storage medium according to some embodiments. Referring also to FIG. 3B, FIG. 3B is a flowchart of a method of computing a combination of segments of target data for deduplication storage on a sequential storage medium, according to some embodiments. Referring also to FIG. 4, FIG. 4 is a flowchart of a method of reading a combination of segments of target data written to have a predicted total recovery time less than a threshold, in accordance with some embodiments. Referring also to FIG. 5, schematic diagram 502 of FIG. 5 helps explain why linear recall is very inefficient, and schematic diagram 504 helps explain why ordered recall is more efficient. Referring also to FIG. 6, FIG. 6 includes a schematic diagram that helps explain the calculation of predicted seek times for different combinations of identified repeated segments according to some embodiments described herein.
The components of the systems 100 and/or 200 may implement the actions of the methods described with reference to fig. 3A and 3B and fig. 4-6, e.g., code instructions (e.g., code 106A) stored in memory 106 are executed by one or more processors 102 of computing device 104.
The computing device 104 obtains the target data 112A for deduplication storage on the sequential access storage device 150, optionally on a medium 160 accessed by the sequential access storage device 150.
The sequential access storage device 150 and/or the medium 160 may be referred to herein as sequential access memory and/or non-random access memory, such as tape layers.
The medium 160 may be referred to herein as a sequential access storage medium and/or a non-random access storage medium. The sequential access storage device sequentially reads and/or writes data, i.e., does not provide random access (e.g., in random access memory) that reads and/or writes data in any order. Media 160 may comprise a magnetic storage medium of a cartridge type, such as a magnetic tape, a sequential access memory (sequential access memory, SAM), and/or an optical storage medium. Sequential access storage devices may include tape drives and/or optical disk drives for optical storage media.
For example, target data 112A may be obtained from client 112 and/or server 110. The computing device 104 processes the target data 112A, for example, performs computations to segment, hash, and/or maintain the different data structures accessed, as described herein. Global index 116A is accessed to identify which of media 160 may store repeated segments of the target data 112A, as described herein. For each identified medium, the corresponding medium-specific deduplication mechanism 150A is accessed to verify the duplicate data segments of the target data 112A. The selected repeated segments, the newly written copies of repeated segments, and the non-repeated segments are stored on the identified media 160. The global index 116A, global file map 116B, associated media-specific deduplication mechanism 150A, and associated identification table 150B (e.g., metadata) are updated to indicate the presence of duplicate segments and/or newly stored non-duplicate segments, to enable reading of the segments of the target data below a time threshold.
For example, the computing device 104 may be implemented as one or more of a computing cloud, a cloud network, a computer network, one or more virtual machines (e.g., hypervisors, virtual servers), network nodes (e.g., switches, virtual networks, routers, virtual routers), a single computing device (e.g., client terminal), a set of computing devices arranged in parallel, a network server, a web server, a storage server, a local server, a remote server, a client terminal, a mobile device, and a fixed device.
Different architectures of system 100 may be implemented. For example, computing device 104 may act as a server providing centralized storage for target data 112A obtained from different client terminals 112 and/or servers 110. In another example, computing device 104 may act as a local manager of medium 160 for local storage of target data 112A.
Computing device 104 includes one or more processors 102, e.g., implemented as one or more central processing units (CPUs), one or more graphics processing units (GPUs), one or more field programmable gate arrays (FPGAs), one or more digital signal processors (DSPs), one or more application-specific integrated circuits (ASICs), one or more custom circuits, processors coupled to other units, and/or special-purpose hardware accelerators. The one or more processors 102 may be implemented as a single processor, a multi-core processor, and/or a processor cluster for parallel processing (which may include homogeneous and/or heterogeneous processor architectures). It is noted that the one or more processors 102 may be designed to implement in hardware one or more features stored as code instructions 106A.
Memory 106 stores code instructions executable by the one or more processors 102, and may be implemented as, for example, random access memory (RAM), dynamic random access memory (DRAM), and/or storage class memory (SCM), non-volatile memory, magnetic media, semiconductor memory devices, hard drives, removable memory, and optical media (e.g., DVD or CD-ROM). For example, the memory 106 stores code 106A, which, when executed by the processor 102, implements one or more features and/or acts of the methods described with reference to FIGS. 3A and 3B and FIGS. 4-6. A sufficiently large memory may be selected to store the global index 116A.
Computing device 104 may include a data storage device 116 for storing data, e.g., a queue that accumulates target data 112A for storage on media 160, a global index 116A, a global file map 116B, and/or one or more media-specific deduplication mechanisms 150A, each of which may be associated with an identification table 150B, as described herein. For example, the data storage device 116 may be implemented as a solid state drive (SSD), a fast storage tier of memory, a local hard drive, virtual memory, a solid state device, a storage device, and the like, and/or as a remote server and/or computing cloud (e.g., accessed using a network connection).
The computing device 104 communicates with one or more sequential access storage devices 150 that access (i.e., read and/or write) one or more media 160. For example, sequential access storage device 150 may be implemented as a tape drive. Sequential access media 160 may be implemented, for example, as a tape library and/or cartridges (e.g., a cartridge in a drive and/or a cartridge in a slot), such as LTO-9 cartridges storing 18 terabytes (TB). Sequential access storage device 150 may include built-in compression, e.g., a 1:2.5 average as in the LTO-9 specification.
Computing device 104 may include one or more of a network interface 118 for connecting to network 114, such as a network interface card, a wireless interface to a wireless network, a physical interface for connecting to a cable for a network connection, a virtual interface implemented in software, network communication software providing higher-level network connections, and/or other implementations.
For example, the network 114 may be implemented as the Internet, a local area network, a virtual private network, a wireless network, a cellular network, a local bus, a point-to-point link (e.g., wired), and/or combinations thereof.
For example, computing device 104 may communicate with sequential-access storage device 150 (e.g., for reading data from medium 160 and/or storing data on medium 160) via network interface 118 over network 114 and/or another interface, such as a virtual and/or physical interface, using a wireless and/or physical cable link.
The computing device 104 may include one or more physical user interfaces 108 and/or be in communication with one or more physical user interfaces 108, the one or more physical user interfaces 108 including mechanisms that facilitate user input of data and/or viewing of data. For example, the exemplary user interface 108 includes one or more of a touch screen, a display, a keyboard, a mouse, voice activated software using a speaker and microphone.
Referring back now to FIG. 2, the components described with reference to system 200 may correspond to, be combined with, be included in, and/or represent an exemplary implementation of the components of system 100 described with reference to FIG. 1. The system 200 may include a flash storage layer 202, the flash storage layer 202 including one or more flash data storage devices 204, such as SSDs. For example, the flash storage layer 202 may store target data that is being processed for storage on the tape storage layer 250. It should be noted that magnetic tape represents an exemplary implementation of a sequential access storage device and/or sequential storage medium, and that references to tape drives and/or tape cartridges are not necessarily limiting; other implementations of sequential access storage devices and/or sequential storage media may be used. The system 200 includes a tape storage layer 250, the tape storage layer 250 including a global index 216A (e.g., a global sparse index) as described herein. The global sparse index 216A may be stored on a fast data storage device (e.g., SSD 206). The tape storage layer 250 includes a plurality of tape cartridges 252, wherein each tape cartridge 252 is read and/or written to by a tape drive (not shown). Each cartridge 252 is associated with a tape-specific deduplication mechanism, as described herein. For example, each tape-specific deduplication mechanism may include a sparse index 254 and a full index 256, as described herein. It should be noted that other deduplication mechanisms described herein may be used, with the sparse and full indexes representing one example that is not necessarily limiting. Each set of a sparse index 254 and a full index 256 may be stored, for example, on the fast data storage device 206.
Referring back now to FIG. 3A, in 302, target data is obtained for storage on a medium of a sequential access storage device. The target data may be, for example, data for archiving.
In 304, the target data may be processed by dividing the target data into segments, computing fingerprints of the segments, and selecting a representation of the segments.
The target data may be divided into smaller portions, referred to herein as segments, i.e., the target data is divided into data segments (target segments).
For example, the target data may be divided into constant-size data segments and/or variable-size data segments. Constant-size segmentation divides the data into segments of exactly the same size (e.g., 16KB or 32KB or other sizes). It should be noted that in a deduplication-based system, constant-size segmentation is not widely used, because an insertion or deletion of even a few bytes shifts all segments from the change point onward, and no deduplication will occur. Variable-size segmentation divides the data into segments according to a window of data. For example, a rolling hash function is applied over a rolling window of 64 to 256 bytes. When the value of the rolling hash satisfies a certain condition, the segment is cut and a new segment is started. The condition may be selected such that the minimum/average/maximum size of the segments is a desired value.
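By way of illustration only, variable-size segmentation may be sketched as follows (Python). A simple polynomial rolling hash stands in for a production fingerprint such as a Rabin fingerprint, and the window size, mask and size bounds are illustrative values, not values mandated by the embodiments; the 13-bit mask yields segments on the order of 8 KB on average.

    WINDOW = 64                    # rolling window of 64 bytes
    MASK = (1 << 13) - 1           # cut when the low 13 bits of the hash are zero
    MIN_SIZE, MAX_SIZE = 2 * 1024, 64 * 1024
    BASE, MOD = 257, (1 << 61) - 1

    def split_segments(data: bytes):
        segments, start, h = [], 0, 0
        pow_top = pow(BASE, WINDOW, MOD)   # coefficient of the byte leaving the window
        for i, byte in enumerate(data):
            h = (h * BASE + byte) % MOD
            if i - start + 1 > WINDOW:     # keep the hash over the last WINDOW bytes only
                h = (h - data[i - WINDOW] * pow_top) % MOD
            size = i - start + 1
            if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
                segments.append(data[start:i + 1])   # cut the segment here
                start, h = i + 1, 0
        if start < len(data):
            segments.append(data[start:])            # trailing partial segment
        return segments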
A representation of the target data may be extracted. Optionally, samples of the data segments are identified and a representation is calculated for the sampled data segments.
As used herein, the term fingerprint (or data fingerprint) refers to a representation extracted from target data and/or from data stored on a sequential access medium as described herein. For example, the fingerprint may be extracted by feeding a data segment sampled from the target data into a strong hash function (e.g., SHA1, which outputs a 20-byte cryptographic hash, or a newer version of the secure hash algorithm (SHA) that outputs a longer hash value). The fingerprints are used to quickly compare target data to fingerprints of one or more indexes of a global index and/or media-specific deduplication mechanism, i.e., rather than comparing actual target data on a byte-by-byte basis. Note that the chance of a false positive comparison may be selected. For example, when using a strong hash, the chance of a false positive is very low (for 1 PB of storage with segments of about 16 kilobytes (KB), yielding 2^36 segments, a 20-byte SHA1 signature = 2^160 values, which gives a 1:2^124 chance of a false positive).
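By way of illustration only, the false-positive arithmetic above may be checked as follows (Python); the segment size and data volume are the example values from the text.

    import hashlib

    TOTAL_DATA = 2 ** 50                              # 1 PB
    SEGMENT_SIZE = 16 * 1024                          # ~16 KB segments
    HASH_BITS = 8 * hashlib.sha1().digest_size        # 160 bits for SHA1

    segments = TOTAL_DATA // SEGMENT_SIZE             # 2**36 segments
    false_positive_odds = 2 ** HASH_BITS // segments  # ~2**124, i.e., odds of 1:2**124
    print(segments == 2 ** 36, false_positive_odds == 2 ** 124)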
It should be noted that the target data may be segmented and/or hashed using the same segmentation and/or hashing used by the medium-specific deduplication mechanism of the medium. For example, when disk deduplication and medium deduplication use the same values, the segmentation and/or hashing process may be skipped; in general, however, the segments used for disk deduplication may be smaller.
Incoming target data may be written to a fast layer (with or without deduplication), such as a fast data storage device. The target data may be accumulated on the fast data storage device, for example in a buffer. For example, the target data may be moved to the medium according to a threshold and/or data retention policy defined for the fast data storage device. An accumulated amount of target data may be moved into the medium. For example, the target data may be moved in blocks of hundreds of GB at a time. For example, a data storage device storing target data is 10TB in size, and the threshold rule is to move 2TB of target data into the medium every time the data passes the 8TB mark. Or the target data may be written directly to the medium. The fast layer (or HDD within the tape layer, for example) may be used as a scratch pad until sufficient target data is written.
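By way of illustration only, the accumulate-then-move policy of the example above may be sketched as follows (Python); the 10 TB buffer size, 8 TB mark and 2 TB move size are the example values from the text, and the move_to_medium callable is an assumption made for this sketch.

    TB = 10 ** 12
    BUFFER_CAPACITY = 10 * TB
    HIGH_WATER_MARK = 8 * TB
    MOVE_SIZE = 2 * TB

    def maybe_flush_to_medium(buffered_bytes: int, move_to_medium) -> int:
        """Move 2 TB of accumulated target data to the medium whenever the fast-tier
        buffer passes the 8 TB mark; returns the remaining buffered size."""
        while buffered_bytes >= HIGH_WATER_MARK:
            move_to_medium(MOVE_SIZE)
            buffered_bytes -= MOVE_SIZE
        return buffered_bytes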
In 306, a global index may be accessed. In some embodiments, the method continues to feature 308 without using a global index.
The global index maps between representative fingerprints and the media storing the data segments corresponding to the representative fingerprints. The global index is used for an efficient search for matching fingerprints, and indicates the media with the highest number of possibly matching fingerprints for further evaluation to determine the actual matches. One or more target segments (e.g., their fingerprints) of the target data are searched (e.g., looked up) in the global index to determine whether duplicates of the respective target segments are likely to be stored on one or more media, i.e., to find match candidates for the one or more target fingerprints. The fingerprints of the target data that are searched may include all fingerprints, a subset of fingerprints selected using the same method used to select sparse index entries for the global index and/or the medium-specific deduplication mechanisms, or a combination thereof, e.g., if the number of matches found is insufficient.
The global index enables quick searching of target data set segments to quickly identify which media require more careful inspection to verify the storage of duplicate data segments. For example, an operation without a global index would require access to all indexes of all deduplication mechanisms for all media, which is computationally intensive and/or inefficient in finding duplicate segments of a large number of data segments and/or on a large number of media storing a large amount of data.
Search results are obtained that indicate a subset of media that is likely to store matching data segments that are duplicates of the target data segment. For example, a list of media is obtained, where each media indicates how many target data segments are likely to have duplicate segments stored on the media. The best matching medium may be selected, for example, a predefined number of media having the highest number of matches (indicating the highest number of data segments that repeat with the target segment), such as the best matching 4 or 5 (or other number) media. Other searches of the medium-specific deduplication mechanism may be limited to best matching mediums to avoid lengthy search times and/or to increase the efficiency with which the processor performs the search.
For example, as described herein, the global index may be stored on a high performance storage device (e.g., flash memory, RAM, SSD, etc.) in communication with a processor accessing the global index. The global index may be stored on fast memory to increase memory utilization and/or system performance by enabling a fast lookup to quickly identify which media are relevant for further analysis to verify the storage of duplicate segments of the corresponding target segments.
Optionally, the global index is implemented as a sparse index. The sparse index includes sparsely sampled data segments (e.g., their fingerprints) selected from each medium, i.e., a subset of the fingerprints of the data segments stored on the medium. For example, 2 fingerprint entries are selected for every 4 megabytes (MB) of data, or another value is used. The sparse design of the global index supports storing samples of the data segments, out of all data segments stored on all media (which can be very large), for quickly identifying which media may store duplicate segments.
The global index (optionally, a sparse index) may be implemented as a key-value database (KVDB) and/or another searchable data structure that includes entries for all media storing related data entries. For example, the search key may be a fingerprint (or portion thereof), or a weak hash created from a fingerprint. For each key there may be multiple values indicating the one or more media on which a duplicate of the data segment represented by the key may be stored, i.e., the same data segment may be stored on one or more media.
The sparse index may map a weak hash of the fingerprint of a target data segment onto one or more media, with some probability of false positive results that require verification to determine whether the target data segment is actually stored on the mapped media. The weak hash may be implemented as part of the strong hash (e.g., bytes 10 through 17 of a 20-byte full hash). The weak hash may be used to detect areas of data stored on a medium that have a high probability of containing similar data. It should be noted that the weak hash is not necessarily used to determine the actual identity of the duplicate data, since the chance of false positives is not necessarily low enough to confirm the presence of the duplicate data. However, since the number of entries in the global index is large (e.g., in the range of tens of giga-entries), even if only a 64-bit weak hash is used, the collision probability is low enough for the lookup to be sent to the correct storage area.
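By way of illustration only, the global sparse index may be sketched as a key-value mapping as follows (Python); the in-memory dictionary stands in for a KVDB, and the weak hash is taken as the byte slice mentioned above.

    from collections import defaultdict

    class GlobalSparseIndex:
        def __init__(self):
            self._index = defaultdict(set)       # weak hash -> set of media IDs

        @staticmethod
        def weak_hash(fingerprint: bytes) -> bytes:
            return fingerprint[10:18]            # 8-byte slice of the 20-byte strong hash

        def add(self, fingerprint: bytes, media_id: int) -> None:
            self._index[self.weak_hash(fingerprint)].add(media_id)

        def candidate_media(self, fingerprints) -> dict:
            """Count, per medium, how many target fingerprints may have duplicates there."""
            counts = defaultdict(int)
            for fp in fingerprints:
                for media_id in self._index.get(self.weak_hash(fp), ()):
                    counts[media_id] += 1
            return dict(counts)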
Verification that the target segment is stored on the medium is provided by the medium-specific deduplication mechanism that maps the medium, as described herein. The relatively small weak hash design supports quickly determining the media most likely to store duplicates of the target data segments. Using a relatively small weak hash may increase the utilization of the memory storing the global index.
The data segments represented in the global index are sampled from data segments represented in a plurality of medium-specific deduplication mechanisms of a plurality of media. Sampling the data segments represented in the plurality of media-specific deduplication mechanisms for inclusion in the global index enables efficient coverage of large amounts of data stored in the plurality of media while improving utilization of the memory storing the global index.
The global index provides compatibility for different implementations of the deduplication mechanism for different media independent of the implementation of the deduplication mechanism for each medium.
The value of the segment size and/or resolution of the global index may be configurable.
An exemplary calculation using the global index as a sparse index is now provided. The global index includes a subset of the index entries of the medium-specific deduplication mechanisms of the plurality of media, with M entries per 10 GB of data, where, for example, 10GB/M > 1GB/N (i.e., the global index is sparser than each per-medium sparse index). Each entry of the global index may include the following fields: [weak hash (8 to 16 bytes), media ID (4 bytes)].
In 308, a media-specific deduplication mechanism is accessed for each respective media identified by the global index as potentially storing one or more duplicate segments of the target data segment thereon.
Each respective media-specific deduplication mechanism may be stored on a high-performance storage device in communication with the specific media to improve memory utilization and/or system performance by enabling a quick lookup to quickly ascertain which media actually stores the duplicate segments of the target data segment. For example, each media-specific deduplication mechanism may be stored on its own independent high-performance data storage device, which may enable parallel access to multiple media-specific deduplication mechanisms. In another example, multiple media-specific deduplication mechanisms may be stored on a common high-performance data storage device.
An exemplary method for searching a media-specific deduplication mechanism is now described. The sparse index of the media-specific deduplication mechanism is searched for representations (e.g., fingerprints) of the target data. For example, all fingerprints may be searched, a subset of fingerprints that may be stored by the medium as indicated by the global index may be searched, a subset of fingerprints selected using the same method used to select the fingerprints included in the sparse index may be searched, or a combination thereof, e.g., if an insufficient number of matches is found. The search results indicate which areas of the medium are likely to store data segments that are duplicates of the target data segments. The entries of the full index corresponding to the matching entries of the sparse index are read and compared to the target data segments (e.g., their fingerprints) to determine whether matching entries exist. A matching entry indicates a duplicate of the target data segment stored on the medium.
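By way of illustration only, the two-level check described above may be sketched as follows (Python); the in-memory structures stand in for the on-flash sparse and full indexes, and the region size is an assumed value.

    def find_duplicates(target_fingerprints, sparse_index, full_index, region_size=1024):
        """sparse_index: weak hash -> approximate position in the full index.
        full_index: list of (fingerprint, medium_offset) entries.
        Returns {fingerprint: medium_offset} for confirmed duplicate segments."""
        confirmed = {}
        for fp in target_fingerprints:
            hint = sparse_index.get(fp[10:18])           # weak hash = slice of the fingerprint
            if hint is None:
                continue
            lo, hi = max(hint - region_size, 0), hint + region_size
            for full_fp, offset in full_index[lo:hi]:    # read the surrounding region
                if full_fp == fp:                        # full-fingerprint confirmation
                    confirmed[fp] = offset
                    break
        return confirmed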
An exemplary implementation of a searchable media-specific deduplication mechanism is now described. It should be noted that different implementations of different media-specific deduplication mechanisms for different media are possible. The media-specific deduplication mechanism may include a full index that includes all representations (e.g., fingerprints) of all data segments stored on the medium. For all data segments stored on the medium, the full index may map a strong hash (e.g., fingerprint) of the data segment to an offset storage location on the medium. The full index may be stored on a fast data storage device. An exemplary calculation of the full index size is now provided. Number of entries per complete medium = 45 TB / segment size (45 KB) = about 1G entries. Each entry includes the following fields: [ID (8 bytes), offset in the medium (8 bytes), size (4 bytes), hash (20-byte SHA1)]. Without optimization, each entry is 40 bytes. The total full index size for a complete medium (e.g., cartridge) is 40GB.
Since a full index is typically too large for efficient and/or fast searching, one or more of the following data structures (associated with the full index) may be accessed before the full index is accessed:
* A sparse index includes sampled representations that point to locations in the full index that may include one or more fingerprints of the target data. The full index is then accessed to compare the full fingerprints and confirm whether duplicate segments of the target data segments exist. An exemplary calculation of the sparse index size is now provided. The sparse index of the medium holds a ratio of N hashes (e.g., partial fingerprints of the strong hashes) per GB of data. For example, the hashes may be selected based on a local minimum or maximum (this ratio may be adjusted as a trade-off between accuracy and size). The number of entries in the sparse index is 45 TB × N / 1 GB = 45K × N. Each entry includes the following fields: [weak hash (8 to 16 bytes), block ID in tape (4 bytes)]. The total size of the sparse index for each complete medium is < 1 MB × N.
The number of data segments represented in the sparse index may be a defined percentage of the number of data segments represented in the full index, e.g., approximately 1 data segment in the sparse index for every 100 data segments in the full index, or another value. The number of data segments represented in the global index may be a defined percentage of the number of data segments represented in the sparse index of the medium, e.g., approximately 1 data segment in the global index for every 100 to 1000 data segments in the sparse index of the medium.
For example, the sparse index of the medium may reside as an array on a flash disk and/or as a separate table in a global sparse index database (e.g., on a flash data storage device).
* A weak hash table includes a mapping from a weak hash of a data segment (e.g., a portion of the full fingerprint) to the segment ID of the data segment in the full index. When a weak hash is found in the weak hash table, a range of IDs around the found ID may be loaded into a cache, and the fingerprint may then be searched for in the cache. If the fingerprint is not found, the process may return to the weak hash table and repeat until all unresolved fingerprints have been searched in the weak hash table (a sketch of this lookup is provided after the list).
* A Bloom filter indicates whether a fingerprint is likely to be stored in the full index. Bloom filters may give false positive answers (but no false negatives).
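By way of a non-limiting illustration, the weak-hash-table lookup with a cached window of neighboring full-index entries may be sketched as follows; the window size and data layouts are assumptions for illustration only:

```python
from typing import Dict, List, Optional, Tuple

# Assumed layouts for illustration:
# weak_table:   weak hash (portion of the full fingerprint) -> segment ID in the full index
# full_entries: full index as a list ordered by segment ID; each item is (fingerprint, offset, size)

WINDOW = 256  # number of neighboring entries loaded into the cache around a hit (assumption)

def lookup(fp: bytes,
           weak_table: Dict[bytes, int],
           full_entries: List[Tuple[bytes, int, int]],
           cache: Dict[int, Tuple[bytes, int, int]]) -> Optional[Tuple[int, int]]:
    """Return (offset, size) of a duplicate segment, or None if the fingerprint is absent."""
    seg_id = weak_table.get(fp[:8])
    if seg_id is None:
        return None                      # weak hash not present: no duplicate on this medium
    # Load a window of full-index entries around the found ID into the cache
    lo, hi = max(0, seg_id - WINDOW), min(len(full_entries), seg_id + WINDOW)
    for i in range(lo, hi):
        cache[i] = full_entries[i]
    # Search the cache for the full fingerprint
    for entry_fp, offset, size in cache.values():
        if entry_fp == fp:
            return offset, size
    return None                          # weak-hash collision without a real match
```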
In 310, one or more media are selected for deduplication storage of target data based on the analysis of the matching results.
Alternatively, a single identified medium is selected for deduplication storage of the target data. The single medium may be selected as the one storing the largest number of data segments that match (i.e., are duplicate segments of) the target data segments. A single tape may then efficiently store the target data using its deduplication mechanism.
Alternatively, a combination of two or more identified media is selected for deduplication storage of the target data. Using multiple media enables parallel write access, which may be faster than multiple sequential accesses to the same single medium. The number of media in the combination may be limited (e.g., by a maximum value) to avoid involving too many media in the case of data recovery, which may be inefficient.
The use of multiple media may result in a higher deduplication rate than a single medium without necessarily increasing the recovery time, for example, when multiple media together match more target segments than any single medium. Furthermore, using multiple media may improve memory utilization and/or reduce memory consumption by reducing the number of instances in the global sparse index where multiple media store the same data segment.
The combination of identified media may be selected to achieve the highest target deduplication rate for the target data under defined constraints. A constraint for selecting the combination of identified media may be, for example, that the target data is divided into predefined units (e.g., files) and that divisions of the same unit are written to the same medium, so that the predefined unit can be recovered from a single medium. Ensuring that divisions of the same data unit (e.g., file) are stored on the same medium allows the unit to be efficiently recovered from that single medium. For example, recreating a file from a single cartridge is more efficient and allows faster access to the recreated file than restoring the file across multiple different cartridges.
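By way of a non-limiting illustration of this constraint, the following sketch greedily assigns each file to the single identified medium that already stores the largest number of its duplicate segments; the inputs and scoring rule are assumptions for illustration only:

```python
from typing import Dict, List, Set

def assign_files_to_media(file_segments: Dict[str, List[bytes]],
                          media_fingerprints: Dict[str, Set[bytes]]) -> Dict[str, str]:
    """Map each file to one medium so the whole file can be recovered from a single medium.

    file_segments:      file name -> fingerprints of its data segments
    media_fingerprints: medium ID -> fingerprints already stored on that medium
    """
    assignment = {}
    for file_name, fps in file_segments.items():
        # Choose the medium with the most matching (duplicate) segments for this file
        best = max(media_fingerprints,
                   key=lambda m: sum(1 for fp in fps if fp in media_fingerprints[m]))
        assignment[file_name] = best
    return assignment
```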
At 312, the processor determines a combination of segments having a predicted total recovery time less than a threshold. The combination includes one or more of: one or more identified repeating segments, one or more overwrite segments, and one or more new data segments. An overwrite segment has the same data as an identified repeating segment, but is designated to be written onto the medium again as new data, e.g., over available space (e.g., appended to the end of the existing written data). The cost of overwriting segments is the additional storage space used on the medium; the advantage is a reduction in seek time relative to seeking to the existing duplicate segments, so that a predicted total recovery time less than the threshold is obtained. A new data segment (i.e., non-duplicate segment) includes data for which no duplicate segment is currently stored on the medium, i.e., the non-duplicate target segments are the portions of the target data for which no matching data segments are found on the selected medium by searching the global index and/or the media-specific deduplication mechanism. The new data segments are designated to be written on the medium for the first time as new data.
Referring back now to FIG. 3B, an exemplary method for determining segment combinations is described.
In 350, a threshold is received. The threshold value may be obtained by, for example, manual input by a user, reading from memory, and/or automatic calculation. The threshold may be defined for each target data or may be defined globally for all target data. The threshold may be defined for each client or globally for a plurality of clients.
The threshold may be defined in accordance with a service level agreement (SLA). The SLA may be used to customize the threshold for each user, e.g., different users may select different values.
The threshold may define a maximum time from receipt of a recovery request until the target data has been reconstructed from the read segments and is ready. The threshold may represent a time that is relevant from the user's perspective, e.g., how long the user needs to wait from requesting the target data until the target data is available to the user.
In implementations where the sequential access storage device is a tape drive and the medium is a tape cartridge, the predicted total recovery time may be calculated by determining the order in which the segments of the target data are read and based on one or more (optionally all) of the following parameters: the maximum seek time to the first segment, the seek times between segments given the reading order, the read throughput, read head wrap changes, and the tape library loading time. These parameters depend on the connection type (e.g., FC, SAS), the tape drive/tape library type, the inserted tape cartridge type (e.g., LTO-9, LTO-8), and whether drive compression is on or off, providing a customized selection of which duplicate segments to use and which duplicate segments to overwrite as new data.
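By way of a non-limiting illustration, a simplified model combining these parameters might be sketched as follows; the constants are placeholders (assumptions) and would in practice depend on the drive, cartridge generation, and connection type:

```python
from typing import List, Tuple

# Placeholder drive characteristics (assumptions, not vendor figures)
LOAD_TIME = 15.0          # tape library load/mount time, seconds
MAX_FIRST_SEEK = 90.0     # worst-case seek to the first segment, seconds
SEEK_PER_METER = 0.1      # seconds per meter of longitudinal head movement
WRAP_CHANGE = 2.0         # seconds per wrap change of the read head
THROUGHPUT = 300e6        # sustained read throughput, bytes per second

def predicted_recovery_time(read_order: List[Tuple[int, float, int]]) -> float:
    """read_order: segments in reading order as (wrap, longitudinal position in meters, size in bytes)."""
    total = LOAD_TIME + MAX_FIRST_SEEK
    for (w0, p0, _), (w1, p1, size) in zip(read_order, read_order[1:]):
        total += abs(p1 - p0) * SEEK_PER_METER   # seek between consecutive segments
        total += abs(w1 - w0) * WRAP_CHANGE      # wrap changes of the read head
        total += size / THROUGHPUT               # read time of the next segment
    if read_order:
        total += read_order[0][2] / THROUGHPUT   # read time of the first segment
    return total
```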
The reading order may be calculated based on a nearest neighbor method that takes into account that segment locations on different wraps may be physically close to each other, and that minimizes the total distance traversed by the read head between segments. Moving the head of the tape drive across multiple wraps may be faster than moving the head along the length of one wrap. The predicted total recovery time may be further reduced by selecting different movements of the head across wraps to change the reading order. In addition, moving the head of the tape drive across wraps is more efficient than running the tape back and forth (also referred to as "shoe-cleaning") and greatly increases the life of the tape. Moving the head in this manner also reduces the number of stops and restarts of tape movement, improving the efficiency of tape movement and increasing the durability of the tape.
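A minimal, non-limiting sketch of such a nearest neighbor ordering over segment locations (wrap, longitudinal position) is given below; the relative weighting of wrap changes versus longitudinal movement is an assumption for illustration:

```python
from typing import List, Tuple

def nearest_neighbor_order(segments: List[Tuple[int, float]],
                           wrap_cost: float = 2.0,
                           meter_cost: float = 0.1) -> List[int]:
    """Return indices of segments (wrap, position in meters) in a nearest neighbor reading order."""
    if not segments:
        return []

    def distance(a: Tuple[int, float], b: Tuple[int, float]) -> float:
        # Approximate seek cost: longitudinal movement plus a penalty per wrap change
        return abs(a[1] - b[1]) * meter_cost + abs(a[0] - b[0]) * wrap_cost

    remaining = set(range(len(segments)))
    order = [0]                      # start at the first segment (e.g., closest to the beginning of tape)
    remaining.remove(0)
    while remaining:
        last = segments[order[-1]]
        nxt = min(remaining, key=lambda i: distance(last, segments[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```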
Exemplary methods for calculating an optimal reading order for a tape include the Time-based Access Order System (TAOS) and the Recommended Access Order (RAO).
FIG. 5 illustrates how the read time of segments ordered by the nearest neighbor method is lower than that of a linear read.
At 352, the processor may aggregate repeating segments that are physically close to each other on the identified medium into clusters. The gaps between the repeating segments, which contain data unrelated to the target data, are included in the cluster. The combinations described herein are then calculated on a per-cluster basis rather than per individual segment.
As used herein, the term repeating segment may refer to a cluster of repeating segments. For example, the combinations described herein may include clusters of repeating segments, rather than individual repeating segments.
Aggregation reduces the number of elements in the combination, thereby reducing the total number of possible combinations to a smaller number, which enables efficient selection of one combination. For example, when 100 repeating segments are found and clustered into 4 clusters, it is computationally more efficient to find the best combination using 4 elements than using 100 elements.
The predicted total recovery time is then calculated from the seek times between clusters. When the data within a cluster is read sequentially, the total read and seek time of the cluster is shorter than for the non-aggregated segments. This is because, when the segments in a cluster are close to each other, reading the entire data of the cluster sequentially (including the unrelated data between the segments) is faster than stopping at the end of each segment, skipping the unrelated data, and seeking to the beginning of the next segment, which is much slower. For example, for a tape drive that reads 32 MB to 64 MB of data ahead after a reposition, reading an entire 32 MB cluster takes about the same time as reading a single segment within it.
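A non-limiting sketch of such aggregation, using the drive's read-ahead window as the maximum allowed gap within a cluster (the 32 MB figure being the example above), might look as follows:

```python
from typing import List, Tuple

READAHEAD = 32 * 10**6   # example read-ahead window from the text above, in bytes

def cluster_segments(segments: List[Tuple[int, int]],
                     max_gap: int = READAHEAD) -> List[List[Tuple[int, int]]]:
    """Group duplicate segments (offset, size), sorted by offset, whose gaps are below max_gap."""
    clusters: List[List[Tuple[int, int]]] = []
    for offset, size in sorted(segments):
        if clusters and offset - (clusters[-1][-1][0] + clusters[-1][-1][1]) <= max_gap:
            clusters[-1].append((offset, size))   # close enough: extend the current cluster
        else:
            clusters.append([(offset, size)])     # too far: start a new cluster
    return clusters
```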
At 354, the processor may calculate one or more candidate combinations. A predicted total recovery time for each candidate combination is calculated.
Calculating different combinations enables selection of the best combination, i.e., the one providing the highest deduplication rate while keeping the predicted total recovery time below the threshold.
Different methods may be used to calculate the candidate combinations.
In one approach, multiple candidate combinations are computed sequentially and/or in parallel, where each combination includes different repeating segments and different overwrite segments. Each combination includes a different subset of the repeating segments (e.g., different numbers, different data), wherein identified repeating segments excluded from the subset are classified as overwrite segments. Excluding some identified repeating segments and instead writing them sequentially together with the non-repeating segments reduces the total recovery time, keeping the predicted total recovery time below the threshold.
When aggregation is performed and the number of clusters is small, all combinations of clusters may optionally be calculated. When the number of elements (e.g., clusters, segments) is so large that evaluating all combinations is computationally prohibitive (e.g., the computation takes too long), a smaller number of candidate combinations may be evaluated, e.g., selected randomly and/or using heuristics. A predicted total recovery time is calculated for each candidate combination.
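For a small number of clusters, the exhaustive evaluation may be sketched as follows; predicted_time and dedup_bytes are assumed helper functions standing for the recovery-time model and the deduplication measure described herein:

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

def best_combination(clusters: Sequence,
                     threshold: float,
                     predicted_time: Callable[[Sequence], float],
                     dedup_bytes: Callable[[Sequence], int]) -> Tuple:
    """Exhaustively evaluate subsets of duplicate clusters; clusters left out are overwritten as new data.

    Returns the subset with the highest deduplication whose predicted recovery time stays below the threshold.
    """
    best, best_dedup = (), -1
    for r in range(len(clusters) + 1):
        for subset in combinations(clusters, r):
            if predicted_time(subset) < threshold and dedup_bytes(subset) > best_dedup:
                best, best_dedup = subset, dedup_bytes(subset)
    return best
```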
In another approach, a baseline recovery time is calculated for reading the segments of the target data using a baseline set of identified repeating segments (optionally, all identified repeating segments). The baseline recovery time represents the recovery time at maximum deduplication. When the baseline recovery time is less than the threshold, the identified repeating segments (e.g., all of them) are used. When the baseline recovery time is greater than the threshold, the difference between the threshold and the baseline recovery time is calculated. One or more identified repeating segments are iteratively removed from the baseline set to obtain a candidate set; repeating segments removed from the baseline/candidate set are added to another set as overwrite segments. The predicted total recovery time is calculated for the combination of the candidate set of repeating segments and the other set of overwrite segments. The iterations continue until the predicted total recovery time of the candidate set is less than the threshold. One or more of the removed repeating segments may be added back to the candidate set in an attempt to minimize the gap between the total recovery time of the candidate set and the threshold.
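The iterative method may be sketched as follows; the choice of which cluster to remove in each iteration is a heuristic assumed for illustration only:

```python
from typing import Callable, List, Tuple

def iterative_selection(baseline: List,
                        threshold: float,
                        predicted_time: Callable[[List], float]) -> Tuple[List, List]:
    """Return (kept duplicate clusters, clusters reclassified as overwrite segments)."""
    kept, overwrite = list(baseline), []
    while kept and predicted_time(kept) >= threshold:
        # Heuristic (assumption): remove the cluster whose removal lowers the predicted time the most
        worst = min(kept, key=lambda c: predicted_time([x for x in kept if x is not c]))
        kept.remove(worst)
        overwrite.append(worst)      # removed clusters are rewritten sequentially as new data
    return kept, overwrite
```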
At 356, the candidate combination is selected that has a predicted total recovery time less than the threshold and the maximum deduplication rate.
When two or more media are identified, a respective combination is selected for each medium such that the respective predicted total recovery time is less than the threshold. Assuming that the multiple media are read in parallel during recovery, the total reconstruction time of the target data stored across the multiple media thus remains below the threshold.
Referring back to FIG. 3A, in 314, the overwrite segments and/or new data segments (i.e., non-duplicate target segments) of the combination are written onto the identified medium, optionally sequentially over the available space of the identified medium, e.g., appended to the end of the existing written data.
Writing the overwrite segments and/or the new data sequentially provides a faster sequential read time than reading the pre-existing duplicate segments, which would result in a longer read time because the medium and/or read head must move to the different positions of those duplicate segments.
In the case of a selected combination of media, the storage on the different media may be performed in parallel.
At 316, the processor instructs writing of metadata indicating the locations of the segments of the combination on the medium. The metadata provides the positions of the segments of the combination so that, in the read phase, the predicted total recovery time of less than the threshold can be obtained.
The processor is further configured to instruct reading of the metadata to determine the locations of the segments of the combination to be read, so as to obtain the predicted total recovery time of less than the threshold.
In 318, the media-specific deduplication mechanism of the selected medium is updated to indicate the identified duplicate segments of the target data stored on the selected medium, as well as the newly stored overwrite segments and non-duplicate segments of the target data.
In the case of a combination of selected media, the updating of the different media-specific deduplication mechanisms may be performed in parallel.
The full index and associated data structures (e.g., sparse index, weak hash table, Bloom filter) of each media-specific deduplication mechanism associated with the selected single medium or combination of media are updated.
To make the target data readable, the update includes adding metadata that indicates how to construct the target data from the different segments on the single medium or combination of media, including the newly added non-duplicate segments, the overwrite segments, and the pre-existing duplicate data segments. The metadata may be a mapping of the data segments stored on the medium that are combined to reconstruct the target data. The metadata may be stored on the flash layer.
The identification table of the media-specific deduplication mechanism is updated to store the metadata for each new segment identifier. The metadata stored in the identification table for each data segment may include, for example, one or more of: an indication of the medium storing the data segment, the offset on the medium at which the data segment is stored, the size of the data segment, and a fingerprint. The identifier of the medium may be included in the metadata and/or calculated using the fingerprint; the fingerprint included in the metadata is the fingerprint used to determine the medium, e.g., as described herein.
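By way of a non-limiting illustration, one possible representation of such an identification table entry is sketched below; the field names are assumptions mirroring the fields listed above:

```python
from dataclasses import dataclass

@dataclass
class SegmentRecord:
    """One entry of the identification table of a media-specific deduplication mechanism."""
    segment_id: int      # identifier of the data segment
    medium_id: str       # indication of the medium storing the segment (may also be derived from the fingerprint)
    offset: int          # offset of the segment on the medium, in bytes
    size: int            # size of the segment, in bytes
    fingerprint: bytes   # strong fingerprint (e.g., SHA-1) used to identify duplicates
```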
At 320, the global index is updated to indicate the mapping between the segments of the target data and the media storing them.
At 322, a global file map storing identifiers of the segments of the target data is updated. An identifier may be a code, such as a number, letter, symbol, and/or a combination thereof, that uniquely identifies each data segment.
Referring back to FIG. 4, in 402, the processor receives a request to read target data stored as fragments on one or more media of a sequential access storage device.
In 404, the processor reads metadata (which was written during the write phase, as described herein) to obtain the location of the fragment of the target data on the medium.
Fragments of the target data are stored as combinations selected during the write phase to obtain a predicted total recovery time less than a threshold.
Other relevant data may be read, such as identifiers of data fragments of the target data obtained from the global file map. The metadata for each data fragment may be obtained, for example, from an identification table.
In 406, the processor instructs reading of the fragments according to the locations obtained from the metadata.
It should be noted that the metadata indicates that the overwrite segments written during the write phase are to be read, rather than the pre-existing duplicate segments containing the same data, in order to obtain the predicted total recovery time that is less than the threshold, as described herein.
Optionally, the segments are read in the order selected during the write phase to obtain the predicted total recovery time that is less than the threshold. The order may be based on the nearest neighbor method determined during the write phase, as described herein.
At 408, the processor recovers the target data from the read fragments. Since the writing was performed according to the selected combination, the data recovery is expected to complete within a time below the threshold.
Referring back now to FIG. 5, schematic 502 helps explain why a linear recall is very inefficient, and schematic 504 helps explain why an ordered recall is more efficient. Schematic 502 depicts reading from the tape in the original order of the files/fragments, which causes the tape to be moved back and forth; this is inefficient and wears the tape, as described herein. Schematic 504 depicts an optimal reading order based on the nearest neighbor approach, taking into account that moving the head of the tape drive between wraps is sometimes faster than moving the head along the length of a wrap. It is visually apparent that the total length traversed by the read head in 504 is much shorter than in 502.
Referring back now to FIG. 6, schematic diagrams 602, 604, 606, and 608 help explain the calculation of the predicted recovery times for different combinations of identified repeating segments, according to some embodiments described herein. Each successive combination uses a larger number of the identified repeating segments and writes a smaller number of them sequentially as new data. These combinations increase the deduplication rate, but at the cost of increased seek time. The combination having a predicted total recovery time less than the threshold and the maximum deduplication rate is selected.
It should be appreciated that read throughput and seek times depend on the tape drive and cartridge type; the numbers here are merely an example calculation. The cartridge loading time in the library is omitted for simplicity.
Four combinations of identified repeating and non-repeating segments of target data are depicted in schematics 602, 604, 606, and 608.
Schematic 602 depicts a first combination in which no identified repeating segments are used. All of the incoming target data 650 is written as new segments at the end of wrap 2 652, which requires a nearly full-length seek 654 of about 90 seconds. The data read time is calculated as 10 gigabytes (GB) / 300 megabytes per second (MB/s) = 33 seconds. The predicted total recovery time is 123 seconds.
Schematic 604 depicts a second combination in which approximately 25% deduplication is achieved. Segments 135-136 660 are duplicates. The first seek 662 to segment 135 adds about 75 seconds. The second seek 664 from segment 136 to segment 241 adds about 70 seconds. The data read time is 33 seconds. The predicted total recovery time is 178 seconds.
Schematic 606 depicts a third combination in which approximately 40% deduplication is achieved. Segments 1-5 670 are duplicates. The first seek 672 to segment 1 adds about 100 seconds. The second seek 674 from segment 5 to segment 241 adds about 80 seconds. The data read time is 33 seconds. The predicted total recovery time is 213 seconds.
Schematic 608 depicts a fourth combination in which approximately 63% deduplication is achieved. Segments 1-5 and segments 135-136 are duplicates. The first seek 682 to segment 1 adds about 100 seconds. The second seek 684 from segment 5 to segment 135 adds about 25 seconds. The third seek 686 from segment 135 to segment 241 adds about 70 seconds. The data read time is 33 seconds. The predicted total recovery time is 228 seconds.
The selected combination is the one that has the highest deduplication percentage and still meets the threshold (e.g., defined by the SLA). If the threshold is greater than 228 seconds, the fourth combination is selected. If the threshold is between 213 and 228 seconds, the third combination is selected. If the threshold is between 178 and 213 seconds, the second combination is selected. Otherwise, the first combination is selected.
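Using the example figures of FIG. 6, this selection rule may be illustrated by the following short sketch, which picks the highest deduplication percentage whose predicted total recovery time is below the threshold and otherwise falls back to the first combination (no deduplication):

```python
# (deduplication percentage, predicted total recovery time in seconds) from FIG. 6
combinations_fig6 = [(0, 123), (25, 178), (40, 213), (63, 228)]

def select(threshold: float):
    """Return the combination with the highest deduplication whose predicted time is below the threshold."""
    feasible = [c for c in combinations_fig6 if c[1] < threshold]
    # Fall back to the first combination (no deduplication) if no combination meets the threshold
    return max(feasible, key=lambda c: c[0]) if feasible else combinations_fig6[0]

print(select(250))   # -> (63, 228): the fourth combination
print(select(200))   # -> (25, 178): the second combination
print(select(150))   # -> (0, 123):  the first combination
```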
Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration only and are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant deduplication mechanisms and media will be developed, and the scope of the terms deduplication mechanism and medium is intended to include all such new technologies a priori.
The term "about" as used herein means ± 10%.
The terms "including", "having" and their cognate terms mean "including but not limited to". This term includes the term "by. The composition" and "consisting essentially of.
The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, provided that the additional ingredients and/or steps do not substantially alter the basic and novel characteristics of the composition or method as required.
As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. For example, the term "a complex" or "at least one complex" may include a plurality of complexes, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration. Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude features from other embodiments.
The word "optionally" as used herein means "provided in some embodiments and not provided in other embodiments. Any particular embodiment of the invention may include a plurality of "optional" features unless the features contradict each other.
In the present application, various embodiments of the application may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the present application. Accordingly, the description of a range should be considered to have specifically disclosed all possible sub-ranges as well as individual values within the range. For example, a description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., as well as individual numbers within that range, such as 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
When a range of numbers is referred to herein, it is intended to encompass any of the recited numbers (fractional or integer) within the range indicated. The phrases "a range between a first indicated number and a second indicated number" and "a range from a first indicated number to a second indicated number" are used interchangeably herein to mean that all fractions and integers between the first indicated number and the second indicated number are included.
It is appreciated that certain features of the invention, which are, for clarity of description, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments unless the embodiment is inoperative without those elements.
It is the intent of the applicant(s) that all publications, patents and patent applications mentioned in this specification be incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. Furthermore, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present application. To the extent that section headings are used, they should not be construed as necessarily limiting. In addition, any priority document(s) of this application is/are hereby incorporated herein by reference in its/their entirety.