HK1173521B - Partial recall of deduplicated files - Google Patents


Info

Publication number
HK1173521B
HK1173521B (application HK13100336.8A)
Authority
HK
Hong Kong
Prior art keywords
data
file
recall
recalled
request
Prior art date
Application number
HK13100336.8A
Other languages
Chinese (zh)
Other versions
HK1173521A1 (en)
Inventor
A. Gupta
R. Kalach
C.H. Cheung
J.R. Benton
J-T. Pfenning
Original Assignee
Microsoft Technology Licensing, LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 12/970,848 (US 8,645,335 B2)
Application filed by Microsoft Technology Licensing, LLC
Publication of HK1173521A1
Publication of HK1173521B


Description

Partial recall of deduplicated files
Technical Field
The invention relates to partial recall of deduplicated files.
Background
Data deduplication (also sometimes referred to as data optimization) refers to eliminating redundant data in a storage system to reduce the amount of physical bytes of data that need to be stored on disk or transferred over a network without compromising the fidelity and integrity of the original data. Data deduplication thus results in savings in hardware costs (for storage) and data management costs (e.g., backup) by reducing the resources required to store and/or transfer data. These cost savings become important as the amount of digitally stored data grows.
Data deduplication typically uses a combination of techniques for eliminating redundancy within and between persistently stored files. One technique identifies identical portions of data in one or more files, physically stores only one unique portion (chunk), and maintains a reference to that chunk in association with each file. Another technique is to mix data deduplication with compression, for example by storing compressed chunks.
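As an illustrative sketch only (not the patented implementation), hash-based chunk deduplication with references can be modeled as follows; the `ChunkStore` class, the fixed 4-byte chunk size, and the method names are hypothetical:

```python
import hashlib

class ChunkStore:
    """Toy chunk store: identical chunks are stored once and shared by reference."""
    def __init__(self):
        self.chunks = {}  # chunk hash -> chunk bytes

    def add_file(self, data, chunk_size=4):
        """Split data into fixed-size chunks; return the file's 'stub' (chunk references)."""
        stub = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(h, chunk)  # store each unique chunk only once
            stub.append(h)
        return stub

    def rehydrate(self, stub):
        """Reassemble the original file data from its chunk references."""
        return b"".join(self.chunks[h] for h in stub)

store = ChunkStore()
stub_a = store.add_file(b"ABCDEFGHABCDEFGH")  # second half duplicates the first
stub_b = store.add_file(b"ABCDXXXX")          # shares its first chunk with file A
```

Here 24 logical bytes across two files are held as only three unique chunks, while each stub still rehydrates to its original data.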
The data of a deduplicated file is thus stored as chunks or compressed chunks in a chunk store, while the file itself is replaced by a "stub" that includes references to those chunks. When a user or application needs to access a deduplicated file, the deduplication engine brings the data back into memory (called rehydration) or onto disk (called recall). When a user or application modifies the data, portions of the old optimized data may be needed to ensure data consistency and integrity.
The process of rehydration or recall introduces latency in data access due to the (possible) need for chunk decompression, file fragmentation introduced by chunking, and the location/implementation of the chunk store. A complete file recall introduces high latency and relatively considerable I/O overhead. The latency and resource consumption problems are exacerbated when the file is large.
Furthermore, when a large file is fully recalled, the deduplication engine may need to deduplicate the file again. This consumes substantial resources and impacts overall data deduplication throughput, which is also a challenge given the large amount of data that a typical deduplication system needs to manage.
Disclosure of Invention
This summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
Briefly, various aspects of the subject matter described herein are directed towards techniques for managing file data such that a file may be in a partially deduplicated state, in which some file data resides in a chunk store and some file data has been recalled to the file, i.e., to the file's storage volume in place of a reference to the chunk store. When a range of data is written, the file may change from a fully deduplicated state to a partially deduplicated state. This change is made by reading the chunk from the chunk store and committing (e.g., flushing) at least a portion of the chunk, as the recalled range, to the storage volume or other medium containing the file. Tracking information is maintained in association with the file (e.g., at a reparse point) to track which data ranges have been recalled and which reside in the chunk store. The tracking information is accessed to return data from the appropriate source as needed.
In one aspect, a chunk may be read to obtain more data than is needed, for example to align with file system allocation boundaries and/or in anticipation of a need to access additional data. For example, a file may be divided into fixed-size partitions, any of which may contain recalled data that substantially fills the partition to its boundaries.
In another aspect, the tracking information may be maintained as metadata of the deduplicated file, e.g., as an associated (auxiliary) bitmap structure with one bit per partition indicating whether the partition contains recalled file data or whether the data resides in the chunk store. The tracking information may be maintained in a reparse point buffer, an alternate data stream, or any other means the file system provides for storing metadata about a file. The metadata may be stored in a limited amount of space; if more space is needed, each bit in the bitmap structure may be made to represent more than one partition, and/or the bitmap may be compressed (e.g., encoded) to reduce the space consumed. File data may be recalled to make the compression more efficient.
Other advantages of the present invention will become apparent from the following detailed description when read in conjunction with the accompanying drawings.
Drawings
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or like elements and in which:
FIG. 1 is a block diagram representing example components that recall ranges of a deduplication file as a partial deduplication file and/or access data of the partial deduplication file.
FIG. 2 is a schematic diagram of how chunks containing data ranges are recalled and aligned on disk.
FIG. 3 is a schematic diagram of how additional chunks, beyond those containing the requested data range, may be recalled and aligned on disk.
FIG. 4 is a schematic diagram of a deduplication reparse point comprising a buffer containing a recall bitmap used to track which ranges of a partial deduplication file have been recalled and which ranges are retained in chunk store.
Fig. 5 and 6 include flow diagrams representing steps of a partial recall algorithm used in an example implementation.
FIGS. 7A-7C are schematic diagrams of how partial deduplication file data may be recalled and written back with user data in an alternative example implementation.
FIGS. 8-10 include flow charts representing steps of a partial recall algorithm used in alternative example implementations.
FIG. 11 is a block diagram representing exemplary non-limiting networked environments in which various embodiments described herein can be implemented.
FIG. 12 is a block diagram representing an exemplary non-limiting computing system or operating environment in which one or more aspects of various embodiments described herein can be implemented.
Detailed Description
Various aspects of the technology described herein are generally directed to recalling partial portions (extents) of a deduplicated file, as opposed to recalling the entire file, which provides performance benefits relative to a full file recall (especially when the file is large). To this end, aspects of the technology allow recalled ranges to be maintained in memory and/or on disk and tracked in an I/O-efficient manner. The technology also facilitates re-deduplication of partially recalled files by scanning and streaming only the recalled file ranges (rather than the entire file), thereby saving machine resources and increasing deduplication throughput.
To implement partial deduplication, various mechanisms and optimizations are provided, including mechanisms and optimizations directed to adjusting file extents, I/O types, and minimizing refreshes for recalls. A compressed bitmap structure that tracks the extent of recalls may also be utilized.
It should be understood that any of the examples herein are non-limiting. Thus, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in computing and data processing in general.
FIG. 1 illustrates an example concept of a partial recall of a deduplicated file 102. In general, when a file is fully deduplicated, the file is associated with metadata 104, which includes deduplication metadata 106 that maps the file's chunks (e.g., C3, C6, and C7) in the chunk store 108 back to the logical file. In general, when an application 110 requests access to the file, a deduplication mechanism 112 (e.g., implemented in a file system filter) accesses the chunks referenced in the chunk store 108 (or the one chunk when there is only one) and, after any decompression as appropriate, returns the chunks as recalled file data. From the perspective of the application 110, the data is intact, and thus, apart from any delay, the user of the application 110 does not know or care whether the file is deduplicated.
There are various operations that cause a file to change from a deduplication state to a non-deduplication state. This includes when a user or process writes to a file, causing the modified file data located at the write offset location to no longer match the block or blocks that previously contained the data. It is often inefficient to recall the entire file completely to support such file modifications.
As described herein, a partial recall mechanism 114 is provided that allows only the relevant chunks to be recalled, leaving the file in a partially deduplicated state (partial deduplication file 102) containing some recalled file data 116 and references to the chunks that were not recalled. For example, in FIG. 1, consider that only the data corresponding to chunk C7 was written (made dirty), so only chunk C7 is recalled. The file 102 (e.g., once the data is committed to disk or other stable storage) contains, in its recalled file data 116, the recalled chunk data as modified by the write operation; the deduplication metadata 106 is updated to reflect this partial recall state, so that it is known that the file 102 contains recalled data and so that it no longer points to the now-obsolete chunk C7. Note that the partial deduplication file may be changed back to the fully deduplicated state, or fully recalled to the non-deduplicated state, as described below.
Thus, a partial recall generally refers to recalling fewer than all of the chunks (generally those chunks associated with changes) and tracking which portion or portions of the file have been recalled and which are retained in the deduplicated chunks. The file may thus exist in a non-deduplication state, a partial deduplication (partial recall) state, or a full deduplication state.
In one implementation, the deduplication mechanism 112 is implemented as a file system filter 118, incorporating the partial recall mechanism 114 as a subcomponent (e.g., a line of software code) of the deduplication mechanism 112. As described herein, the partial recall mechanism 114 performs various tracking operations, including maintaining tracking data regarding which portions of a file have been recalled and which are retained in chunk storage.
In one implementation corresponding to an NTFS configuration, the tracking data is maintained in an NTFS reparse point, which also informs the filter 118 that the file is partially deduplicated. Reparse points are well known in other usage scenarios and are not described in detail herein. Note that reparse points are only one alternative; other metadata/streams may instead be used for tracking purposes. In general, any manner of associating metadata with a file (including via reparse points, streams, database lookups, and so forth) may be used to provide access to the deduplication metadata.
When data is recalled, a number of policy options are available, typically related to the file system. For example, NTFS-based deduplication may use sparse files whose space is allocated in multiples of a fixed cluster size (e.g., 64K clusters). In such implementations, the deduplication chunk sizes of a file may not align with the file system's data allocation size. By way of example, a file may have 100K and 28K deduplicated chunks (128K total), while NTFS may store it in two 64K clusters (128K total).
Thus, due to such differences, when a user writes to a chunk of a certain size (e.g., 100K), more data (e.g., 128K) typically needs to be recalled. As an example, consider a user requesting access to a 50K data range, illustrated in FIG. 2 as data range 220. This range does not coincide with a single chunk, and thus two chunks 222 and 224 need to be recalled to provide the full requested range. Note that in this example the chunks are not of fixed size; however, fixed-size chunks may be used in many implementations. Furthermore, consider that for sparse files the file system allocates 64K of space at the time a data write needs it, which, as shown in FIG. 2, also need not align with the chunks. (A file system supporting variable extents may not have this problem.)
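To make the alignment issue concrete, here is a minimal sketch of determining which variable-size chunks must be recalled to cover a requested byte range, in the spirit of FIG. 2; the chunk layout and function name are hypothetical:

```python
def chunks_covering(chunk_sizes, offset, length):
    """Return indices of the chunks overlapping the range [offset, offset+length)."""
    needed = []
    pos = 0
    for i, size in enumerate(chunk_sizes):
        # A chunk is needed if it overlaps the requested range at all.
        if pos < offset + length and pos + size > offset:
            needed.append(i)
        pos += size
    return needed

# Hypothetical layout: a 50K request that starts inside chunk 0 spans two chunks,
# so both must be recalled even though only 50K was asked for.
K = 1024
layout = [40 * K, 60 * K, 28 * K]  # chunk sizes in bytes
```

A request of 50K at offset 10K overlaps the first two chunks, so both are recalled; a request exactly covering chunk 0 needs only that chunk.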
There are therefore a number of options available, including bringing into memory only the chunks that the user needs. In this case there is more data than needed, yet not enough to fill the three 64K allocations. All of the chunk data can be written to disk, with the unwritten parts of the allocated space filled with zeros. Alternatively, in the example of FIG. 2, only the required 50K of data plus any other chunk data up to the 64K allocation boundary may be retained, so that only 64K of space needs to be allocated, and that 64K allocation is then filled completely.
In another alternative shown in FIG. 3, more blocks (blocks 222 and 224 and blocks 332 and 334) may be requested than are needed for the 50K range, such that blocks 222 and 224 are located entirely on disk. Note that for blocks 332 and 334, only the portions that fall within the 64K boundary are retained in order to fully utilize the space allocated by the file system.
Thus, how to handle the extra data or space is a policy choice, which can be used to minimize latency and I/O, for example by allocating and filling more space than needed if writes to the extra space are expected to be beneficial. As can be seen from the examples of FIGS. 2 and 3, one policy implementation saves write operations based on a fixed sector-alignment boundary, dropping any unneeded chunk data outside the boundary. Another policy implementation reads enough data to completely fill each allocated range of the sparse file. (One implementation takes both of these policy actions when processing chunk data.)
Turning to aspects related to tracking recalled data versus data in chunk store, the maintained tracking data has various aspects that may be related to the file system. For example, one file system may allocate only the required file space to sparse files, whereby such a file system itself may be sufficient to track which have been recalled and which have not. However, NTFS does not guarantee that it will not allocate more sparse file ranges than needed, and thus in NTFS implementations, the filter 118 needs to track which partitions (e.g., at 64KB boundaries corresponding to sparse file allocation) are recalled to the partially deduplicated file, and which partitions are reserved as deduplication blocks.
In one implementation, generally represented in FIG. 4, NTFS reparse points 440 are used for tracking. The reparse point 440 maintains a bitmap structure 442 or a compressed representation of the bitmap structure 442. More specifically, to partially recall a file, the deduplication filter 118 creates a virtual partition view of the file, with partitioning at a predetermined boundary (e.g., at a 64K boundary). The filter 118 maintains this partition view in a bitmap structure in the reparse point buffer, e.g., a set bit in the bitmap indicates that the partition is recalled to the deduplication file, whereas a zero bit indicates that the partition resides in the chunk store. The size of the partition may be increased when setting the reparse point.
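A minimal sketch of the recall bitmap described above, with one bit per fixed-size partition, follows; the 64K partition size, class, and helper names are assumptions for illustration, not the filter's actual code:

```python
PARTITION = 64 * 1024  # assumed partition size at reparse-point granularity

class RecallBitmap:
    """One bit per partition: 1 = recalled to the file, 0 = resides in chunk store."""
    def __init__(self, file_size):
        self.nbits = (file_size + PARTITION - 1) // PARTITION
        self.bits = 0

    def mark_recalled(self, offset, length):
        """Set the bit for every partition overlapping [offset, offset+length)."""
        first = offset // PARTITION
        last = (offset + length - 1) // PARTITION
        for p in range(first, last + 1):
            self.bits |= 1 << p

    def is_recalled(self, offset):
        """True if the partition containing this offset has been recalled."""
        return bool(self.bits & (1 << (offset // PARTITION)))

bm = RecallBitmap(file_size=256 * 1024)  # four 64K partitions
bm.mark_recalled(70 * 1024, 20 * 1024)   # a 20K write at offset 70K marks partition 1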
In a reparse point implementation, the bitmap is limited in size. If the file becomes too large, the bitmap cannot track the partitions using one bit per partition. Described herein are various ways of adjusting the bitmap in such a case.
One way to adjust the bitmap is to use one bit to represent multiple partitions, and to recall each such group of partitions together as a single indivisible unit. For example, if one bit represents a two-partition unit, then with 64K partitions each partition is recalled as part of a 128K unit, and so on.
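The bit-per-group adjustment can be sketched as a mapping from file offset to bit index; the two-partition group factor and function names are hypothetical:

```python
def bit_index(offset, partition_size=64 * 1024, partitions_per_bit=2):
    """Map a file offset to its bit when each bit covers a group of partitions."""
    return offset // (partition_size * partitions_per_bit)

def unit_range(bit, partition_size=64 * 1024, partitions_per_bit=2):
    """The byte range recalled as one indivisible unit for a given bit."""
    unit = partition_size * partitions_per_bit
    return bit * unit, (bit + 1) * unit
```

With two 64K partitions per bit, offsets 0 and 64K share bit 0, and setting that bit commits the full 128K unit to recall.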
Another way to adjust the bitmap is to compress the bitmap in the reparse points and decompress it into memory when it needs to be accessed. Run-length encoding, for example, is a well-known compression technique for bitmap runs of ones and zeros.
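Run-length encoding of a recall bitmap can be sketched as follows; this is a generic RLE for illustration, not necessarily the encoding the implementation uses:

```python
def rle_encode(bits):
    """Encode a bit string like '1110011' as (bit, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([b, 1])  # start a new run
    return [(b, n) for b, n in runs]

def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit string."""
    return "".join(b * n for b, n in runs)
```

Note how a lone zero bit splits one long run of ones into three runs; recalling the partition behind that zero (as described next) merges them back into a single run, shrinking the encoding.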
Note that if compression can no longer reduce the size of the compressed bitmap, selected partitions can be recalled so that more of the recalled partitions are contiguous, thereby increasing compression efficiency so that the compressed bitmap fits into the space allowed by the reparse point. For example, consider a long run of mostly one bits occasionally interrupted by zeros. If the chunks corresponding to the occasional zero bits are recalled, the run-length encoding can represent a much longer run of ones in a single run representation.
Note that data is recalled at one time while the bitmap is updated at another, which could lead to crash-related inconsistencies. As described herein, the order of operations makes the system crash-consistent with respect to when data is committed (e.g., flushed) to stable storage (e.g., disk). More specifically, a change to the bitmap is flushed to disk only after the corresponding data it represents has been flushed to disk. In this way, the bitmap never indicates that data has been partially recalled before that data has been safely flushed to disk. Note that if the disk and file system support write-through, write-through may be used instead of flushing. Also note that while flushing file data/metadata to disk is typically used as the example herein, it is only an example; "commit to stable storage" also covers write-through by the file system and storage medium (as well as flushing or any other means).
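The crash-consistent ordering can be sketched with a toy model of stable storage; the classes, the crash flag, and the flush stand-ins are hypothetical, and a simulated crash between the two flushes shows why the data must be flushed first:

```python
class StableStorage:
    """Toy model: 'disk' state survives a crash; anything not flushed is lost."""
    def __init__(self):
        self.disk_data = {}       # partition -> bytes durably on disk
        self.disk_bitmap = set()  # partitions durably marked as recalled

def recall_partition(storage, partition, data, crash_after_data_flush=False):
    # 1. Write and flush the recalled data FIRST.
    storage.disk_data[partition] = data
    if crash_after_data_flush:
        return  # crash here: data is on disk, bitmap is not -- still consistent
    # 2. Only then flush the bitmap update marking the partition as recalled.
    storage.disk_bitmap.add(partition)

s = StableStorage()
recall_partition(s, 1, b"recalled-bytes", crash_after_data_flush=True)
```

After the simulated crash the bitmap still points at the chunk store for partition 1, so no read ever trusts data that was not safely flushed; the reverse ordering would not have this property.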
In another aspect, tracking data may be used to efficiently convert partially recalled file data back to a fully deduplicated file. For this purpose, the tracking data may be used when scanning files for deduplication, such that only partially recalled file ranges are provided to the deduplication system for deduplication. When these ranges are added to the chunk store, the deduplication metadata, including the tracking data, is then adjusted to reflect that the recalled ranges are now optimized chunks and no longer partially recalled ranges.
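Enumerating only the recalled ranges from the tracking bitmap, so that re-deduplication can stream just those ranges rather than the whole file, can be sketched as follows; the bit-string input, partition size, and function name are assumptions:

```python
def recalled_ranges(bitmap_bits, partition_size=64 * 1024):
    """Coalesce set bits of the recall bitmap into contiguous (offset, length) ranges."""
    ranges = []
    start = None
    for i, bit in enumerate(bitmap_bits + "0"):  # trailing sentinel closes a final run
        if bit == "1" and start is None:
            start = i                             # run of recalled partitions begins
        elif bit == "0" and start is not None:
            ranges.append((start * partition_size, (i - start) * partition_size))
            start = None                          # run ends
    return ranges
```

Only these (offset, length) ranges would be fed to the deduplication system; a fully deduplicated file yields an empty list and nothing is streamed.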
As an example of one implementation, to partially recall a file, the deduplication filter 118 creates a virtual partition view of the file. Partitioning may be done at fixed sector aligned boundaries (e.g., at 64K boundaries). The filter maintains this partition view in a bitmap structure 442 in a buffer of reparse points 446. The size of the partition may be increased when setting the reparse point.
FIGS. 5 and 6 include flow diagrams representing example steps of one implementation of a partial recall mechanism in which chunks are of fixed size, a bitmap structure is maintained in a reparse point buffer, and so on. FIG. 5 begins when a recall operation is required, given the recall operation's (FileObject, FileOffset, Length) parameters, and obtains the fixed chunk size for the recall from the reparse point, e.g., FixedRecallSize = ReparsePoint->FixedSizeForRecall (which may be required to be sector-aligned).
Step 502 represents aligning the user's file offset based on the available granularity, e.g., recallOffsetStart = FixedRecallSize * (FileOffset / FixedRecallSize) and recallOffsetEnd = FixedRecallSize * ((FileOffset + Length + FixedRecallSize - 1) / FixedRecallSize); note that the "/" sign refers to integer division, which rounds down to the nearest integer. Step 504 initializes a recall loop, i.e., recallLength = FixedRecallSize and recallOffset = recallOffsetStart.
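The alignment in steps 502 and 504 can be checked directly in code: with integer division rounding down, the start offset rounds down and the end offset rounds up to the recall-size boundary. A minimal sketch (function name is an assumption):

```python
def align_recall(file_offset, length, fixed_recall_size):
    """Round the requested range out to FixedRecallSize boundaries (per step 502)."""
    recall_offset_start = fixed_recall_size * (file_offset // fixed_recall_size)
    recall_offset_end = fixed_recall_size * (
        (file_offset + length + fixed_recall_size - 1) // fixed_recall_size)
    return recall_offset_start, recall_offset_end
```

For example, a 20K write at offset 70K with a 64K recall size is widened to the single 64K..128K region, while an already-aligned range is unchanged.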
Steps 506 through 516 form a loop that recalls data until the recall offset reaches the recall offset end. When it does, step 506 branches to step 518 to flush the recalled portion of the deduplicated file to disk, and the process continues to FIG. 6, described below.
Step 508 represents an end-of-file evaluation, to ensure that the recall operation itself does not increase the logical file size. If the recall offset indicates that the process is to recall the last chunk at the end of the file, the process may use a paged I/O technique, as represented by steps 510 and 512. Using this technique, data can be recalled at the end of the file using a non-buffered handle without extending the size of the file (note that using paged I/O is just one example that may be used for NTFS, and alternative techniques may be used). Thus, if at step 508 the write is at the logical end of the file, steps 510 and 512 perform the recall.
For the non-end-of-file case, steps 514 and 515 are performed to read data from the chunk store and write data to the partial deduplication file.
Note that the block store data may be read in a cached manner (steps 511 and 514). However, at the time of the recall, writes to the deduplicated file may not be cached, as in step 515, in order to reduce duplication in the system cache. If the recall boundary is sector aligned, the process automatically satisfies file system requirements for sector aligned I/O during uncached reads and writes. This can be dynamically configured by querying the file system for the sector size of the volume and then dividing the file into reasonably sized sector aligned blocks for recall purposes.
Returning to step 506, when the recall offset reaches the recall offset end, the recalled portion is flushed to disk at step 518. The process then continues to FIG. 6.
Step 602 of FIG. 6 represents updating the bitmap structure in the reparse point buffer. Once updated, step 604 is performed for crash consistency, i.e., flushing the bitmap structure for each chunk. Note that if the structure is part of the file, such as a reparse point, then it is flushed as part of flushing the file at step 518 of FIG. 5, and thus, as an optimization, it may be acceptable to bypass step 604.
Step 606 is performed to update the in-memory representation of the deduplicated file's data ranges. The in-memory representation provides efficient lookups and enables faster decisions about which portions of a file have been recalled and which remain in the chunk store. Note that this means there are two representations of the state of the deduplicated file's data ranges; one is persisted on disk in a structure such as the NTFS reparse point, while the other is kept in memory for fast lookup.
In one deduplication implementation, a data structure called an MCB (memory control block) is maintained for each deduplication file. This structure resides in another per-deduplication stream structure called a stream context.
If the entire file is dirty, the state of the file is changed back to a non-deduplicated file at step 608. To do so, step 610 deletes the reparse point (in this example implementation), and if the file was not originally sparse, step 612 marks the file as not sparse ("un-sparses" it).
At this point, the recall operation is complete from the perspective of the filter 118, regardless of whether the file state is partially deduplicated or non-deduplicated. Step 614 forwards the user's modification request towards the file system, such as via an intermediate filter manager, for example to the file system or to another filter.
FIGS. 7A-7C depict an alternative implementation in which not all of the data corresponding to a user write is recalled. In this implementation, when a user's modify/write request is received, the partial recall mechanism retrieves only those portions of the optimized data needed to bring the recalled file portions into conformance with predetermined recall boundaries. This is accomplished by first allowing the user write to proceed to the file system, and then examining the extent of the user's write to determine what portion of the extent does not align with the recall boundaries. The range is then aligned with the recall boundaries by prefixing and suffixing it with the appropriate optimized data.
Once the optimized data has been recalled, the file is refreshed and its bitmap is updated. A key advantage here is that a smaller amount of optimized data needs to be recalled.
Thus, the user's write request is forwarded to the file system upon receipt. After the file system completes the write, but before returning the write completion to the user, the partial recall mechanism ensures that the portion corresponding to the recall granularity has also been written to disk.
As an example, consider a 128K file generally represented in FIG. 7A and a recall granularity of 64K. If the user writes 20K of user data at offset 70K, the filter forwards the write to the file system. After writing, the state of the file is as shown in FIG. 7A (where user data is labeled 770).
The write does not complete until the partial recall mechanism patches the rest of the 64K-to-128K range with the old data. To do so, the partial recall mechanism holds the write completion until the remainder of the range is patched with data read from the chunk store. This is shown in FIG. 7B, where the user data is labeled 770 and the data read from the chunk store is labeled 772A and 772B.
Note that, in general, less data is recalled. For example, in the implementation of FIGS. 5 and 6, 64K would be recalled; in the alternative implementation, only (70K - 64K) + (128K - 90K) = 44K is recalled. Once the 44K is patched, the write is completed and returned to the user with a success or error status.
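The 44K figure can be checked with a small helper that computes how much optimized data must be patched around a user write; the 64K boundary and function name are assumptions:

```python
def patch_amount(write_offset, write_end, boundary=64 * 1024):
    """Bytes of optimized data needed to pad a user write out to recall boundaries."""
    prefix = write_offset - (write_offset // boundary) * boundary   # gap before the write
    aligned_end = ((write_end + boundary - 1) // boundary) * boundary
    suffix = aligned_end - write_end                                # gap after the write
    return prefix + suffix
```

For the 70K..90K write above this yields 6K + 38K = 44K, and a write that exactly spans boundaries needs no patching at all.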
Consider further an alternative implementation for large write requests, for example, where user writes 774 span multiple recall boundaries, as shown in FIG. 7C. As can be seen, the user writes from 70K to 256K. In such a case, only data from 64K to 70K is recalled (Block 776) (in contrast to having to recall all data from 64K to 256K).
FIGS. 8-10 include flowcharts representing example steps for the above-described alternative implementation of the partial recall mechanism corresponding to FIGS. 7A-7C. Step 800 represents forwarding the operation to the file system and waiting for completion; note that as described above, completion is not returned to the user until after other data is read and patched.
Step 802 represents aligning the user's file offset based on the available granularity. Step 804 prepends to the beginning of the user's modify/write range the data read offset and length required for a prefix made up of optimized data. This aligns the beginning of the user range with the recall boundary. Such a prefix is not needed if the beginning of the user range is already aligned with the recall boundary. As can be seen, this computation is relative to the user data, and thus involves less than all of the data. Note that step 806 in this example ends the process once the recall offset reaches the recall offset end.
Steps 808, 810, 812, 814, and 815 are similar to steps 508, 510, 512, 514, and 515 and are therefore not described in detail again for the sake of brevity. Note that these steps recall only the optimized data corresponding to the beginning of the range. A similar operation is required for the end of the range. The steps in FIG. 9 highlight the steps required to recall the optimized data corresponding to the end of the range.
Step 904 appends the data read offset and length required for the suffix composed of the optimized data to the end of the user's modify/write range. This aligns the end of the user range with the recall boundary. Such a suffix is not needed if the end of the user range is already perfectly aligned with the recall boundary. Steps 908, 910, 912, 914 and 915 are similar to steps 508, 510, 512, 514 and 515 and are therefore not described in detail again for the sake of brevity.
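Steps 804 and 904 can be sketched as computing the prefix and suffix read ranges; the function names, the (offset, length) return convention, and the 64K boundary are assumptions for illustration:

```python
def prefix_read(write_offset, boundary=64 * 1024):
    """Optimized-data range to prepend so the write start lands on a boundary (step 804)."""
    start = (write_offset // boundary) * boundary
    return (start, write_offset - start)  # length 0 when already aligned

def suffix_read(write_end, boundary=64 * 1024):
    """Optimized-data range to append so the write end lands on a boundary (step 904)."""
    end = ((write_end + boundary - 1) // boundary) * boundary
    return (write_end, end - write_end)   # length 0 when already aligned
```

For the 70K..90K write this yields a 6K prefix read at 64K and a 38K suffix read at 90K; in the FIG. 7C case the write ends on the 256K boundary, so the suffix length is zero and only the prefix is recalled.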
The steps of FIG. 10 are largely similar to those of FIG. 6, except for step 1014, where the write completion is returned to the user application. At this point, the data has been recalled into the file using the alternative implementation, in which only those portions of the optimized data are retrieved that reconcile the recalled portions of the file with the fixed-size recall boundaries.
Thus, alternative implementations are available. Note that these (and any other) alternative implementations may be used alone or in combination, depending on performance requirements and/or any file system constraints.
As can be seen, techniques are described for partially recalling file extents of deduplicated files based on the modification extent (user writes) or other criteria. Latency and I/O are reduced by adjusting the recall range based on the modification range, chunk alignment, and file system extents. The recalled extents may be tracked using a bitmap structure, possibly compacted via encoding and compression techniques. The technology also addresses partial-recall performance improvements for handling cached/uncached I/O, reducing file system flushes, and avoiding end-of-file issues. Further, it provides the ability to re-optimize partially recalled files by tracking their state, so that only the partially recalled ranges are enumerated and streamed for re-deduplication.
Exemplary networked and distributed environments
Those skilled in the art will appreciate that the embodiments and methods described herein may be implemented in connection with any computer or other client or server device, which may be deployed as part of a computer network or in a distributed computing environment, and which may be connected to any type of data store or stores. In this regard, the embodiments described herein may be implemented in any computer system or environment having any number of memory or storage units, and any number of applications and processes occurring across any number of storage units. This includes, but is not limited to, an environment with server computers and client computers deployed in a network environment or distributed computing environment, having remote or local storage.
Distributed computing provides sharing of computer resources and services through the exchange of communications between computing devices and systems. These resources and services include the exchange of information, cache storage and disk storage for objects such as files. These resources and services also include processing power sharing among multiple processing units for load balancing, resource expansion, processing specialization, and so forth. Distributed computing makes use of network connections, allowing clients to leverage their collective power to benefit the entire enterprise. In this regard, various devices may have applications, objects, or resources that may participate in the resource management mechanisms as described with reference to various embodiments of the present disclosure.
FIG. 11 provides a schematic diagram of an exemplary networked or distributed computing environment. The distributed computing environment comprises computing objects 1110, 1112, etc. and computing objects or devices 1120, 1122, 1124, 1126, 1128, etc., which may include programs, methods, data stores, programmable logic, etc., as represented by example applications 1130, 1132, 1134, 1136, 1138. It is to be appreciated that the computing objects 1110, 1112, etc. and computing objects or devices 1120, 1122, 1124, 1126, 1128, etc. can comprise different devices, such as a Personal Digital Assistant (PDA), audio/video device, mobile phone, MP3 player, personal computer, laptop computer, etc.
Each computing object 1110, 1112, etc. and computing objects or devices 1120, 1122, 1124, 1126, 1128, etc. can communicate with one or more other computing objects 1110, 1112, etc. and computing objects or devices 1120, 1122, 1124, 1126, 1128, etc. by way of the communications network 1140, either directly or indirectly. Even though illustrated as a single element in FIG. 11, communications network 1140 may comprise other computing objects and computing devices that provide services to the system of FIG. 11 and/or may represent multiple interconnected networks, which are not shown. Each computing object 1110, 1112, etc. or computing object or device 1120, 1122, 1124, 1126, 1128, etc. can also contain an application, such as applications 1130, 1132, 1134, 1136, 1138, that might make use of an API, or other object, software, firmware and/or hardware, suitable for communication with or implementation of applications provided in accordance with embodiments of the present disclosure.
There are a variety of systems, components, and network configurations that support distributed computing environments. For example, computing systems may be connected together by wired or wireless systems, by local networks, or by widely distributed networks. Currently, many networks are coupled to the Internet, which provides an infrastructure for widely distributed computing and encompasses many different networks, but any network infrastructure can be used to facilitate exemplary communications with the system as described in various embodiments.
Thus, a host of network topologies and network infrastructures, such as client/server, peer-to-peer, or hybrid architectures, can be utilized. A "client" is a member of a class or group that uses the services of another class or group to which it is not related. A client may be a process, for example, roughly a set of instructions or tasks, that requests a service provided by another program or process. The client process uses the requested service without having to "know" any working details about the other program or the service itself.
In a client/server architecture, particularly a networked system, a client is typically a computer that accesses shared network resources provided by another computer (e.g., a server). In the illustration of FIG. 11, as a non-limiting example, computing objects or devices 1120, 1122, 1124, 1126, 1128, etc. can be thought of as clients and computing objects 1110, 1112, etc. can be thought of as servers where computing objects 1110, 1112, etc. act as servers providing data services, such as receiving data from client computing objects or devices 1120, 1122, 1124, 1126, 1128, etc., storing data, processing data, transmitting data to client computing objects or devices 1120, 1122, 1124, 1126, 1128, etc., although any computer can be considered a client, a server, or both, depending on the circumstances.
A server is typically a remote computer system accessible over a remote or local network, such as the internet or wireless network infrastructure. A client process may be active in a first computer system and a server process may be active in a second computer system, communicating with each other over a communications medium, thereby providing distributed functionality and allowing multiple clients to take advantage of the information-gathering capabilities of the server.
In a network environment in which the communications network 1140 or bus is the Internet, for example, the computing objects 1110, 1112, etc. can be Web servers with which other computing objects or devices 1120, 1122, 1124, 1126, 1128, etc. communicate via any of a number of known protocols, such as the Hypertext transfer protocol (HTTP). Computing objects 1110, 1112, etc. acting as servers can also serve as clients, e.g., computing objects or devices 1120, 1122, 1124, 1126, 1128, etc., as may be characteristic of a distributed computing environment.
Exemplary computing device
As noted above, the techniques described herein may be advantageously applied to any device. It should be understood, therefore, that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the various embodiments. Accordingly, the general purpose remote computer described below in FIG. 12 is but one example of a computing device.
Embodiments may be implemented in part via an operating system, for use by a developer of services for a device or object, and/or included within application software that operates to perform one or more functional aspects of the embodiments described herein. Software may be described in the general context of computer-executable instructions, such as program modules, being executed by one or more computers, such as client workstations, servers or other devices. Those skilled in the art will appreciate that computer systems have a variety of configurations and protocols that can be used to communicate data, and thus no particular configuration or protocol should be considered limiting.
FIG. 12 thus illustrates an example of a suitable computing system environment 1200 in which one or more aspects of the embodiments described herein can be implemented, although as made clear above, the computing system environment 1200 is only one example of a suitable computing environment and is not intended to suggest any limitation as to scope of use or functionality. Further, the computing system environment 1200 is not intended to be interpreted as having any dependency relating to any one or combination of components illustrated in the exemplary computing system environment 1200.
With reference to FIG. 12, an exemplary remote device for implementing one or more embodiments includes a general purpose computing device in the form of a computer 1210. Components of computer 1210 may include, but are not limited to, a processing unit 1220, a system memory 1230, and a system bus 1222 that couples various system components including the system memory to the processing unit 1220.
Computer 1210 typically includes a variety of computer readable media and can be any available media that can be accessed by computer 1210. The system memory 1230 may include computer storage media in the form of volatile and/or nonvolatile memory such as Read Only Memory (ROM) and/or Random Access Memory (RAM). By way of example, and not limitation, system memory 1230 may also include an operating system, application programs, other program modules, and program data.
A user may enter commands and information into the computer 1210 through input device 1240. A monitor or other type of display device is also connected to the system bus 1222 via an interface, such as output interface 1250. In addition to a monitor, computers can also include other peripheral output devices such as speakers and a printer, which may be connected through output interface 1250.
The computer 1210 may operate in a networked or distributed environment using logical connections to one or more other remote computers, such as a remote computer 1270. The remote computer 1270 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, or any other remote media consumption or transmission device, and may include any or all of the elements described above relative to the computer 1210. The logical connections depicted in FIG. 12 include a network 1272, such as a local area network (LAN) or a wide area network (WAN), but may also include other networks/buses. Such networking environments are commonplace in homes, offices, enterprise-wide computer networks, intranets and the Internet.
As mentioned above, while exemplary embodiments have been described in connection with various computing devices and network architectures, the underlying concepts may be applied to any network system and any computing device or system in which it is desirable to improve the efficiency of resource usage.
Moreover, there are numerous ways of implementing the same or similar functionality, e.g., an appropriate API, tool kit, driver code, operating system, control, standalone or downloadable software object, etc., which enables applications and services to use the techniques provided herein. Thus, embodiments herein are contemplated from the standpoint of an API (or other software object), as well as from a software or hardware object that implements one or more embodiments as described herein. Thus, various embodiments described herein may have aspects that are wholly in hardware, partly in hardware and partly in software, as well as in software.
The word "exemplary" is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. Moreover, any aspect or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects or designs. Furthermore, to the extent that the terms "includes," "has," "contains," and other similar words are used, for the avoidance of doubt, such terms are intended to be inclusive in a manner similar to the term "comprising" as an open transition word without precluding any additional or other elements when employed in a claim.
As noted, the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. As used herein, the terms "component," "module," "system," and the like are likewise intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computer and the computer can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
The aforementioned systems have been described with respect to interaction between several components. It can be appreciated that such systems and components can include those components or specified sub-components, some of the specified components or sub-components, and/or additional components, according to various permutations and combinations of the foregoing. Sub-components can also be implemented as components communicatively coupled to other components rather than included within parent components (hierarchical). Additionally, it should be noted that one or more components may be combined into a single component providing aggregate functionality or divided into several separate sub-components, and any one or more middle layers, such as a management layer, may be provided to communicatively couple to such sub-components in order to provide integrated functionality. Any components described herein may also interact with one or more other components not specifically described herein but generally known by those of skill in the art.
In view of the exemplary systems described herein, methodologies that may be implemented in accordance with the described subject matter can also be understood from the flow charts with reference to the figures. While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of blocks, it is to be understood and appreciated that the embodiments are not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Although non-sequential or branched flow is illustrated via flowchart, it can be appreciated that various other branches, flow paths, and orders of the blocks, may be implemented which achieve the same or similar result. Moreover, some illustrated blocks may be optional in implementing the methodologies described hereinafter.
Conclusion
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.
In addition to the embodiments described herein, it is to be understood that other similar embodiments may be used or modifications and additions may be made to the described embodiments for performing the same or equivalent function of the corresponding embodiments without deviating therefrom. Further, multiple processing chips or multiple devices may share the performance of one or more functions described herein, and similarly, storage may be effected across multiple devices. Accordingly, the present invention should not be limited to any single embodiment, but rather should be construed in breadth, spirit and scope in accordance with the appended claims.

Claims (10)

1. A method for partial recall of a deduplicated file at least partially executed on a processor in a computing environment, comprising: changing the state of a file of a storage volume from a fully deduplicated state to a partially deduplicated state, including reading (511, 514, 811, 814) one or more chunks of file data from a chunk store, committing (512, 515, 812, 815) at least part of the one or more chunks to one or more recalled data ranges stably stored as the file, and maintaining information (602, 604, 606, 1002, 1004, 1006) associated with the file that tracks which data range or ranges have been recalled and which data range or ranges reside in the chunk store.
2. The method of claim 1, wherein changing the state comprises recalling the one or more data ranges for modification by a program or user write.
3. The method of claim 1, wherein reading from the chunk store obtains more data than is needed to satisfy a request, and wherein committing comprises writing the data needed to satisfy the request plus additional data to fill space allocated by the file system until a starting and an ending allocation boundary is reached; or wherein one or more chunks needed to satisfy a request are obtained by reading from the chunk store, and wherein committing comprises writing the chunks needed to satisfy the request plus additional data from one or more unneeded chunks to fill space allocated by the file system until a starting and an ending allocation boundary is reached.
4. The method of claim 1, wherein the file is divided into partitions corresponding to recalled extents and extents residing in the chunk store, and wherein maintaining information associated with the file comprises maintaining a bitmap correlation structure in an auxiliary structure associated with the file, the bitmap correlation structure having data indicating whether each partition corresponds to a recalled extent or an extent residing in the chunk store.
5. The method of claim 4, further comprising compressing the bitmap structured data.
6. The method of claim 1, further comprising receiving write data, and wherein reading from the chunk store comprises obtaining zero or more bytes of data before the write data as needed to pad the data of a request to one recall boundary, and obtaining zero or more bytes of data after the write data as needed to pad the data of the request to another recall boundary.
7. The method of claim 1, further comprising receiving a write request comprising write data, providing the write request to a file system, waiting for the write request to complete at the file system, obtaining zero or more bytes of data before the write data as needed to pad the data of a request to one recall boundary, and obtaining zero or more bytes of data after the write data as needed to pad the data of the request to another recall boundary, committing the write data and any padding data to the stable storage, and updating the information that tracks which data range or ranges have been recalled.
8. In a computing environment, a system for partial recall of deduplicated files, comprising: a partial recall means (114) configured to access associated tracking data (104), the tracking data (104) indicating which range or ranges of a file (102) have been recalled to the file and which ranges reside in and are referenced by a chunk store (108), the partial recall means (114) further configured to access one or more chunks of file data in the chunk store (108), commit one or more ranges corresponding to at least portions of the one or more chunks to the file as one or more recalled ranges (116), and update the tracking data to indicate when a range has been committed as a recalled range.
9. The system of claim 8, wherein the partial recall means is included in a file system filter.
10. A method for partial recall of a deduplicated file, comprising:
receiving a request to access a file data range of a file at a given offset, wherein at least some of the requested file data resides in a deduplication chunk store;
reading from the chunk store and writing to stable storage until at least the file data range at the given offset is obtained;
submitting at least part of the file data to the file as a range of one or more partial recalls;
updating tracking information indicating that the extent of the one or more partial recalls has been committed to the file; and
forwarding the request towards the file system.
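The flow recited in claim 10 can be sketched as follows (a minimal in-memory model; the chunk store stand-in, class names, and byte-level recall granularity are illustrative assumptions, not the claimed implementation, which would typically reside in a file system filter):

```python
RECALL_SIZE = 4  # recall/alignment granularity in bytes (tiny, for illustration only)

class InMemoryChunkStore:
    """Stand-in for the deduplication chunk store (hypothetical)."""
    def __init__(self, data):
        self.data = data

    def read(self, pos, size):
        return self.data[pos:pos + size]

class PartiallyRecalledFile:
    """A file plus tracking information recording which ranges are recalled."""
    def __init__(self, size, chunk_store):
        self.size = size
        self.chunk_store = chunk_store
        self.local = bytearray(size)   # stable storage holding recalled ranges
        self.recalled = set()          # tracking info: recalled range start offsets

    def read(self, offset, length):
        """Claim-10 flow: on a request at a given offset, read from the chunk
        store and commit the aligned range covering the request as recalled
        ranges, update the tracking information, then forward the request."""
        start = (offset // RECALL_SIZE) * RECALL_SIZE
        end = min(self.size, -(-(offset + length) // RECALL_SIZE) * RECALL_SIZE)
        for pos in range(start, end, RECALL_SIZE):
            if pos not in self.recalled:
                chunk = self.chunk_store.read(pos, RECALL_SIZE)  # read from chunk store
                self.local[pos:pos + len(chunk)] = chunk         # commit as recalled range
                self.recalled.add(pos)                           # update tracking info
        return bytes(self.local[offset:offset + length])         # forward toward file system
```

Note that only the ranges touched by the request are recalled; the rest of the file continues to reside solely in the chunk store, which is the point of partial (rather than full) recall.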
HK13100336.8A 2010-12-16 2013-01-09 Partial recall of deduplicated files HK1173521B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/970,848 US8645335B2 (en) 2010-12-16 2010-12-16 Partial recall of deduplicated files
US12/970,848 2010-12-16

Publications (2)

Publication Number Publication Date
HK1173521A1 HK1173521A1 (en) 2013-05-16
HK1173521B true HK1173521B (en) 2015-07-10


Similar Documents

Publication Publication Date Title
US8645335B2 (en) Partial recall of deduplicated files
US11593319B2 (en) Virtualized data storage system architecture
US8990171B2 (en) Optimization of a partially deduplicated file
US10579610B2 (en) Replicated database startup for common database storage
US9477420B2 (en) Overwriting part of compressed data without decompressing on-disk compressed data
US9460008B1 (en) Efficient garbage collection for a log-structured data store
US9317213B1 (en) Efficient storage of variably-sized data objects in a data store
US20110161297A1 (en) Cloud synthetic backups
US20130232215A1 (en) Virtualized data storage system architecture using prefetching agent
CN115427941A (en) Data management system and control method
US10909143B1 (en) Shared pages for database copies
MX2012014730A (en) Optimization of storage and transmission of data.
US10852975B2 (en) Efficient data deduplication caching
US11514001B2 (en) Concurrent computations operating on same data for CPU cache efficiency
US20240256190A1 (en) Variable size metadata pages with log-structured metadata
HK1173521B (en) Partial recall of deduplicated files
US12105636B2 (en) Caching techniques using a data deduplication cache and a data cache
CN110134509A (en) A data caching method and device
HK1173245B (en) Using index partitioning and reconciliation for data deduplication
HK1173245A (en) Using index partitioning and reconciliation for data deduplication
HK1173520A1 (en) Fast and low-ram-footprint indexing for data deduplication