HK1150250B - Data storage space recovery system and method - Google Patents
- Publication number: HK1150250B (application HK11104381.6A)
- Authority: HK (Hong Kong)
- Prior art keywords: data storage, file system, storage space, space, page
Description
Cross Reference to Related Applications
[001] The present application claims the benefit of U.S. patent application serial No. 11/767,049, entitled "Data Storage Space Recovery System and Method", filed June 22, 2007, and of co-pending U.S. patent application serial No. 10/918,329, entitled "Virtual Disk Drive System and Method", filed August 13, 2004; both applications are incorporated herein by reference.
Technical Field
[002] The present invention relates to determining explicitly free data space in a computer data storage system with implicitly allocated data space, using information provided by a host computer system that knows which allocated space is currently in use at the time of a query. By reducing the total amount of storage required, considerable cost savings can be realized over the lifetime of any given data.
Background
[003] Ever-increasing amounts of data must be stored and transmitted every year for a variety of purposes, including business practices and compliance with various laws. The media on which this data is recorded carry an acquisition cost measured in dollars, a management cost measured in person-hours, and an infrastructure cost for power, heat dissipation, and other elements. It is desirable to reduce all of these costs. It is generally accepted that the cost of managing and providing such infrastructure is a multiple of the cost of acquiring the storage media, so reducing the amount of media further reduces the other infrastructure costs. The present invention provides a method by which data storage and related media can be saved, recycled, or reused, thereby reducing the overall cost of owning data storage.
[004] It has previously been shown to be possible to build a storage subsystem in which all physical storage is initially allocated to pools, examples of which are discussed in co-pending U.S. patent application serial No. 10/918,329, entitled "Virtual Disk Drive System and Method", filed August 13, 2004. Storage from a pool can then be allocated, as needed, to entities that a computing entity accesses for data storage. This allocation of storage from a pool to a computing entity is commonly referred to in the art as "thin provisioning". Allocating storage only on demand relies on an implication: if a computing entity writes data, it intends to store that data for later retrieval, so the written storage is in use. By allocating only the storage identified by these write operations, a considerable amount of storage that a conventional storage subsystem would allocate, but that is not used and may never be used, can be omitted from the system as a whole, reducing acquisition, maintenance, and other costs.
[005] However, standard protocols provide no way for a computing entity to tell the storage subsystem that a region to which data was previously stored is no longer in use and can be reused or otherwise freed. This data space may have been used for temporary storage, or its contents may simply no longer be valuable enough to keep. The storage subsystem continues to maintain the data space because, from its perspective alone, there is no way to identify regions that are no longer in use. In other words, there is no implicit way to unambiguously determine that previously implicitly allocated storage can be freed without examining the data itself. Examining the contents of all data stored by the computing entity is very computationally intensive for the storage subsystem, and a storage subsystem attempting to do so would suffer a severe performance penalty while having to track the operational and technical changes of every file system and application that might use it.
[006] In general, to make thin provisioning as efficient as possible, it is desirable to know exactly which blocks are in use and which are not, for any operating system and any type of file system. Users of block storage have no standard way to indicate to the storage unit that a block is "not in use". For conventional storage devices this information is irrelevant, since each addressable block on the device is physically mapped to exactly one physical block. In almost all storage systems containing more than one disk device, however, any given addressable block can be mapped to almost any (and sometimes more than one) physical block on one or more physical disk devices. In a fully virtualized, thin-provisioned storage system, the only information about which blocks are in use is collected implicitly: if a block is written, it is assumed to be in use. This is an intrinsically safe assumption. With thin provisioning, physical blocks are allocated to user-addressable blocks on an as-needed basis, when the user writes to a given addressable block. A read of a block that has never been written can return dummy data, typically all zeros of the required total length. In this embodiment, the only way a block can be freed for reuse is if a point-in-time copy (PITC) is generated, the given logical addressable block is written again, and the previous PITC expires. Again, this implicitly indicates that the previously allocated block is no longer necessary for the integrity of the addressable storage and can be reallocated, possibly to other volumes, if desired.
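The allocate-on-write behavior described above can be illustrated with a minimal sketch. The class `ThinVolume`, the constant `PAGE_SIZE`, and the list-based free pool are hypothetical names chosen for illustration, not the storage subsystem's actual implementation. A logical page receives a physical page only on its first write, and a read of a never-written page returns zero-filled dummy data.

```python
from typing import Dict, List

PAGE_SIZE = 4  # hypothetical page size in bytes, kept tiny for illustration


class ThinVolume:
    """Hypothetical thin-provisioned volume: physical pages are mapped on first write."""

    def __init__(self, pool: List[int]) -> None:
        self.free_pool = pool                # physical page ids available for allocation
        self.mapping: Dict[int, int] = {}    # logical page -> physical page
        self.store: Dict[int, bytes] = {}    # physical page -> contents

    def write(self, logical_page: int, data: bytes) -> None:
        if len(data) != PAGE_SIZE:
            raise ValueError("this sketch only accepts whole-page writes")
        if logical_page not in self.mapping:
            # Implicit allocation: a write is taken as proof that the page is in use.
            self.mapping[logical_page] = self.free_pool.pop(0)
        self.store[self.mapping[logical_page]] = data

    def read(self, logical_page: int) -> bytes:
        phys = self.mapping.get(logical_page)
        if phys is None:
            return bytes(PAGE_SIZE)          # never written: dummy all-zero data
        return self.store[phys]

    def allocated_pages(self) -> int:
        return len(self.mapping)


vol = ThinVolume(pool=list(range(100)))
vol.write(7, b"abcd")
print(vol.read(7), vol.read(8), vol.allocated_pages())  # b'abcd' b'\x00\x00\x00\x00' 1
```

Note that nothing in this interface lets the host say a page is no longer needed; once mapped, a page stays mapped, which is exactly the gap the following paragraphs describe.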
[007] Certain conditions can leave a large number of addressable blocks unused in any FS. An extreme example is creating a single very large file that occupies almost the entire volume and then deleting it. The storage subsystem implicitly allocates the storage required for each write to the file system, in this case writes covering nearly the entire volume. After the file is deleted, most of the space allocated by the storage subsystem is no longer needed, but it cannot be implicitly freed and continues to consume resources. Over time, small allocations and reallocations at the application or file system level can lead to the same result.
[008] Thus, the thin provisioning approach in existing data storage systems is constrained by the file system operations of the operating system. These file systems tend not to reuse freed space; instead, they allocate previously unused space to new file writes. As a result, a given partition accumulates a large amount of space that has been written at some point but no longer holds data the file system uses. Because the file system is layered on top of the block storage provided by the data storage system, and the data storage system has no way of knowing which logical block addresses ("LBAs") the file system no longer uses, these unused blocks accumulate over time. Eventually, every point-in-time copy ("PITC") taken will reference earlier pages in the page pool even though that storage is no longer in use.
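The accumulation can be made concrete with a small simulation. It assumes a deliberately simplified allocation policy, in which the file system always places new files on never-used LBAs, and a storage subsystem that records every written LBA as allocated. `AppendOnlyFS` and the other names are hypothetical.

```python
from typing import Set


class AppendOnlyFS:
    """Hypothetical FS allocator that always prefers never-used LBAs over freed ones."""

    def __init__(self) -> None:
        self.next_fresh = 0          # next LBA that has never been allocated
        self.live: Set[int] = set()  # LBAs the FS currently considers in use

    def create(self, nblocks: int) -> Set[int]:
        lbas = set(range(self.next_fresh, self.next_fresh + nblocks))
        self.next_fresh += nblocks
        self.live |= lbas
        return lbas

    def delete(self, lbas: Set[int]) -> None:
        self.live -= lbas            # only FS metadata changes; the storage is never told


written: Set[int] = set()            # LBAs a thin-provisioned subsystem has seen written
fs = AppendOnlyFS()
for _ in range(10):
    temp_file = fs.create(100)
    written |= temp_file             # each write implicitly allocates backing storage
    fs.delete(temp_file)             # the delete frees nothing at the storage level

print(len(fs.live), len(written))    # 0 1000
```

After ten create/delete cycles the file system holds no data at all, yet the subsystem still backs 1000 blocks.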
[009] Because more and more pages are treated as "in use" when they are not, operations such as copying, replication, and other data movement take longer and consume more storage space (possibly at every level), negating many of the space advantages of thin provisioning. For example, suppose a 1 GB file is written, so that corresponding new pages are allocated to the volume, and the 1 GB file is then deleted. In the storage subsystem, the 1 GB of pages remains allocated in the active PITC and is carried forward into the next PITC, and so on. Pages may be replaced in later PITCs, but existing systems have no way to free pages that the file system has declared no longer in use. As a result, if the supposedly empty volume is copied using internal tools, the new copy consumes 1 GB of pages even though the volume is empty.
[010] Therefore, a way to determine when implicitly allocated storage is no longer being used by a computing entity and can be vacated for other applications is desirable.
Disclosure of Invention
[011] The present invention provides a system and method for determining explicitly free data space in a computer data storage system with implicitly allocated data space by using information provided by a host computer system that knows which allocated space is currently being used at the time of a query. By reducing the total amount of storage required, considerable cost savings can be realized over the lifetime of any given data.
[012] In one embodiment of the invention, a method is provided to determine when implicitly allocated storage is no longer in use by a computing entity and can be freed for other applications. One advantage of the present invention is that it reduces the total amount of data storage required, which in turn reduces other resources, such as the bandwidth required to copy data from one entity to another and the storage needed for additional copies of data, and correspondingly reduces use of supporting infrastructure, including space, time to transport and manage storage, and the power and other resources supplied to storage devices.
[013] As will be realized, the embodiments of the invention are capable of modifications in various obvious aspects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
Drawings
[014] FIG. 1 sets forth a flow chart illustrating an exemplary method for data storage space recovery according to the principles of the present invention.
[015] FIG. 2 illustrates an exemplary mapping of file system units/sectors/clusters to a page pool for explicitly freeing data space in a computer data storage system according to the principles of the present invention.
Detailed Description
[016] FIGS. 1 and 2 illustrate a method of determining explicitly free data space in a computer data storage system with implicitly allocated data space, using information provided by a host computer system that knows which allocated space is currently in use at the time of a query.
[017] The host computer system of the present invention may include one or more computing entities (sometimes referred to as hosts or servers) connected by means such as Fibre Channel, SCSI, or other standard storage protocols to one or more data storage subsystems, each of which emulates or maps to one or more physical storage volumes. One embodiment of a data storage subsystem is discussed in co-pending U.S. patent application serial No. 10/918,329, entitled "Virtual Disk Drive System and Method", filed August 13, 2004, the subject matter of which is incorporated herein by reference. The host or server includes an operating system ("OS"), a portion of which is referred to as a file system ("FS") having a plurality of units/sectors/clusters, as shown in FIG. 2.
[018] The host or server typically has no way to distinguish a regular storage volume confined to a single physical disk from an emulated/virtual volume. The data storage subsystem provides an abstraction between the sectors of storage as seen by the host or server and the sectors actually used for data storage, which may be spread across multiple disks using redundant storage such as RAID or non-redundant methods. The storage subsystem allocates storage via RAID methods in units called pages, each of which contains multiple sectors. This abstraction simplifies internal management of data allocation between virtual volumes and actual disk storage; a detailed implementation is discussed in co-pending U.S. patent application serial No. 10/918,329, entitled "Virtual Disk Drive System and Method", filed August 13, 2004.
[019] Thus, in FIG. 1, a method 100 of determining explicitly free data space in a computer data storage system with implicitly allocated data space begins with step 102, identifying the FS allocation units/sectors/clusters. In step 104, the FS units/sectors/clusters are mapped to OS physical disk units/sectors. In step 106, a list of unused blocks, that is, the apparently free areas, is delivered to the storage subsystem. On arrival at the storage subsystem, the list of unused blocks is adjusted to include only whole pages, because a page must be completely unused to qualify for freeing. In step 108, a controller (not shown), which may modify the active (valid) PITC that tracks changes to the volume over a given period of time, determines whether each block in the list of unused blocks lies in the active point-in-time copy ("PITC"), meaning a storage area or page that has been used and is no longer being used, or in a historical PITC, meaning a storage area or page that has been used and will likely be freed when that PITC expires. If a block in the list of unused blocks is in the active PITC, the controller returns the page to the free list in step 110. Page pool 210 in FIG. 2 shows a free list of storage space, and page pool 212 shows the free list after the page is returned.
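Steps 106 through 112 can be sketched as follows. This is an illustration under simplifying assumptions, not the patented controller: volume sectors are assumed to map linearly onto logical pages (`sector // SECTORS_PER_PAGE`), each logical page is recorded with the PITC that owns it, and `Controller`, `whole_unused_pages`, and `reclaim` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

SECTORS_PER_PAGE = 2  # illustrative, matching the two-sector pages used in FIG. 2


@dataclass
class Controller:
    """Hypothetical controller state for method 100; all names are illustrative."""
    page_owner: Dict[int, Tuple[str, int]]  # logical page -> (owning PITC, physical page)
    active_pitc: str
    free_list: List[int] = field(default_factory=list)
    marked: Set[int] = field(default_factory=set)  # freed later, when the owner expires


def whole_unused_pages(unused_sectors: Iterable[int], spp: int = SECTORS_PER_PAGE) -> List[int]:
    """Adjustment after step 106: keep only pages in which every sector is unused."""
    unused = set(unused_sectors)
    candidates = {s // spp for s in unused}
    return sorted(p for p in candidates
                  if all(p * spp + i in unused for i in range(spp)))


def reclaim(ctrl: Controller, unused_sectors: Iterable[int]) -> None:
    """Steps 108-112, repeated for each whole unused page."""
    for page in whole_unused_pages(unused_sectors):
        entry = ctrl.page_owner.get(page)
        if entry is None:
            continue  # never written, so nothing was allocated
        owner, phys = entry
        if owner == ctrl.active_pitc:
            # Step 110: page belongs to the active PITC; return it to the free list now.
            ctrl.free_list.append(phys)
            del ctrl.page_owner[page]
        else:
            # Step 112: historical PITC pages are read-only; mark for release on expiry.
            ctrl.marked.add(phys)


ctrl = Controller(page_owner={1: ("B", 11), 2: ("C", 12), 3: ("C", 13)}, active_pitc="C")
reclaim(ctrl, [2, 3, 4, 5, 7])
print(ctrl.free_list, ctrl.marked)  # [12] {11}
```

In the usage line, sector 7 alone does not cover page 3, so that page is left untouched; page 2 is in the active PITC and is freed immediately, while page 1 belongs to historical PITC B and is only marked.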
[020] If the block in the list is in a historical PITC, then in step 112 the controller marks the page in the active PITC so that it will be freed when the frozen PITC that owns the page expires into a PITC containing the marked page (a later PITC may contain new data that overwrote the page, in which case the page would have been freed implicitly anyway). In this way, the page is freed when the historical PITC expires. The data within a historical PITC is read-only and cannot be altered during its lifetime; this rules out both write I/O to its data pages and returning its pages to the free list. Once the historical PITC expires, its pages may be returned to the free list. Next, the controller determines whether there is another block in the list. If so, method 100 returns to step 108, and so on; if not, method 100 ends. Page pool 212 in FIG. 2 shows the list of free pages after PITCs B and C expire from the system: pages E and N are freed when PITCs B and C expire. As long as a PITC exists and provides a valid recovery point, it must retain all of its pages.
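The deferred release of step 112 can be sketched as an expiry handler. This assumes each historical PITC is tracked as a set of physical page ids and that pages marked in step 112 are held in a separate set; `expire_pitc` and the sample page numbers are hypothetical. While a PITC exists, its pages are never touched, and release happens only when it expires.

```python
from typing import Dict, List, Set


def expire_pitc(pitc_pages: Dict[str, Set[int]], name: str,
                marked: Set[int], free_list: List[int]) -> None:
    """Release the marked pages of a historical PITC once it expires."""
    pages = pitc_pages.pop(name)          # PITC stops being a recovery point
    for phys in sorted(pages & marked):   # pages marked in step 112
        free_list.append(phys)
        marked.discard(phys)
    # Unmarked pages would be merged forward into the next PITC (not modeled here).


pitcs = {"B": {5, 14}, "C": {21, 22}}   # hypothetical historical PITCs and their pages
marked = {5, 22}
free: List[int] = []
expire_pitc(pitcs, "B", marked, free)
expire_pitc(pitcs, "C", marked, free)
print(free, marked, pitcs)              # [5, 22] set() {}
```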
[021] In a typical system without method 100 described above, page 6 in PITC A, page 1 in PITC B, and pages 1 and 2 in PITC C, as shown in FIG. 2, may have been previously referenced, so they must be carried forward as PITCs are merged, even though they are implicitly free space and the server or host is unaware of this. As FS cluster map 202 in FIG. 2 indicates, the FS is no longer using these storage areas: clusters 2, 4, 5, and 6 are no longer used, so this space is simply wasted.
[022] To free up such space, the FS is asked to identify which clusters in cluster map 202 are in use and which are not. This identifies clusters 2, 4, 5, and 6 as no longer in use.
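The query of [022] amounts to scanning the FS allocation map for clear entries. The sketch below assumes the map is available as a plain list of booleans. A real host agent would obtain it from an OS facility (for example, the `FSCTL_GET_VOLUME_BITMAP` control code on Windows), which is outside the scope of this example. The 8-cluster map is hypothetical and merely shaped like the FIG. 2 example.

```python
from typing import List, Sequence


def unused_clusters(in_use: Sequence[bool]) -> List[int]:
    """Return indices of clusters that the FS reports as not in use."""
    return [i for i, used in enumerate(in_use) if not used]


# Hypothetical 8-cluster map shaped like the FIG. 2 example: clusters 2, 4, 5, 6 are free.
cluster_map = [True, True, False, True, False, False, False, True]
print(unused_clusters(cluster_map))  # [2, 4, 5, 6]
```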
[023] The FS is then asked to map the unused clusters (2, 4, 5, and 6) to the OS-visible disks. This yields the following mapping: cluster 2 to sectors 3 and 4 on disk 0, cluster 4 to sectors 7 and 8 on disk 0, cluster 5 to sectors 18 and 19 on disk 1, and cluster 6 to sectors 1 and 2 on disk 1. The sector numbers used herein are for illustrative purposes only.
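The cluster-to-sector translation of [023] can be expressed as a table lookup grouped by disk. The table below copies the example mapping from the text. In practice the FS would supply it, and `CLUSTER_TO_SECTORS` and `unused_sectors_by_disk` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Cluster -> (disk, sectors), taken from the example in [023]; a real FS would supply this.
CLUSTER_TO_SECTORS: Dict[int, Tuple[int, List[int]]] = {
    2: (0, [3, 4]),
    4: (0, [7, 8]),
    5: (1, [18, 19]),
    6: (1, [1, 2]),
}


def unused_sectors_by_disk(clusters: List[int]) -> Dict[int, List[int]]:
    """Group the sectors of the given unused clusters by OS-visible disk."""
    result: Dict[int, List[int]] = {}
    for cluster in clusters:
        disk, sectors = CLUSTER_TO_SECTORS[cluster]
        result.setdefault(disk, []).extend(sectors)
    return {disk: sorted(secs) for disk, secs in result.items()}


print(unused_sectors_by_disk([2, 4, 5, 6]))  # {0: [3, 4, 7, 8], 1: [1, 2, 18, 19]}
```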
[024] Because, by design, the physical disk seen by the OS is the emulated/virtual volume presented by the storage subsystem, there is a one-to-one sector mapping between the OS view of the disk 204 and storage subsystem volume 206.
[025] The addresses of the sectors identified as unused can now be resolved to the corresponding PITCs from which the data is mapped: PITC A, PITC B, and PITC C in 208. Each PITC page typically contains a very large number of sectors (sometimes thousands); in this example, for illustrative purposes, each page contains two sectors. Thus, sectors 3 and 4 of volume 0 map to page 1 of PITC B, sectors 7 and 8 of volume 0 map to page 6 of PITC A, and so on. At this point, pages that cannot be freed because other portions of them are still in use can also be identified. For example, in FIG. 2, sector 19 of volume 1 maps to page 5 of PITC C, which is still also used by sector 3 of volume 1. In this case, page 5 of PITC C is not freed at this point.
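The page-resolution rule of [025] is that a page can be freed only when every sector mapped to it is unused. The following sketch shows one way to apply it. It assumes the subsystem can supply a `(volume, sector) -> (PITC, page)` table, and the partial mapping below is hypothetical, modeled loosely on the FIG. 2 example (including page 5 of PITC C being shared with the in-use sector 3 of volume 1).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Sector = Tuple[int, int]   # (volume, sector)
PageRef = Tuple[str, int]  # (PITC name, page number within that PITC)


def freeable_pages(sector_map: Dict[Sector, PageRef], unused: Set[Sector]) -> List[PageRef]:
    """Return pages all of whose mapped sectors are unused; shared pages are kept."""
    sectors_of: Dict[PageRef, Set[Sector]] = defaultdict(set)
    for sector, page in sector_map.items():
        sectors_of[page].add(sector)
    return sorted(page for page, secs in sectors_of.items() if secs <= unused)


# Partial, hypothetical mapping modeled on the FIG. 2 example.
sector_map = {
    (0, 3): ("B", 1), (0, 4): ("B", 1),
    (0, 7): ("A", 6), (0, 8): ("A", 6),
    (1, 19): ("C", 5), (1, 3): ("C", 5),  # page 5 of PITC C is shared
}
unused = {(0, 3), (0, 4), (0, 7), (0, 8), (1, 19)}
print(freeable_pages(sector_map, unused))  # [('A', 6), ('B', 1)]
```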
[026] Using the server's knowledge of the FS, the PITC pages shown in 208 are marked as no longer in use and will not be merged forward into future PITCs beyond the point of space recovery, saving considerable storage.
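The effect of [026] on PITC merging can be sketched under an assumed merge model: each PITC is a map from logical page to physical page, later PITCs take precedence, and logical pages reported unused by the FS are dropped rather than carried forward. These merge semantics and the name `merge_forward` are assumptions made for illustration; the document does not specify the merge algorithm.

```python
from typing import Dict, List, Set, Tuple


def merge_forward(older: Dict[int, int], newer: Dict[int, int],
                  reclaimed: Set[int]) -> Tuple[Dict[int, int], List[int]]:
    """Merge an expiring PITC's map (logical page -> physical page) into the next PITC.

    Returns the merged map and the physical pages that no longer need to be kept.
    """
    merged = dict(newer)  # later writes take precedence
    freed: List[int] = []
    for logical, physical in older.items():
        if logical in newer or logical in reclaimed:
            # Overwritten later (implicit free) or reported unused by the FS (explicit free).
            freed.append(physical)
        else:
            merged[logical] = physical  # still referenced, so carried forward
    return merged, sorted(freed)


merged, freed = merge_forward({1: 101, 2: 102, 3: 103}, {2: 202}, reclaimed={3})
print(merged, freed)  # {2: 202, 1: 101} [102, 103]
```

Without the `reclaimed` set, physical page 103 would be carried forward indefinitely, which is the waste described in [021].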
[027] Note that the above example does not show how FS clusters that have never been used are mapped to "zero data". Although the method as described identifies and resolves clusters that previously contained data but no longer do (e.g., because files were deleted or moved), the same steps may also be applied to identify and resolve clusters that have never been used.
[028] In general, by querying the FS, some identified pages can be removed from later PITCs and others can be returned to the page pool for future use. In the present invention, the FS is free to map any allocation unit it uses to sectors and physical disks in any way it desires. One of the keys to recovering space that is no longer in use is thus to query the FS to determine which space is actually in use and at which physical location. Knowing this information, a mapping can be performed from the FS allocation units to the virtual storage subsystem volumes, and from there to the pages. Pages implicitly identified as in use can then be explicitly determined to be free. This information can be used to optimize space usage in the appropriate PITC.
Claims (14)
1. A method of determining apparently free data storage space in a data storage subsystem implicitly allocated to a host file system, the method comprising:
querying the host file system to identify unused file system storage units, wherein the file system storage units correspond to storage space implicitly allocated in the data storage subsystem;
receiving a list of unused file system storage units from the host file system;
mapping unused file system storage units in the list of unused file system storage units to corresponding implicitly allocated data storage space in the data storage subsystem, whereby such data storage space is vacated but is assumed by the data storage subsystem to be in use by the host file system; and
explicitly vacating the corresponding implicitly allocated data storage space.
2. The method of claim 1, wherein the step of explicitly vacating the corresponding implicitly allocated data storage space is based on a determination that the corresponding implicitly allocated data storage space is in a valid point-in-time copy page or a historical point-in-time copy page.
3. The method of claim 2, wherein the corresponding implicitly allocated data storage space in the valid point-in-time copy page is released into a free page pool, thereby becoming explicitly free.
4. The method of claim 2, wherein corresponding implicitly allocated data storage space in a historical point-in-time copy page is marked for release into a free page pool upon expiration of the historical point-in-time copy page, whereby the copy page becomes explicitly free upon expiration of the historical point-in-time copy page.
5. The method of claim 1, wherein the host file system is connected to the data storage subsystem via Fibre Channel.
6. The method of claim 1, wherein the host file system is connected to the data storage subsystem through SCSI.
7. The method of claim 1, wherein the list of unused file system storage units is adjusted to include only storage units in the data storage subsystem that correspond to a complete page.
8. The method of claim 1, wherein the data storage subsystem utilizes thin provisioning.
9. An apparatus for determining apparently free data storage space in a data storage subsystem implicitly allocated to a host file system, the apparatus comprising:
means for querying the host file system to identify unused file system storage units, wherein the file system storage units correspond to storage space implicitly allocated in the data storage subsystem;
means for receiving a list of unused file system storage units from the host file system;
means for mapping unused file system storage units in the list of unused file system storage units to corresponding implicitly allocated data storage space in the data storage subsystem, whereby such data storage space is vacated but is assumed by the data storage subsystem to be in use by the host file system; and
means for explicitly vacating a corresponding portion of physical storage space in the data storage subsystem.
10. The apparatus of claim 9, wherein the means for explicitly vacating the corresponding portion of physical storage space operates based on a determination that the corresponding implicitly allocated physical storage space is in a valid point-in-time copy page or in a historical point-in-time copy page.
11. The apparatus of claim 10, wherein the means for explicitly vacating the corresponding portion of physical storage space is configured to release the corresponding portion of physical storage space in the valid point-in-time copy page into the free page pool, thereby making it explicitly free.
12. The apparatus of claim 10, wherein the means for explicitly vacating the corresponding portion of physical storage space is configured to mark the corresponding portion of physical storage space in the historical point-in-time copy page for release into a free page pool when the historical point-in-time copy page expires, thereby becoming explicitly free upon expiration of the historical point-in-time copy page.
13. A method of freeing an implicitly allocated data space on a data storage system, the method comprising:
identifying implicitly allocated data storage space in the data storage system;
querying a file system operatively connected to the data storage system whether any implicitly allocated data storage space is being used by the file system;
receiving a list of data storage spaces not in use by the file system;
mapping the data space from the list received from the file system to a physical storage space; and
freeing the data space from the list received from the file system to a page pool list, thereby converting unused implicitly allocated data storage space to apparently free data storage space for allocation by the data storage system.
14. The method of claim 13, wherein the list of data storage spaces not in use by the file system is adjusted to include only complete pages.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US11/767,049 (US8601035B2) | 2007-06-22 | 2007-06-22 | Data storage space recovery system and method |
| US11/767049 | 2007-06-22 | | |
| PCT/US2008/067905 (WO2009002934A1) | 2007-06-22 | 2008-06-23 | Data storage space recovery system and method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1150250A1 | 2011-11-11 |
| HK1150250B | 2014-05-16 |