HK1121839B - Snapshot restore method and apparatus
- Publication number
- HK1121839B (application HK09101948.2A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- data
- snapshot
- storage volume
- primary storage
- chunk
Description
Technical Field
The present invention is directed to a data storage system controller. In particular, the present invention is directed to a method and apparatus for recovering data in a data storage system.
Background
The need to store digital files, documents, pictures, images and other data is rapidly increasing. With respect to electronic storage of data, various data storage systems have been designed for storing large amounts of data quickly and safely. Such a system may include one or more storage devices used in a coordinated manner. Systems are also available in which data may be distributed across multiple storage devices such that if one of the storage devices (or in some cases, more than one storage device) fails, the data is not irretrievably lost. Systems that operate multiple individual storage devices in a coordinated manner may also provide improved data access and/or storage times. Examples of systems that may provide such advantages may be found in the various RAID (redundant array of independent disks) levels that have been developed. Whether implemented using one or more storage devices, the storage provided by the data storage system may be treated as one or more storage volumes.
To facilitate the availability of desired data, it is often desirable to maintain different versions of a data storage volume. Maintaining different versions facilitates disaster recovery. For example, if a virus causes the current storage volume version to be lost or unavailable, the system may roll back to an earlier version that does not include the virus. However, maintaining different versions of a data storage volume is expensive and inefficient if a complete copy of each storage volume version must be maintained. This problem is compounded if multiple backup versions of the storage volume are maintained. Furthermore, once a different version of a storage volume has been restored, it is generally not possible to revert to another version, for example if the restored storage volume turns out to be less desirable than the previously applied version. In addition, the storage volume version selected in connection with a recovery operation is typically not immediately available, and the ability to create additional versions of the storage volume may be unavailable until a rollback to the selected storage volume version has completed.
Disclosure of Invention
The present invention is directed to solving these and other problems and disadvantages of the prior art. According to embodiments of the present invention, a data storage system is provided that is capable of using metadata to efficiently maintain snapshots of one or more storage volumes at different times. More specifically, only one copy of each piece of data in the storage volume is maintained, even though a piece of data may be part of more than one version of the data storage volume. The metadata is used to track the version or versions of the data storage volume to which each piece or subset of data belongs. Thus, embodiments of the present invention may be considered to feature sparse snapshots. According to embodiments of the invention, a storage volume remains operational even during operations to restore the state of the storage volume to a selected recovery point. Further, the selected state of the primary storage volume, as represented by a snapshot, is immediately available after a decision to restore the primary storage volume to the selected state. As used herein, data in the selected state represented by a snapshot is immediately available in that the user does not need to wait for all data within the primary storage volume to be processed before a requested chunk of data can be accessed.
According to an embodiment of the invention, a snapshot is a block-level point-in-time representation of data on a storage volume. The data is essentially frozen in time at the instant the snapshot is taken. Although the data on the storage volume may change as a result of write operations, the data within the snapshot remains as it was at the instant the snapshot was taken. To protect snapshot data, a repository (or backing store) is used to store data that is not otherwise represented in the storage volume, together with snapshot metadata. All data and metadata associated with the snapshot is stored in the repository. According to an embodiment of the present invention, the data stored in the snapshot is stored in "chunks". A chunk corresponds to a plurality of logical block addresses (LBAs). Alternatively or additionally, data may be stored in sub-chunks. A sub-chunk is a fixed-size subset of a chunk. The units (e.g., chunks, sub-chunks, or multiples thereof) used to create and manage snapshots may be selected to optimize the performance of the system.
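By way of illustration only, the following Python sketch shows how a logical block address might be mapped to a chunk and sub-chunk under the organization just described. The block, chunk, and sub-chunk sizes are assumptions chosen for the example; the disclosure does not specify particular values.

```python
# Illustrative sketch only: the geometry below is assumed, not taken
# from this disclosure.
BLOCKS_PER_CHUNK = 2048     # LBAs per chunk (assumed)
SUBCHUNKS_PER_CHUNK = 16    # fixed-size sub-chunks per chunk (assumed)
BLOCKS_PER_SUBCHUNK = BLOCKS_PER_CHUNK // SUBCHUNKS_PER_CHUNK

def chunk_of(lba: int) -> int:
    """Return the number of the chunk containing this LBA."""
    return lba // BLOCKS_PER_CHUNK

def subchunk_of(lba: int) -> int:
    """Return the sub-chunk index within the chunk for this LBA."""
    return (lba % BLOCKS_PER_CHUNK) // BLOCKS_PER_SUBCHUNK

print(chunk_of(5000), subchunk_of(5000))  # -> 2 7
```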
When a snapshot is initially created, it does not contain any data. Instead, the snapshot metadata references the data contained on the storage volume. As a result, if a read operation is directed to the snapshot while the snapshot is in this initial condition, the snapshot metadata will redirect the read operation to the storage volume. If a write operation is directed to the storage volume after the snapshot is created, the metadata of the snapshot is checked to determine whether the chunk of data to be overwritten has previously been written to the snapshot. If it has, the write operation is allowed to complete normally. If the write operation would overwrite a chunk of data that has not been written to the snapshot, a copy-on-write (COW) operation is initiated. The COW operation includes reading the existing chunk of data in the storage volume that is to be overwritten and copying that chunk to the snapshot. The snapshot metadata is then updated to indicate that the data chunk is now contained in the snapshot. The write operation to the storage volume is then allowed to complete.
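A minimal sketch of this copy-on-write behavior is given below, assuming an in-memory dict stands in for each volume; the class and its fields are hypothetical and illustrate only the decision described above, not the disclosed implementation.

```python
class SparseSnapshot:
    """Illustrative sketch of the COW path described above. The snapshot
    preserves only chunks that have since been overwritten on the volume;
    reads of anything else are redirected back to the volume."""

    def __init__(self, volume: dict):
        self.volume = volume    # chunk_id -> data (the live storage volume)
        self.preserved = {}     # chunk_id -> data frozen at snapshot time

    def read(self, chunk_id):
        # The snapshot initially contains no data, so reads fall through
        # to the storage volume unless a COW copy exists.
        if chunk_id in self.preserved:
            return self.preserved[chunk_id]
        return self.volume.get(chunk_id)

    def write_volume(self, chunk_id, data):
        # COW: preserve the existing chunk the first time it is
        # overwritten after the snapshot; later writes complete normally.
        if chunk_id in self.volume and chunk_id not in self.preserved:
            self.preserved[chunk_id] = self.volume[chunk_id]
        self.volume[chunk_id] = data

vol = {0: b"A0", 1: b"B0"}
snap = SparseSnapshot(vol)
snap.write_volume(0, b"A1")
assert snap.read(0) == b"A0" and vol[0] == b"A1"  # snapshot stays frozen
```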
According to another embodiment of the invention, a storage volume may be restored to any existing point-in-time snapshot of the volume while maintaining all existing earlier and newer snapshots. In particular, all existing snapshots of the storage volume are maintained, allowing the version of the active volume to be advanced or rolled back to any existing snapshot. As a result, a system administrator or other user has the ability to change a decision regarding which version of the primary storage volume is selected as active.
Embodiments of the present invention also allow immediate access to the restored primary storage volume. In particular, data blocks that still need to be copied from a selected snapshot to the active storage volume as part of a restore operation may be accessed from the snapshot, while data blocks that have already been copied to the primary storage volume, or that are already present in the primary storage volume in their restored state, may be retrieved from the primary storage volume. That is, requested data blocks are available in their recovered state even though the recovery process for the entire primary storage volume has not been completed. The ability to use data obtained from a snapshot or from the primary storage volume while a background copy operation restores primary storage from a snapshot is made possible by using a high watermark to track whether data should be obtained directly from the storage volume or from the snapshot. The high watermark may be maintained by a restore thread that moves data from the snapshot to the main volume. Immediate access is also available to other versions of the storage volume if a decision is made to interrupt the restore from a previously selected snapshot and to select a different version represented by a different snapshot. Additional snapshots of the primary storage volume may also be taken while a restore operation to the selected snapshot is in progress. Other embodiments of the present invention associate a restore marker with each data chunk to allow identification of data chunks that have been restored from the snapshot volume to the primary storage volume.
Additional features and advantages of embodiments of the present invention will become more readily apparent from the following description, particularly when taken in conjunction with the accompanying drawings.
Drawings
FIG. 1 is a functional block diagram depicting components of an electronic data system incorporating a data storage system according to an embodiment of the present invention;
FIG. 2 is a block diagram depicting components of a data storage system according to an embodiment of the invention;
FIG. 3A is a block diagram depicting components of a host, management computer or server according to an embodiment of the invention;
FIG. 3B is a block diagram depicting components of a storage device according to an embodiment of the invention;
FIG. 3C is a block diagram depicting components of a storage controller according to an embodiment of the invention;
FIG. 4 is a block diagram depicting a primary storage volume and multiple snapshots of the primary storage volume taken at different times;
FIG. 5 depicts the relationship between different volumes in a data storage system according to an embodiment of the present invention;
FIG. 6 is a flow diagram depicting aspects of snapshot processing according to an embodiment of the invention;
FIG. 7 is a flow diagram depicting aspects of a data recovery process according to an embodiment of the invention;
FIG. 8 is a flow diagram depicting aspects of a process for writing data to a storage volume according to an embodiment of the invention; and
FIG. 9 is a flow diagram depicting aspects of a read process according to an embodiment of the invention.
Detailed Description
FIG. 1 is a block diagram depicting an electronic data system 100 incorporating a data storage system 104 according to an embodiment of the invention. In general, the data storage system 104 may be interconnected to one or more host processors or computers 108 by a bus and/or network 112. Thus, embodiments of the invention have application in association with a single host 108 or multiple hosts 108, in a storage area network (SAN) or direct-connect environment. According to other embodiments, the data storage system 104 may be integrated with or directly connected to a host 108. Further, the storage system 104 may be interconnected to a management computer 116. In general, the management computer 116 may provide a user interface for controlling operational aspects of the storage system 104. The management computer 116 may be interconnected with the storage system 104 directly and/or through the bus or network 112. According to other embodiments of the invention, the management computer 116 may be integrated with a host computer 108. In addition, multiple management computers 116 may be provided as part of the electronic data system 100. The electronic data system 100 may also include a plurality of data storage systems 104.
The electronic data system 100 may also include a server 120 that provides snapshot restore services as described herein. The server 120 may be interconnected with the storage system 104 by the bus or network 112. Alternatively or additionally, the snapshot restore functionality may be provided by a device 124 inserted in a data channel interconnecting the storage system 104 and the bus or network 112, or interconnecting the storage system 104 and a host computer 108. According to other embodiments of the invention, the snapshot restore functionality as described herein may be provided in whole or in part by the execution of instructions or programming by the storage system 104. As yet another alternative, the snapshot restore functionality may be provided by a host 108 or the management computer 116.
FIG. 2 illustrates components that may be included in an example data storage system 104 comprising a RAID system in accordance with embodiments of the present invention. Generally, the data storage system 104 includes a plurality of storage devices 204. Examples of storage devices 204 include hard disk drives such as Serial Advanced Technology Attachment (SATA), Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), Fibre Channel (FC), or Parallel Advanced Technology Attachment (PATA) hard disk drives. Other examples of storage devices 204 include tape storage devices, optical storage devices, and solid state disk devices. Further, while a number of storage devices 204 are illustrated, it should be understood that embodiments of the invention are not limited to any particular number of storage devices and that a fewer or greater number of storage devices 204 may be provided as part of the data storage system 104. As can be appreciated by those skilled in the art, an array and/or array partition (hereinafter referred to as a logical unit number (LUN)) can be established on the data storage devices 204. As those skilled in the art will also appreciate, a LUN may be implemented according to any of a variety of array levels or other arrangements for storing data on one or more storage devices 204. As will also be appreciated by those skilled in the art, the storage devices 204 contain data comprising a primary storage volume, which may correspond to a LUN, and one or more snapshots of the storage volume taken at different times.
A first controller slot 208a may be provided as part of the data storage system 104 according to an embodiment of the present invention. Other embodiments may include additional controller slots, such as a second controller slot 208b. As can be appreciated by one skilled in the art, a controller slot 208 may include a connection or set of connections such that a controller 212 can be operably interconnected with other components of the data storage system 104. Furthermore, the data storage system 104 according to an embodiment of the present invention includes at least one controller 212a. For example, when the data storage system 104 is operated in a single-controller, non-failover mode, the data storage system 104 may include exactly one controller 212. By providing a second controller 212b, a data storage system 104 according to other embodiments of the present invention may be operated in a dual-redundant, active-active controller mode. When the second controller 212b is used in addition to the first controller 212a, the second controller 212b is received in the second controller slot 208b. As can be appreciated by those skilled in the art, the provision of two controllers 212a-212b permits data to be mirrored between the controllers 212a-212b, thereby providing redundant active-active controller operation.
Typically, one or more buses or channels 216 are provided to interconnect the controller 212 with the storage devices 204 through associated controller slots 208. Further, while illustrated as a single shared bus or channel 216, it is understood that multiple dedicated and/or shared buses or channels may be provided. Additional components that may be included in the data storage system 104 include one or more power supplies 128 and one or more cooling units 132. In addition, a bus or network interface 136 may be provided to interconnect the data storage system 104 to the bus or network 112, and/or the host computer 108 or the management computer 116.
Although illustrated in FIG. 2 as a complete RAID system, it should be understood that the data storage system 104 may comprise one or more storage volumes implemented in various other ways. For example, the data storage system 104 may comprise a hard disk drive or other storage device 204 connected to or associated with a server or general purpose computer. As further examples, the storage system 104 may comprise a JBOD (just a bunch of disks) system or a SBOD (switched bunch of disks) system.
The snapshot restore method and apparatus may be implemented in various ways. For example, the snapshot restore functionality may be implemented in connection with a server 120 interconnected to the storage system 104 by the bus or network 112, or in connection with some other computing device, such as a host computer 108 or the management computer 116. According to another embodiment, the snapshot restore method and apparatus may be implemented in an inline device 124 between the data storage system 104 and a host computer 108. According to other embodiments of the invention, snapshot functionality may be provided through the operation or execution of instructions or code by a component or subsystem of the data storage system 104, such as the data storage system controller 212.
Referring to FIG. 3A, an example host 108, management computer 116, server 120, or other device is illustrated with respect to an embodiment of the present invention in which snapshot functionality is provided by software running on the device 108, 116, or 120. The components may include a processor 304a capable of executing program instructions. Thus, the processor 304a may comprise any general purpose programmable processor or controller for executing application programming. Alternatively, the processor 304a may comprise a specially configured Application Specific Integrated Circuit (ASIC). The processor 304a is typically operative to execute programming code including operating system software and one or more applications that implement various functions performed by the device 108, 116 or 120.
The device 108, 116 or 120 may additionally include memory 308a for use in connection with the execution of programming by the processor 304a and for the temporary or long-term storage of data or program instructions. For example, the memory 308a may be used in connection with the execution of a snapshot restore algorithm. The memory 308a may comprise solid state memory of a resident, removable, or remote nature, such as DRAM and SDRAM.
Data storage 314a may also be included for storing application programming and/or data. For example, operating system software 318 may be stored in the data storage 314a. Further, the data storage 314a may be used to store a snapshot restore process or application 328a comprising instructions for providing snapshot and restore functionality for a storage volume as described herein. The snapshot restore application 328a may itself include a number of modules or components, such as a main input/output (IO) module 332a and a restore thread or module 336a.
The device 108, 116, or 120 may also include one or more network interfaces 340a. Examples of the network interface 340a include a Fibre Channel (FC) interface, an Ethernet interface, or another type of communication interface. The network interface 340a may be provided in the form of a network interface card or other adapter, as will be appreciated by those skilled in the art.
A host computer 108 or management computer 116 implementing or providing the snapshot restore application 328 or its functions may include the same general components as the server 120. In particular, a host computer 108 or management computer 116 providing the functionality of the snapshot restore application 328 will generally include data storage 314a containing operating system 318 and snapshot restore application 328a instructions, a processor 304a for executing those instructions, a memory 308a for use in connection with the execution of those instructions, and a network interface 340a. Typically, however, the host computer 108 or management computer 116 will include additional application programming, as well as additional components, for providing other features. For example, the host computer 108 may include one or more applications that create and/or use data stored in the data storage system 104. As another example, the management computer 116 may include application programming for managing aspects of the data storage system 104. Additional components that may be included as part of the host computer 108 or management computer 116 include user input and output devices.
Referring to FIG. 3B, components that may be included as part of the network or storage device 124 are illustrated. Generally, these components include a processor 304b, a memory 308b, and one or more network or communication link interfaces 340b. In general, the network device 124 is characterized by being inserted into a communication path or link between a host computer 108 and the data storage system 104. Alternatively or additionally, the device 124 is characterized by its execution of firmware implementing a snapshot restore algorithm or process 328b according to an embodiment of the invention, where the snapshot restore algorithm 328b is stored or encoded as firmware. According to embodiments of the invention, the snapshot restore algorithm or process 328b may be stored or encoded in the memory 308b provided as part of the device 124.
As mentioned above, the snapshot restore algorithm or process 328 according to embodiments of the present invention may also be implemented in connection with the operation of a storage controller 212 of the data storage system 104. FIG. 3C illustrates a storage controller 212 providing the snapshot restore application or process 328 functionality (shown as snapshot restore instructions 328c) according to an embodiment of the present invention. In general, the storage controller 212 includes a processor or processor subsystem 304c capable of executing instructions for performing, implementing, and/or controlling various controller 212 functions. Such instructions may include instructions 328c for implementing aspects of the snapshot restore methods and apparatus described in this disclosure. Further, such instructions may be stored as software and/or firmware. As will be appreciated by those skilled in the art, operations such as generating parity data may be performed using one or more hardwired and/or programmable logic circuits provided as part of the processor subsystem 304c. Accordingly, the processor subsystem 304c may be implemented as a number of discrete components, such as one or more programmable processors in combination with one or more logic circuits. The processor subsystem 304c may also include or be implemented as one or more integrated devices or processors. For example, the processor subsystem may include a complex programmable logic device (CPLD).
The controller 212 also typically includes memory 306. The memory 306 is not limited to any particular type of memory. For example, the memory 306 may comprise a solid state memory device, or a plurality of solid state memory devices. Further, the memory 306 may include separate volatile 308c and non-volatile 310 portions. As will be appreciated by those skilled in the art, the memory 306 typically includes a write cache 312 and a read cache 316 provided as part of the volatile 308c portion of the memory 306, although other arrangements are possible. By providing the caches 312, 316, the storage controller 212 may improve the speed of input/output (IO) operations between a host 108 and the data storage devices 204 comprising an array or array partition. Examples of volatile memory 308c include DRAM and SDRAM.
The non-volatile memory 310 may be used to preserve data that was input to the write cache 312 of the memory 306 in the event of a power interruption affecting the data storage system 104. The non-volatile memory portion 310 of the storage controller memory 306 may comprise any type of data storage device capable of retaining data without power from an external source. Examples of non-volatile memory 310 include, but are not limited to, compact flash or other standardized non-volatile memory devices.
The memory 306 also includes an area 324 that provides storage for controller code 326. The controller code 326 may include a number of components, including a snapshot restore process or application 328c comprising instructions for providing snapshot and restore functionality for a storage volume as described herein. The snapshot restore application 328c may itself include a number of modules, such as a primary input/output (IO) module 332c and a restore thread or module 336c. As shown in FIG. 3C, the controller code area 324 may be established in the volatile memory 308c portion of the storage controller memory 306. Alternatively or additionally, the controller code 326 may be stored in the non-volatile memory 310.
The storage controller 212 may additionally include other components. For example, a bus and/or network interface 340c may be provided to operatively interconnect the storage controller 212 to the remainder of the data storage system 104, for example through the controller slot 208 and the bus or channel 216. In addition, the interface 340c may be configured to facilitate removal or replacement of the storage controller 212 in the controller slot 208 as a field replaceable unit (FRU). Integral signal and power channels may also be provided to interconnect the various components of the storage controller 212.
With reference to FIG. 4, a primary storage volume 404 and a plurality of snapshots of the storage volume, or snapshot volumes 408, taken at different times T0-Tx are depicted. As used herein, a snapshot 408 is a virtual volume that represents the data that existed on the primary storage volume 404 at the point in time the snapshot 408 was taken. The primary storage volume 404 is the current set of data maintained on the data storage system 104. The primary storage volume 404 may correspond to a standard RAID volume or LUN. In the example of FIG. 4, the first (oldest) snapshot taken of the storage volume is snapshot 408a, taken at time T0. The next-oldest snapshot 408b was taken at time T1. The most recent fixed or stored snapshot 408 in the example of FIG. 4 is snapshot 408c, taken at time T2. The current snapshot 412 has not yet been fixed. Thus, the data contained by the current snapshot 412 changes as the data in the primary storage volume 404 changes. Because the current snapshot 412 has not yet been fixed, its time is shown as Tx. If a command is received to fix the current snapshot 412 (e.g., as a result of an automated process or of an administrator deciding that a snapshot should be taken), that snapshot will be associated with time T3 and will become a fixed snapshot. The generation of a new current snapshot may then begin.
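The timeline of FIG. 4 can be sketched as a simple list; the representation below is hypothetical and only makes concrete the point that fixing the current snapshot assigns it the next time index and starts a fresh current snapshot.

```python
# Sketch of the FIG. 4 snapshot timeline (representation assumed).
fixed_snapshots = ["T0", "T1", "T2"]   # fixed point-in-time snapshots

def fix_current_snapshot(fixed: list) -> str:
    """Fix the current (still-changing) snapshot: it is assigned the
    next time index, after which a new current snapshot begins."""
    new_label = f"T{len(fixed)}"       # T3 in the FIG. 4 example
    fixed.append(new_label)
    return new_label

assert fix_current_snapshot(fixed_snapshots) == "T3"
```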
In general, each fixed snapshot 408 includes metadata describing the data included in the snapshot. Furthermore, if a data block in the storage volume 404 is changed or overwritten, the latest fixed snapshot 408 that requires the original data block will be modified to include a copy of that original data block. Thus, each snapshot 408 includes a reference to, or a copy of, each data block that was included in the primary storage volume 404 at the time the snapshot 408 was taken. In addition, a copy of a data block referenced by one snapshot 408 may be maintained on behalf of an earlier snapshot 408; data blocks may therefore be shared between snapshots 408. However, according to embodiments of the present invention, only one copy of each data block is maintained among the primary storage volume 404 and the snapshots 408 of the primary storage volume, whether that block belongs to the primary storage volume 404 in its current state or to the storage volume at some other time captured by a snapshot 408.
Further, embodiments of the invention allow multiple snapshots 408 from different times to be maintained. Moreover, even if a restore operation returning the primary storage volume 404 to the state represented by a selected snapshot 408 has been initiated or even completed, the primary storage volume 404 may still be returned to the state represented by any other snapshot 408 that has been taken. For example, if the administrator selects snapshot 408b from time T1 and initiates or completes the restore process for that snapshot 408b, snapshot 408a from the earlier time T0 and snapshot 408c from the later time T2 remain available, for example in case the administrator determines that one of the other snapshots 408 is preferable to the selected snapshot 408b. That is, embodiments of the present invention maintain all snapshot 408 data and metadata so that the contents of the primary storage volume 404 can be rolled back or advanced to any existing snapshot 408. In addition, the contents of the primary storage volume 404 may be rolled to a snapshot 408 even before an earlier restore operation reverting the contents of the primary storage volume 404 to another snapshot 408 has completed. Thus, data in the selected state can be made immediately available to the user. In yet another aspect of embodiments of the present invention, an additional snapshot of the restored primary storage volume 404 (i.e., the current snapshot 412) may be taken even while the restore operation is being performed as a background operation.
Referring to FIG. 5, the storage of data blocks in a data storage system 104 is depicted in accordance with an embodiment of the present invention. In particular, FIG. 5 illustrates a primary storage volume 404, a snapshot volume 504, and a backing store 508. In the primary storage volume 404, a number of data chunks or blocks A1 512, B0 516, and C0 520 are shown. These data chunks 512-520 represent the actual data stored in the data storage system 104 as part of the primary storage volume 404.
The snapshot volume 504 is a virtual volume that comprises metadata. Accordingly, all of the data represented on the snapshot volume 504 actually exists elsewhere. In particular, the data included in a particular snapshot 408 exists on the primary storage volume 404 or on the backing store 508. More specifically, data that has not been modified since the snapshot 408 was taken resides on the primary storage volume 404, while data that has been modified since the snapshot 408 was taken resides on the backing store 508. In general, the backing store 508 holds information about the primary storage volume 404 and the virtual snapshot 408 volumes associated with that primary storage volume. As will be appreciated by those skilled in the art, the backing store 508 comprises a volume in the data storage system 104. The backing store 508 may be established and controlled by the same controller 212 as the primary storage volume 404 with which the backing store 508 is associated. According to other embodiments, the backing store and its contents may be established and controlled by another system node or component (e.g., a host computer 108, the management computer 116, the server 120, or the device 124) that provides the described snapshot restore capability. A single backing store 508 may exist for each snapshot-capable primary storage volume 404. Alternatively, multiple primary storage volumes 404 may be assigned to a single backing store 508.
In the example of FIG. 5, the snapshot volume 504 contains a snapshot 408 taken at time T0. The data present in the primary storage volume 404 at time T0 consisted of data chunk A0 524, data chunk B0 516, and data chunk C0 520. At the time depicted in FIG. 5, the primary storage volume 404 no longer contains data chunk A0 524; instead, it contains data chunk A1 512. Accordingly, when the data storage system 104 received data chunk A1 512 for storage in the primary storage volume 404, data chunk A0 524 was copied from the primary storage volume 404 to the backing store 508. The snapshot 408 at time T0 therefore includes metadata (shown as A0′ 524′) indicating that data chunk A0 524 is associated with the snapshot 408 and is located in the backing store 508. The snapshot 408 at time T0 also includes metadata (B0′ 516′ and C0′ 520′) indicating that data chunks B0 516 and C0 520 are located in the primary storage volume 404. Further, data chunks may be shared between snapshots 408. For example, an overwritten data chunk may be associated with the most recent snapshot 408 that includes the data chunk as it existed prior to being overwritten; other snapshots 408 may reference that snapshot 408 in their metadata so that the data chunk can be located when needed.
FIG. 5 also illustrates a data chunk SW 528 written directly to a snapshot 408. As will be appreciated by those skilled in the art after consideration of this disclosure, a snapshot 408 may comprise a collection of metadata that is itself stored in the backing store 508. Because the snapshot volume 504 contains only metadata and no data chunks, data chunk SW 528 resides on the backing store 508 as chunk SW 528. The snapshot 408 therefore includes metadata SW′ 528′ indicating that data chunk SW 528 is located in the backing store 508.
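The FIG. 5 arrangement can be pictured as a small lookup table. The following sketch assumes a hypothetical (location, key) encoding for the snapshot metadata; it is offered only to make the redirection concrete, not as the disclosed on-disk format.

```python
# Sketch of the FIG. 5 metadata, assuming a (location, key) encoding.
PRIMARY, BACKING = "primary", "backing"

primary_volume = {"A": b"A1", "B": b"B0", "C": b"C0"}
backing_store = {"A0": b"A0", "SW": b"SW"}

# Snapshot at T0: A0' points to the backing store (chunk A was later
# overwritten); B0' and C0' still point at the unmodified primary chunks.
snapshot_t0 = {
    "A": (BACKING, "A0"),
    "B": (PRIMARY, "B"),
    "C": (PRIMARY, "C"),
    "SW": (BACKING, "SW"),   # chunk written directly to the snapshot
}

def snapshot_read(metadata, chunk_id):
    """Resolve a snapshot read to wherever the chunk actually lives."""
    location, key = metadata[chunk_id]
    store = backing_store if location == BACKING else primary_volume
    return store[key]

assert snapshot_read(snapshot_t0, "A") == b"A0"  # frozen copy, backing store
assert snapshot_read(snapshot_t0, "B") == b"B0"  # unchanged, primary volume
```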
Referring to FIG. 6, aspects of the operation of the data storage system 104 in handling IO operations directed to a primary storage volume 404 are illustrated in accordance with an embodiment of the present invention. Such operations may be performed through execution of the controller code 326 instructions, and more specifically by the primary IO module 332a, 332b, or 332c of the snapshot restore algorithm 328, whether executing on the storage controller 212 or on another system node or component (e.g., the host computer 108, management computer 116, server 120, or device 124). Initially, at step 604, a determination is made as to whether the IO operation to be performed comprises the creation of a snapshot 408 of the primary storage volume 404. If a snapshot 408 is to be created, metadata referencing the data in the primary storage volume 404 is written to the snapshot volume 504 in the backing store 508, thereby creating a virtual snapshot 408 of the data (step 608). At the moment the snapshot 408 is created, the snapshot consists entirely of metadata, since all of its data still resides in the primary volume 404. At step 610, an in-memory data structure is created to allow snapshot information to be read directly from memory.
If it is determined at step 604 that a snapshot is not to be created, a determination is next made as to whether a read operation is to be performed (step 612). If a read operation is to be performed, a determination is made as to whether the data is to be read from the primary storage volume 404 (step 614). If so, the data is read from the volume 404 (step 616). If the data is not to be read from the primary storage volume 404, the metadata in the snapshot volume 504 for the target snapshot 408 is consulted to determine the actual location of the chunk of data needed to satisfy the read operation (step 618). For example, according to embodiments of the invention, a data chunk associated with a snapshot 408 may reside in the primary storage volume 404 or in the backing store 508. The chunk of data is then retrieved from the primary storage volume 404 or the backing store 508, as indicated by the snapshot 408 metadata (step 620).
After determining at step 612 that a read operation has not been received, a determination is made as to whether a write operation has been received (step 622). If a write operation has been received, a determination is made as to whether the write operation is directed to the primary storage volume 404 (step 624). If the write operation is directed not to the primary storage volume 404 but to a snapshot 408, the data is written to the backing store 508 and the snapshot metadata is updated (step 628).
If the write operation is directed to the primary storage volume 404, the metadata of the current snapshot is read (step 630). A determination is then made as to whether the current snapshot 408 requires the chunk of data that is about to be overwritten in the primary storage volume (step 632).
If the data chunk to be overwritten is needed by the current and/or most recent snapshot 408 (i.e., the chunk is part of the image of the primary storage volume at the point in time represented by that snapshot), a copy-on-write (COW) operation is initiated to write the existing chunk of data to the current snapshot (e.g., to the backing store 508) (step 636). The metadata of the most recent snapshot 408 is then updated to indicate that the chunk of data is now located in the backing store 508 (step 640). After the needed data chunk has been copied from the primary storage volume to the current snapshot and the current snapshot metadata has been updated, or after a determination that the current snapshot does not require the data chunk being overwritten, the write operation to the primary storage volume is completed (step 644).
If it is determined at step 622 that the operation is not a write operation, the operation must be a delete operation (step 648). For the data chunks existing in the snapshot being deleted, the metadata of the next-oldest snapshot 408 is checked, and any data chunks included in the deleted snapshot that are needed by the next-oldest snapshot 408 are moved to that snapshot (step 652).
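The deletion step can be sketched as below, under the assumption that each snapshot is represented by a dict of the chunks it preserves, ordered oldest first; the representation is hypothetical, and the point illustrated is the inheritance of needed chunks by the next-oldest snapshot.

```python
def delete_snapshot(snapshots: list, index: int) -> list:
    """Illustrative sketch: 'snapshots' is a list of dicts
    (chunk_id -> data), oldest first. Chunks preserved only by the
    deleted snapshot are moved to the next-oldest snapshot so that
    older points in time remain reconstructible."""
    deleted = snapshots.pop(index)
    if index > 0:                       # a next-oldest snapshot exists
        heir = snapshots[index - 1]
        for chunk_id, data in deleted.items():
            # Move the chunk only if the older snapshot does not already
            # preserve its own copy (i.e., it still needs this one).
            heir.setdefault(chunk_id, data)
    return snapshots
```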
Referring to FIG. 7, aspects of the operation of the data storage system 104 in restoring the primary storage volume 404 to the state represented by a snapshot 408 are illustrated. The restore process may be performed through execution of the restore thread or module 336. At step 704, snapshot restore thread processing is initiated. More specifically, initiation of the restore thread processing may be in response to a command to restore the primary storage volume 404 to the state represented by a snapshot 408. After the restore thread processing is initiated, the high watermark for the primary storage volume is set to zero (step 708). At step 712, the next chunk of data in the primary storage volume 404 is identified, and a determination is made as to whether the restore flag for that chunk of data has been set. In general, the restore flag is used to track those chunks of data included in the primary storage volume 404 that have already been restored to the state represented by the selected snapshot 408.
If the restore flag for the identified data chunk is not set, a determination is made as to whether the identified data chunk is already present in the snapshot 408 at the recovery point (step 712). If the data chunk does not already exist in that snapshot 408, the data chunk is moved from the primary storage volume 404 to the most recent snapshot 408 at the recovery point (step 716). As will be appreciated by those skilled in the art, moving a chunk of data to the most recent snapshot 408 may comprise moving the chunk of data to the backing store 508. After moving the data chunk to the most recent snapshot 408, or after determining that the data chunk already exists in the snapshot 408, the data chunk in the state represented by the restore snapshot 408 is moved from the restore snapshot 408 to the primary storage volume 404 (step 720). As will be appreciated by those skilled in the art in view of the description provided herein, a restored chunk of data (i.e., a chunk of data in the state in which it existed when the restore snapshot was taken) may be moved to the primary storage volume 404 from a location in the backing store associated with the restore snapshot 408 (or from another snapshot 408 referenced in the metadata included in the restore snapshot 408). Further, after the data chunk is restored, or after it is determined at step 712 that the restore flag for the selected chunk has been set, the high watermark in the storage volume is incremented (step 724). The high watermark identifies the point in the storage volume 404 up to which restoration of data chunks has occurred. The high watermark provides a quick reference that may be used to help determine the actions of the data storage system 104 with respect to read and write operations at different points in the primary storage volume 404.
After incrementing the high watermark, a determination may be made as to whether the current high watermark is greater than the chunk number of the last data chunk (step 728). If it is determined that the high watermark is not greater than the number of the last data chunk in the storage volume 404, processing may return to step 712 for handling of the next data chunk. If it is determined that the high watermark is greater than the number of the last chunk included in the storage volume 404, the process may end. That is, if the high watermark value is greater than the number of the last chunk (where the chunks are numbered sequentially), every chunk in the primary storage volume 404 has been restored to the state represented by the recovery snapshot 408.
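Reduced to pseudocode, the FIG. 7 restore thread might look like the sketch below. The chunk numbering, the restore-flag set, and the helper callables are assumptions standing in for the disclosed structures, not an actual controller API.

```python
def restore_thread(num_chunks, restored_flags, in_restore_snapshot,
                   preserve_chunk, copy_restored_chunk):
    """Sketch of the FIG. 7 background restore loop (helpers assumed):
    walk the volume chunk by chunk, preserve current data into the
    recovery-point snapshot when needed, copy the restored data in, and
    advance the high watermark."""
    high_watermark = 0                          # step 708
    while high_watermark < num_chunks:          # step 728
        chunk = high_watermark                  # next chunk (step 712)
        if chunk not in restored_flags:         # restore flag not set
            if not in_restore_snapshot(chunk):
                preserve_chunk(chunk)           # step 716
            copy_restored_chunk(chunk)          # step 720
            restored_flags.add(chunk)
        high_watermark += 1                     # step 724
    return high_watermark
```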
Referring now to FIG. 8, aspects of the operation of the data storage system 104 upon receiving data to be written to the primary storage volume 404 while a restore operation is in progress (i.e., while the state of the primary storage volume 404 is being returned to the state represented by a stored snapshot 408) are illustrated. Such operations may be performed through execution of the snapshot restore application or algorithm 328. Initially, restore processing in the primary IO path is under way (step 804). At step 808, data for writing to the primary storage volume 404, including at least a first chunk of data, is received into the cache (e.g., in the memory 306) and the data is locked to prevent destaging of the write data to the primary storage volume 404. At step 812, a determination is made as to whether the chunk address (i.e., the target address) of the write operation is above the high watermark. If the target LBA range is above the high watermark, a determination is next made as to whether the restore process has been completed for the data chunk at the target LBA (i.e., whether the target address of the received data chunk contains an existing data chunk that has already been restored), by determining whether the restore flag has been set for the data chunk at the indicated address in the primary storage volume 404 (step 816).
If the restore flag is not set for the data chunk under consideration, a determination is made as to whether the data chunk exists in the snapshot 408 at the recovery point (step 820). If the data chunk does not exist in that snapshot 408, the data chunk is moved from the primary storage volume 404 to the most recent snapshot 408 at the recovery point (step 824). After the data chunk has been moved from the primary storage volume 404 to the most recent snapshot 408 at the recovery point, or after it is determined that the data chunk already exists in the most recent snapshot 408 at the recovery point, the data chunk is moved from the restore snapshot 408 to the primary storage volume 404 (step 826). The restore flag for the data chunk is then set, indicating that the restore process has been performed for that data chunk (step 828).
After the restore process is completed and the flag for the data chunk is set at step 828, after the chunk address is determined not to be above the high watermark at step 812, or after it is found at step 816 that the restore flag has already been set for the data chunk, a determination is made as to whether a current snapshot exists (step 832). If a current snapshot is found to exist, a determination is made as to whether data for the selected chunk exists in the current snapshot (step 836). If data for the selected data chunk does not exist in the current snapshot, the data chunk is moved from the primary storage volume 404 to the current snapshot 408 (step 840). After moving the data chunk from the primary storage volume 404 to the current snapshot 408 at step 840, after determining at step 836 that data for the selected data chunk exists in the current snapshot, or after determining at step 832 that no current snapshot exists, the data chunk held in the cache is unlocked and destaging of the data chunk from the cache to the primary storage volume 404 is allowed (step 844). The received data chunk is thus written to the primary storage volume 404 while any data previously written to the address of the received data chunk is retained as part of any applicable snapshot 408, which may end the process for writing received data during a restore operation.
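The branching of FIG. 8 is summarized in the sketch below, using the same assumed helpers as the restore-thread sketch; cache locking is represented only by the order of operations, and all names are hypothetical.

```python
def write_during_restore(chunk, data, high_watermark, restored_flags,
                         in_restore_snapshot, preserve_chunk,
                         copy_restored_chunk, current_snapshot, volume):
    """Sketch of the FIG. 8 write path: restore the target chunk on
    demand if the background thread has not reached it, give the current
    snapshot its COW copy, then let the write destage."""
    if chunk >= high_watermark and chunk not in restored_flags:
        # On-demand restore of the target chunk (steps 816-828).
        if not in_restore_snapshot(chunk):
            preserve_chunk(chunk)               # step 824
        copy_restored_chunk(chunk)              # step 826
        restored_flags.add(chunk)               # step 828
    if current_snapshot is not None and chunk not in current_snapshot:
        current_snapshot[chunk] = volume.get(chunk)   # steps 836-840
    volume[chunk] = data                        # step 844: destage write
```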
As will be appreciated by those skilled in the art in view of the present description, a data storage system 104 according to embodiments of the present invention may accept new data for storage in the storage volume 404 even while the primary storage volume 404 is being restored to a previous state. That is, from the user's perspective, the data restored to the state represented by the selected snapshot 408 is immediately available. In addition, during a restore operation that rolls the primary storage volume 404 back (or forward) to the state represented by a snapshot 408, data is available in its restored state. Thus, embodiments of the present invention avoid long delays in the availability of the data storage system 104 for write operations while the storage volume 404 is being restored. More specifically, from the perspective of the user or client, the restore operation completes immediately (i.e., upon initiation).
Referring now to FIG. 9, aspects of the operation of the data storage system 104 upon receiving read requests while a restore operation is being performed are illustrated. The operations involved may be performed through execution of the snapshot restore code or instructions 328 and the various modules of that code. Initially, at step 904, a read operation directed to the primary storage volume 404 is received while a restore operation is in progress, and at step 908 a determination is made as to whether the LBA range of the read operation is below the high watermark. If the LBA range of the read operation is below the high watermark, the requested data is returned from the primary storage volume 404 (step 912). That is, data below the high watermark has already been restored to its state at the point in time represented by the selected snapshot 408. If it is determined at step 908 that the LBA range of the requested data is not below the high watermark, the requested data is retrieved from the restore snapshot 408 (or from the location indicated by the restore snapshot 408, which may be the primary storage volume 404 or a location in the backing store 508) (step 916).
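The read-side decision of FIG. 9 reduces to a single comparison against the high watermark, sketched here under the same assumptions as the preceding sketches.

```python
def read_during_restore(chunk, high_watermark, volume, read_from_snapshot):
    """Sketch of the FIG. 9 read path: chunks below the high watermark
    are already restored and are read from the primary volume; anything
    at or above it is served in its restored state via the restore
    snapshot's metadata."""
    if chunk < high_watermark:                  # steps 908-912
        return volume[chunk]
    return read_from_snapshot(chunk)            # step 916
```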
The foregoing discussion of the invention has been presented for purposes of illustration and description. The description is not intended to limit the invention to the form disclosed herein. Accordingly, variations and modifications commensurate with the above teachings, and within the skill and knowledge of the relevant art, are within the scope of the present invention. The embodiments described hereinabove are further intended to explain the best mode presently known of practicing the invention and to enable others skilled in the art to utilize the invention in such, or other, embodiments and with the various modifications required by their particular application or use of the invention. It is intended that the appended claims be construed to include alternative embodiments to the extent permitted by the prior art.
Claims (26)
1. A storage volume snapshot method, comprising:
initiating a first restore operation to return the state of the primary storage volume to a previous state represented by the first snapshot of the primary storage volume;
reading data from a primary storage volume immediately after initiation of, and during, a first restore operation, wherein the data read from the primary storage volume is in a state represented by a first snapshot of the primary storage volume,
the method further comprises the following steps:
while performing the recovery operation, receiving a first chunk of data from the host for writing to a first target address in the primary storage volume;
after receiving the first chunk of data, determining whether the first target address contains an existing chunk of data that has been restored to the state represented by the first snapshot;
in response to determining that the first target address contains an existing chunk of data that has not been restored to the state represented by the first snapshot:
migrating a chunk of data of the first target address maintained as part of the first snapshot to the first target address in the primary storage volume, wherein the first target address contains the recovered chunk of data when migrating the chunk of data is complete;
a first chunk of data is written to a first target address in a primary storage volume, wherein a recovered chunk of data at the primary storage volume is overwritten.
2. The method of claim 1, further comprising:
a first snapshot is selected from a plurality of snapshots.
3. The method of claim 2, further comprising:
selecting a second snapshot from the plurality of snapshots after initiating the first restore operation;
initiating a second restore operation, wherein the second restore operation returns the state of the primary storage volume to the state of the primary storage volume represented by the second snapshot of the primary storage volume;
data is read from the primary storage volume immediately after initiation of the second restore operation, wherein the data read from the primary storage volume is in a state represented by the second snapshot of the primary storage volume.
4. The method of claim 3, wherein the second snapshot represents a state of the primary storage volume at a point in time after the state of the primary storage volume represented by the first snapshot.
5. The method of claim 3, wherein the second snapshot represents a state of the primary storage volume at a point in time prior to a state of the primary storage volume represented by the first snapshot.
6. The method of claim 1, further comprising:
a current snapshot of the primary storage volume is taken immediately prior to initiating the first restore operation, wherein the current snapshot comprises a second snapshot representing the state of the primary storage volume immediately prior to initiating the first restore operation, and wherein data read from the primary storage volume immediately after initiation of the first restore operation is in the state represented by the first snapshot of the primary storage volume.
7. The method of claim 6, further comprising:
after the first restore operation is initiated, a second restore operation is initiated to return the state of the primary storage volume to the state represented by the second snapshot of the primary storage volume.
8. The method of claim 1, further comprising:
maintaining a high watermark to track how far a background copy operation has progressed through primary storage volume addresses, wherein at least some of the data blocks stored at addresses in the primary storage volume are different from data blocks at those addresses as maintained by the first snapshot, and wherein said initiating a first restore operation comprises: as part of the background copy operation, a different data block than the first snapshot is copied to the primary storage volume.
9. The method of claim 1, wherein the first snapshot includes metadata, and wherein the metadata relates to at least one chunk of data in the primary storage volume and at least one chunk of data in the backing store.
10. The method of claim 1, further comprising:
in response to determining that the existing data chunk is not included in the most recent snapshot, the existing data chunk is moved from the first target address to the most recent snapshot.
11. The method of claim 1, further comprising:
in response to determining that the current snapshot exists:
after moving a chunk of data of a first target address maintained as part of a first snapshot to a first target address in a primary storage volume such that the first target address contains a restored chunk of data, copying the restored chunk of data to a current snapshot,
wherein the first chunk of data is not written to the first target address in the primary storage volume until copying of the restored chunk of data to the current snapshot has completed.
12. The method of claim 1, further comprising:
a restore flag associated with a chunk of restore data at a first target address of the primary storage volume is set to indicate that a restore operation of the chunk of data at the target address in the primary storage volume has completed.
13. The method of claim 1, further comprising:
in response to determining that the first target address contains a chunk of data that has been restored to the state represented by the first snapshot, the first chunk of data is written to the first target address in the primary storage volume.
14. The method of claim 1, wherein determining whether the first target address of the first chunk of data contains a chunk of data that has been restored to the state represented by the first snapshot comprises: determining whether the first target address is above a high water mark,
and wherein the step of determining whether the first target address of the first chunk of data contains a chunk of data that has been restored to the state represented by the first snapshot further comprises: it is determined whether a resume flag associated with the chunk of data at the first target address is set.
15. The method of claim 1, further comprising:
receiving a read request for a second target address in the primary storage volume while performing a restore operation;
in response to receiving the read request, determining whether the second target address is below a high water mark;
in response to determining that the second target address is not below the high watermark, retrieving the requested data from the first snapshot;
in response to determining that the second target address is below the high watermark, the requested data is retrieved from the primary storage volume.
16. The method of claim 15, wherein the requested data is retrieved from the first snapshot, and wherein retrieving the requested data from the first snapshot comprises: data identified in the metadata included in the first snapshot is retrieved from the backing store.
17. The method of claim 10, wherein the step of moving the existing chunk of data from the first target address to the most recent snapshot comprises: moving the existing data chunk from the primary storage volume to the backing store, and associating the existing data chunk on the backing store with the most recent snapshot in metadata comprising the most recent snapshot.
18. The method of claim 1, wherein the step of moving chunks of data at a first target address maintained as part of the first snapshot to target addresses in the primary storage volume comprises: a data chunk of a first target address maintained as part of the first snapshot is moved from the backing store to a target address in the primary storage volume.
19. The method of claim 1, further comprising:
canceling the restore operation and returning the primary storage volume to a state prior to performing the restore operation that is different from the state represented by the first snapshot, wherein the step of returning the primary storage volume to the state prior to performing the restore operation comprises: the primary storage volume is restored to the state represented by the second snapshot.
20. The method of claim 1, further comprising:
after the first restore operation is initiated and before the first restore operation is completed, a second snapshot of the primary storage volume is taken, wherein the data represented in the second snapshot is a point-in-time image of the restored primary volume, even though the actual restore is occurring as a background operation.
21. The method of claim 1, further comprising:
after initiating the first restore operation and before completing the first restore operation, taking a second snapshot of the primary storage volume, wherein the data represented in the second snapshot includes a first chunk of data associated with the first address in a state represented by the first snapshot, and wherein the chunk of data at the first address in the primary storage volume is not in the state represented by the first snapshot.
22. A data storage system, comprising:
at least one of a controller, a storage device, and a computer, comprising:
a memory;
a snapshot restore instruction loaded into memory;
a processor, wherein the processor executes a snapshot restore instruction, and wherein execution of the snapshot restore instruction comprises execution of a snapshot restore process;
a storage device interconnected with at least one of the controller, storage device, and computer, the storage device comprising:
a primary storage volume;
snapshot metadata; and
the storage of the backup is carried out,
wherein, during execution of the controller code in connection with a restore operation to restore the state of the primary storage volume to the state represented by a first snapshot, data in the state represented by the first snapshot is available from one of the primary storage volume and the backing store without first moving the data from the backing store to the primary storage volume,
wherein a high watermark is maintained to indicate the last address in the primary storage volume that has been restored to the state represented by the first snapshot,
wherein, in response to receiving a write request to a first address above a high watermark during a restore operation, existing data at the first address is moved to a backing store and associated with the most recent snapshot, data to be restored to the first address is moved from the backing store to the first address in the primary storage volume, and data from the write request is written to the first address in the primary storage volume,
wherein the data restored to the first address is overwritten with the data from the write request.
23. The system of claim 22, wherein the system comprises a plurality of storage devices, and wherein the primary storage volume comprises a RAID array partition established across the plurality of storage devices.
24. A data storage system, comprising:
means for storing data including data comprising a primary storage volume, metadata comprising at least a first snapshot, and data associated with the at least a first snapshot, wherein the at least a first snapshot is not included in the primary storage volume;
means for controlling data input/output operations to the means for storing data, comprising:
means for storing program instructions for execution;
means for executing stored program instructions;
wherein the program instructions comprise:
instructions for implementing a process of taking a snapshot of the primary storage volume at a point in time,
instructions for restoring a state of the primary storage volume to a state of the primary storage volume at a point in time represented by the first snapshot,
wherein, during execution of the instructions to restore the state of the primary storage volume to the point in time represented by the first snapshot:
in response to receiving a write request to a first address in the primary storage volume, making a determination as to whether the first address in the primary storage volume contains a recovered chunk of data,
in response to determining that the first address in the primary storage volume contains the recovered data chunk, completing the write request.
25. The system of claim 24, further comprising:
means for managing;
wherein the means for controlling data input/output operations further comprises:
means for interfacing, wherein at least the second set of execution instructions is received from the means for managing.
26. The system of claim 24, further comprising:
means for hosting data in communication with the means for controlling data input/output operations, wherein data is written to and read from the means for storing data by the means for hosting data through the means for controlling data input/output operations.
Applications Claiming Priority (5)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
US71490405P | 2005-09-06 | 2005-09-06 |
US60/714,904 | 2005-09-06 | |
US11/277,738 US7426618B2 (en) | 2005-09-06 | 2006-03-28 | Snapshot restore method and apparatus
US11/277,738 | 2006-03-28 | |
PCT/US2006/032506 WO2007030304A2 (en) | 2005-09-06 | 2006-08-18 | Snapshot restore method and apparatus
Publications (2)

Publication Number | Publication Date
---|---
HK1121839A1 (en) | 2009-04-30
HK1121839B (en) | 2011-08-26