CN1784676B

CN1784676B - Database data recovery system and method

Info

Publication number: CN1784676B
Application number: CN200480001706.4A
Authority: CN
Inventors: M·J·兹威林; G·A·史密斯; R·B·拉扬; J·库勒斯扎; P·拜恩; S·B·坎达尔瓦尔; M·S·威斯托姆
Original assignee: Microsoft Corp
Current assignee: Microsoft Technology Licensing LLC
Priority date: 2004-02-25
Filing date: 2004-07-27
Publication date: 2012-05-30
Anticipated expiration: 2024-07-27
Also published as: ATE463797T1; DE602004026422D1; CN1784676A

Abstract

The present invention relates to systems and methods for data restoration, for example after the occurrence of a user error. In particular, a snapshot database can be maintained that stores a copy of database data. The snapshot database does not have to store a complete copy of all the data in the source database; instead, it shares with the source database the data that is common to both. If an error occurs on the primary database, the database can be reverted to a point in time prior to the error by replacing source database file pages with the corresponding snapshot file pages. Additionally, an undo component can be used in conjunction with the snapshot to approach the error at a finer-grained point in time. In short, the present invention can restore a database faster and more simply, while using less space and fewer resources, than conventional data restoration techniques.

Description

Database data recovery system and method

Cross-reference to related applications

This application claims priority to U.S. Provisional Application No. 60/547,641, filed February 25, 2004, entitled DATABASE DATA RECOVERY SYSTEM AND METHOD, and to U.S. Nonprovisional Application No. 10/833,541, filed April 28, 2004, entitled DATABASE DATA RECOVERY SYSTEM AND METHOD, both of which are incorporated herein by reference. In addition, this application is a continuation-in-part of U.S. Application No. 10/611,774, entitled TRANSACTION CONSISTENT COPY-ON-WRITE DATABASE, filed June 30, 2003, which is also incorporated herein by reference.

Technical field

The present invention relates generally to databases, and more particularly to database restoration techniques.

Background

Databases are ubiquitous in today's world. A database can store large amounts of information in a manner that facilitates rapid querying and ease of use. For example, in a conventional relational database, information can be organized into objects such as records, tables, and indexes. A database engine provides a mechanism for retrieving and manipulating data from database tables once a user specifies a query. Queries are usually expressed in a query language such as Structured Query Language (SQL). A query can specify one or more tables, along with the rows and columns to be retrieved or manipulated. Given a properly specified query, the database engine retrieves the data, performs any specified operations, and produces a result table. Databases are popular and useful at least in part because of their ability to store large amounts of data that can be efficiently retrieved and manipulated simply by specifying a query.

Unfortunately, user error is a common problem in database systems. Typically, such an error occurs when a database application or a user mistakenly changes or deletes data, and the database system correctly follows the command and promptly changes or deletes the data. This is known in the art as the quick finger delete problem. For example, a user might issue a delete command against a table but forget to specify a "WHERE" clause, causing more data to be deleted than intended. Likewise, a user may install a new application that modifies the database in ways unknown to the user. There are several conventional solutions to this problem. Generally speaking, the most common solution is a full restore of the database to a point in time before the user error occurred. Once the database has been restored, it can be brought online, and all changes made after that point in time, including the user error, are lost. However, a full database restore is time-intensive and can sometimes take days to complete.

Alternatively, data that has been inadvertently modified can be remedied by extracting the relevant information from a restored database and merging it back into the original database. A variation on this scheme is called log shipping.

Log shipping involves keeping a copy of the database on a separate, secondary server in a restoring state, held at a constant delay behind the original server. Log backups are applied to the secondary database only after the delay (for example, 24 hours) has elapsed. If a user error occurs in the original database and the database administrator notices the error within the delay period, the administrator can fall back to the secondary server, because it already contains the database as of a point in time before the error. Unfortunately, log shipping is complex and requires substantial additional resources and space.

Restoring a database with conventional systems and methods involves considerable delay, so it is usually an option of last resort. In addition, log shipping requires additional hardware, which increases the complexity of the database system. Mitigating errors is a large and important problem in database systems. Accordingly, there is a need in the art for a new system and method that, among other features, restores a database both quickly and efficiently.

Summary

The following presents a simplified summary of the invention in order to provide a basic understanding of some of its aspects. This summary is not an extensive overview of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some concepts of the invention in a simplified form as a prelude to the more detailed description that is presented later.

The present invention relates to the creation and use of database snapshots. The present invention mitigates the problems associated with restoring a database, which takes time and is usually an option of last resort, and with log shipping, which often requires additional hardware and adds to the complexity of the database system. Reverting to a database snapshot alleviates some of these problems. According to an aspect of the invention, a database snapshot (DBSS) is a database that appears to be a read-only, point-in-time copy of another (source) database. The DBSS does not have to be a complete copy of the source database; the two databases share the data that is common to both, which allows the DBSS to be created quickly and to be space efficient. When the source database is modified, the original data is copied to space-efficient storage for use by the DBSS, so that the DBSS maintains its point-in-time view of the source database. If a user error occurs on the source database and a DBSS was created before the error, the database administrator has the option of reverting the entire database to the DBSS, which represents a point in time before the user error. All changes made to the source database after that point, including the user error, are lost. Moreover, such a revert is typically much faster than a normal restore and does not require the duplicate resources that log shipping requires.

According to an aspect of the invention, reverting to a database snapshot can include copying database snapshot file pages over the source or primary database, truncating the primary database, applying open uncommitted transactions to the database, and using database logs to converge on an event such as a user error.

According to an aspect of the invention, a user or database administrator can create one or more database snapshots at different points in time. For example, a user who is about to perform a test can create a database snapshot so that the database can later be reverted to its previous state or view. According to another aspect of the invention, a monitor component can be used to monitor the primary database and automatically create a database snapshot when particular events occur. For example, if the monitor detects or can infer that a new application is about to be installed, it can initiate the creation of a database snapshot to preserve the state of the database before the new application changes it.

According to yet another aspect of the invention, a mirror database can be updated automatically to reflect the changes made to the primary database during a revert. The mirror database can thus be updated and synchronized without the lengthy full restore that has traditionally been required.

To the accomplishment of the foregoing and related ends, certain illustrative aspects of the invention are described herein in connection with the following description and the accompanying drawings. These aspects are indicative of various ways in which the invention can be practiced, all of which are intended to be covered by the invention. Other advantages and novel features of the invention will become apparent from the following detailed description when considered in conjunction with the drawings.

Brief description of the drawings

The foregoing and other aspects of the invention will become apparent from the following detailed description and the accompanying drawings briefly described below.

FIG. 1 is a functional block diagram of a data recovery system in accordance with an aspect of the present invention.

FIG. 2 is a functional block diagram of a restore component in accordance with an aspect of the present invention.

FIG. 3 is a simplified timeline diagram illustrating aspects of the present invention.

FIG. 4 is a functional block diagram of a database snapshot system in accordance with an aspect of the present invention.

FIG. 5 is a diagram illustrating an exemplary database restoration in accordance with an aspect of the present invention.

FIG. 6 is a functional block diagram of an exemplary database mirroring system in accordance with an aspect of the present invention.

FIG. 7 is a functional block diagram of an exemplary mirroring system in accordance with an aspect of the present invention.

FIG. 8 is a functional block diagram of a primary database in accordance with an aspect of the present invention.

FIG. 9 is a functional block diagram of a database snapshot system in accordance with an aspect of the present invention.

FIG. 10 is a functional block diagram of an exemplary transaction log in accordance with an aspect of the present invention.

FIG. 11 is a flowchart describing a method of establishing a snapshot database in accordance with an aspect of the present invention.

FIG. 12 is a flowchart illustrating a recovery method in accordance with an aspect of the present invention.

FIG. 13 is a flowchart of a data recovery method in accordance with an aspect of the present invention.

FIG. 14 is a functional block diagram illustrating a suitable operating environment in accordance with an aspect of the present invention.

FIG. 15 is a functional block diagram of an example computing environment with which the present invention can interact.

Detailed description

The present invention is now described with reference to the drawings, in which like reference numerals refer to like elements throughout. It should be understood, however, that the drawings and the detailed description are not intended to limit the invention to the particular form disclosed. Rather, the invention is intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

As used in this application, the terms "component" and "system" refer to a computer-related entity, which is either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server itself can be components. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers.

Furthermore, the present invention may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term "article of manufacture" (or, alternatively, "computer program product") as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or medium. For example, computer-readable media can include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic stripes, ...), optical disks (e.g., compact disks (CDs), digital versatile disks (DVDs), ...), smart cards, and flash memory devices (e.g., cards, sticks). Of course, those skilled in the art will recognize that many modifications can be made to this configuration without departing from the scope or spirit of the invention.

Restoration system

Referring first to FIG. 1, a data recovery system 100 is shown. Data recovery system 100 includes a source database 110, a snapshot component 120, a snapshot database 130, and a restore component 140. Source database 110 (hereinafter also referred to as the primary database) holds a large volume of data in an organized fashion to facilitate queries and other uses. Database 110 can be any kind of database, including but not limited to a relational or multidimensional database. Snapshot component 120 generates snapshot database 130 (a database snapshot is also referred to herein as a DBSS) based in part on the source database. Snapshot database 130 allows a user to create a transactionally consistent view of an existing source database 110 without making a full copy of it. As source database 110 diverges from snapshot database 130, snapshot component 120 ensures that snapshot database 130 contains a copy of the data as it was before modification, for example in units of pages. In other words, if a source database page contains the letter "A" and a transaction is executed that changes "A" to "B", then the corresponding snapshot database page is written with, and thus stores, the letter "A". According to an aspect of the invention, snapshot database 130 can be a read-only, point-in-time copy of source database 110.

The DBSS need not be a complete copy of source database 110. The two databases can share the data that is common to both, which also allows the DBSS to be created quickly and to be space efficient. When source database 110 is modified, the original data can be copied to space-efficient storage for use by the DBSS, so that the DBSS maintains its point-in-time view of source database 110. Furthermore, it should be understood that more than one snapshot database 130 can be associated with a source, so as to provide multiple revert points. In addition, snapshot database 130 can be transient or persistent. A transient snapshot is an internal, volatile copy that is deleted after a crash, failure, or shutdown. A persistent snapshot is a public copy that can be kept more safely on a storage device for use by other applications.
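The copy-on-write behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names `SourceDatabase` and `Snapshot`, the dictionary standing in for a sparse file, and the page numbering are all hypothetical.

```python
class Snapshot:
    """Point-in-time view that stores only pages changed since its creation."""

    def __init__(self, source):
        self.source = source
        self.sparse = {}  # hypothetical stand-in for a sparse file: page_no -> original page

    def read_page(self, page_no):
        # A page saved in the sparse store wins; otherwise it is still shared with the source.
        if page_no in self.sparse:
            return self.sparse[page_no]
        return self.source.pages[page_no]


class SourceDatabase:
    """Toy page store; assumes every page written already exists."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.snapshots = []

    def create_snapshot(self):
        snap = Snapshot(self)  # nothing is copied at creation time
        self.snapshots.append(snap)
        return snap

    def write_page(self, page_no, value):
        # Copy-on-write: before the first modification of a page, save its
        # original contents in every snapshot that has not saved it yet.
        for snap in self.snapshots:
            if page_no not in snap.sparse:
                snap.sparse[page_no] = self.pages[page_no]
        self.pages[page_no] = value


db = SourceDatabase({1: "A", 2: "X"})
snap = db.create_snapshot()
db.write_page(1, "B")   # "A" is preserved for the snapshot first
db.write_page(1, "C")   # second write does not overwrite the saved "A"
```

In this sketch the snapshot costs nothing at creation and grows only with the number of distinct pages modified afterward, which is the space-efficiency property the text attributes to the DBSS.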

Restore component 140 uses snapshot database 130 to restore source database 110 to a point in time prior to an event. According to an aspect of the invention, the event can correspond to a user error such as a quick finger delete, in which a user accidentally deletes data from a source table. Alternatively, the event can correspond to a system crash, a deadlock, or any other situation in which data is lost or manipulated. If the event occurs on source database 110 and snapshot database 130 was created before the error, the database administrator has the option of using restore component 140 to bring the entire source database 110 back to snapshot database 130, which represents a point in time before the event. Restore component 140 can use the data residing in snapshot database 130 to return source database 110 to that earlier point in time. For example, snapshot database data can be used to overwrite the current source database values at the corresponding locations. Alternatively, snapshot database 130 can be populated with the shared data and become the new source database. It should be appreciated that this restoration process is typically much faster than conventional restoration techniques, because only the sparse files need to be copied to the source database rather than performing a full restore. Furthermore, the restoration process is more efficient because it does not need the duplicate resources that log shipping requires.

FIG. 2 depicts restore component 140 in accordance with an aspect of the invention. In particular, the restore component includes a revert component 210 and an undo component 220. Revert component 210 provides the primary functionality of a restoration in accordance with the invention, while undo component 220 facilitates finer-grained restoration that converges on an event such as an error. If a database snapshot was created before the event occurred, revert component 210 can use the sparse files of the database snapshot to return the source database to a time before the event. Data files or pages that were changed after the snapshot was created can be returned to their state as of the snapshot time by copying the old values stored in the snapshot database to the source database. This causes the loss of all changes, including the error, made between the time the snapshot was created and the revert operation. For example, if the source database contains "A", "B", and "C" when the snapshot database is created, and "B" is subsequently changed to "D", then the snapshot database contains "B" and the source database contains "A", "D", and "C". The revert component can then return the database to its values as of the snapshot time simply by copying "B" over "D". However, when a database snapshot is created, there may be in-flight transactions that have not yet committed. These transactions are not captured by the database snapshot. Therefore, after the source database is reverted to the database snapshot, those transactions are lost, and the operations that would have occurred as a result of them (e.g., inserts, updates, deletes) are not performed.

Undo component 220 can be used to compensate for, among other things, this inaccurate representation after a revert. For example, undo component 220 can store all transactions that were open when the snapshot database was created, which includes every transaction that started before snapshot creation and ended after it. These stored transactions can then be used to roll the reverted primary database forward, so as to capture the open transactions and converge on the restoration event. In addition, undo component 220 can use conventional database logs, which capture database changes at periodic intervals or on an administrator's command, to converge more closely on an event such as an error, thereby minimizing the loss of "good" transactions. The present invention thus at least facilitates database restoration that can be performed routinely, using fewer system resources and at a much faster rate.

Turning to FIG. 3, a timeline 300 is shown in accordance with an aspect of the invention to provide further clarity regarding database restoration operations. Time advances from left to right on timeline 300. In other words, events positioned farther to the right occur later in time than events positioned farther to the left. At 310, a database snapshot (DBSS) is created. A period of time passes (minutes, hours, days, ...) and an event occurs at 320. For example, the event can correspond to a user accidentally deleting an entire table or modifying page data. Thereafter, a restoration operation can be initiated, for example by a database administrator. Revert component 210 (FIG. 2) can then be used to return the database to the point in time 310 at which the database snapshot was created. According to an aspect of the invention, this can be accomplished by copying sparse file data from the snapshot database over the corresponding data in the source database, thereby placing the source database in a consistent, stable state that lacks the effects of the event at 320. Undo component 220 (FIG. 2) can then be used to advance the source database up to event 320, so as to retain the "good" data while discarding or changing the data affected by the event. This can be accomplished by applying the open transactions stored in an undo file, and/or the database log files, to the restored database.

FIG. 4 illustrates a database snapshot system 400 in accordance with an aspect of the invention. DBSS system 400 includes snapshot component 120, source database 110, snapshot database 130, a catalog component 410, and a monitor component 420. As mentioned above, source database 110 is the main or primary database for which restoration is sought. Snapshot component 120 uses source database 110 to generate database snapshot 130. Database snapshot (DBSS) 130 can be a read-only, point-in-time database. It is space efficient because it shares space with source database 110. The shared space consists of all the data pages that are identical in the two databases. According to one exemplary implementation, a DBSS can be created with the following syntax:

CREATE DATABASE ss_database_name
ON <filespec> [,...n]
AS SNAPSHOT OF source_database_name

For each data file in source database 110, another file is created for DBSS 130. These new files are sparse files. When pages are changed in source database 110, they are first copied into the sparse files. Catalog component 410 can be used to generate a log or log file that tracks whether a page in DBSS 130 is still shared with source database 110 or has already been copied into a sparse file. According to an aspect of the invention, the log is stored in source database 110 so that it can be accessed easily.
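The bookkeeping role of the catalog component can be illustrated with a small sketch. The source does not describe the catalog's data structure, so the per-file boolean list below, the class name `SnapshotCatalog`, and its methods are all assumptions made for illustration, with pages numbered from zero.

```python
class SnapshotCatalog:
    """Hypothetical per-file catalog: one flag per page of a source data file."""

    def __init__(self, page_count):
        # False means the page is still shared with the source database;
        # True means the original page has been copied into the sparse file.
        self.copied = [False] * page_count

    def mark_copied(self, page_no):
        self.copied[page_no] = True

    def location(self, page_no):
        """Tell a snapshot reader where the point-in-time page lives."""
        return "sparse" if self.copied[page_no] else "source"

    def shared_fraction(self):
        """Fraction of pages still shared, a rough measure of space saved."""
        return self.copied.count(False) / len(self.copied)


catalog = SnapshotCatalog(page_count=8)
catalog.mark_copied(2)  # page 2 was modified in the source after the snapshot
```

A reader of the snapshot would consult `location` for each page, and a revert would only need to visit the pages flagged as copied.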

It should be appreciated that recovery can be run on DBSS 130 before it becomes readable, to bring it to a consistent state. Transactions that are open at that time are rolled back, and some of the original pages will most likely be copied to the sparse files as a result of the rollback. According to an aspect of the invention, the DBSS has no log. Once created, DBSS 130 can persist until it is dropped by the database administrator (if it is a persistent snapshot); otherwise, it can be discarded on a failure or system shutdown.

Multiple DBSSs 130 can exist for a single source database 110, each at a different point in time. According to an aspect of the invention, all snapshots other than the one being reverted to can be destroyed. Alternatively, all snapshots taken later in time than the snapshot being reverted to can be destroyed, while those taken earlier can be kept. Snapshots of the source database taken after the snapshot being reverted to are of little value. Moreover, a new snapshot can be taken of the source database after it has been restored using a database snapshot. Snapshots taken before the snapshot used for the revert are much more valuable, because it is still possible to revert to the specific points in time they capture.
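The two retention choices just described can be sketched as a small selection function. The function name, the `keep_earlier` flag, and the snapshot names and dates are hypothetical and serve only to make the two policies concrete.

```python
from datetime import datetime


def snapshots_to_drop(snapshots, reverted_to, keep_earlier=True):
    """Return the names of snapshots to destroy after a revert, oldest first.

    snapshots:    dict mapping snapshot name -> creation time
    reverted_to:  name of the snapshot the source database is reverted to
    keep_earlier: True keeps snapshots older than the reverted one (second policy);
                  False drops every other snapshot (first policy).
    """
    pivot = snapshots[reverted_to]
    drop = []
    for name, created in snapshots.items():
        if name == reverted_to:
            continue  # the snapshot being reverted to is kept under both policies
        if keep_earlier and created < pivot:
            continue  # older snapshots still mark reachable points in time
        drop.append(name)
    return sorted(drop, key=snapshots.get)


snaps = {
    "ss_mon": datetime(2004, 7, 5),
    "ss_tue": datetime(2004, 7, 6),
    "ss_wed": datetime(2004, 7, 7),
}
later_only = snapshots_to_drop(snaps, "ss_tue")
all_others = snapshots_to_drop(snaps, "ss_tue", keep_earlier=False)
```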

DBSS 130 can be created and dropped on a periodic basis, or in an ad hoc fashion when the user wants a single "safe" point in time to which the database can be reverted. For example, a user might want to create a snapshot before installing or testing a new application. A revert to DBSS 130 can be issued with the following command:

RESTORE DATABASE { database_name | database_name_var } FROM
DATABASE_SNAPSHOT = <snapshotname>

Monitor component 420 can also be used by snapshot component 120 to observe transactions on the source database and to initiate the automatic generation of database snapshots. For example, if monitor component 420 detects or infers the installation of a new application that may significantly alter the source database, it can initiate the creation of a database snapshot. To facilitate such functionality, it should be appreciated that the present invention can employ artificial intelligence systems and methods, including but not limited to rule-based expert systems, Bayesian rules, and neural networks.

To revert the database, the source database is shut down and all connections to it are closed. The source is therefore unavailable for transactions during this process. The database can be marked as "restoring" to notify external observers that it is unavailable. The server can then proceed to copy the pages in the sparse files to their original locations (as of the time the DBSS was created) in the source database files. In the vast majority of cases, copying the changed pages is much faster than restoring from a backup. Once the pages have been copied, the database's log can be rebuilt. The source database is now back at the point in time at which the DBSS was created. All changes made since the DBSS was created have been eliminated. If an undo file was created, the log chain is not broken, and log backups taken from the source database can be applied to roll the source database forward.

The scheme that supports rolling forward after a restore by means of an undo file can be summarized as follows. When the DBSS is created, the original value of each page touched by the recovery that runs on the DBSS at creation time can be saved as a "pre-image" in a separate undo file. On restore, the DBSS pages are copied into the source database as described above, and the pre-images are then copied from the separate undo file. At this point the database is exactly as it was when the DBSS was created. Thereafter, the database log can be used to roll the restored database forward to a point in time just before the user error occurred, minimizing the amount of data lost.

Turning to FIG. 5, a diagram illustrating an exemplary database restore 500 is provided in accordance with an aspect of the present invention. Source database 110 includes two database files 510 and 520. Database files 510 and 520 each contain eight pages, each page holding different data. A snapshot of the source database is generated, thereby creating database snapshot 130. The snapshot database can have two sparse files 530 and 540, which correspond to source files 510 and 520, respectively. At the time the snapshot is created, these sparse files can simply be shells, because they share all of their data with source files 510 and 520. At some point after snapshot creation, values in the source database can be changed. Here, the changed values reside in pages 3 and 7 of file 510 and in page 4 of file 520. In particular, page 3 has changed from "C" to "Z", page 7 has changed from "G" to "Y", and page 4 has changed from "L" to "Z". The original values before the change, here "C", "G", and "L", have therefore been saved to sparse files 530 and 540. On restore, the pages in the sparse files, here pages 3 and 7 from file 530 and page 4 from file 540, can be copied back over the updated pages in the source or primary database 110.
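By way of illustration only, the copy-on-write and restore behavior of FIG. 5 might be sketched in Python as follows. The page contents A through P, the dictionary layout, and the function names write_page and restore are assumptions made for this sketch; the description itself gives only the changed values and their originals "C", "G", and "L".

```python
# Hypothetical in-memory model of FIG. 5; pages are numbered from 1 as in the text.
source = {
    "510": list("ABCDEFGH"),   # assumed contents of source file 510
    "520": list("IJKLMNOP"),   # assumed contents of source file 520
}
# Sparse files 530 and 540, keyed here by the source file they shadow.
# They start as empty shells: page number -> original value.
sparse = {"510": {}, "520": {}}


def write_page(file_id, page_no, new_value):
    """Copy-on-write: save the original page once, then update the source."""
    if page_no not in sparse[file_id]:
        sparse[file_id][page_no] = source[file_id][page_no - 1]
    source[file_id][page_no - 1] = new_value


def restore():
    """Copy every saved page back to its original location in the source."""
    for file_id, pages in sparse.items():
        for page_no, original in pages.items():
            source[file_id][page_no - 1] = original


write_page("510", 3, "Z")
write_page("510", 7, "Y")
write_page("520", 4, "Z")
saved = {file_id: dict(pages) for file_id, pages in sparse.items()}
restore()
```

Only the three changed pages are stored and copied back, which is why such a restore can be expected to be cheaper than restoring a full backup.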

Copying pages covers several cases, from simply copying back modified pages to more complex situations such as adding or deleting files, growing or shrinking the primary database file to the size of the sparse file, and adding or removing pages (i.e., tracking pages added to the primary database so that they can be eliminated on restore). For file additions and deletions, the file lists of the source database and the database snapshot can be compared and synchronized. For page additions, if the source is larger than the snapshot or replica, the source can be truncated at the end to the size of the replica, in part because, according to an aspect of the invention, added pages are appended at the end of the source. If the replica is larger, pages have been deleted from the primary source. The size of the source can therefore be increased, and all pages in that range will already be in the snapshot and will be copied by the normal copy operation.
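The file-list and size reconciliation described above might look like the following Python sketch. The function names and the use of lists of page values in place of real files are assumptions for illustration only.

```python
def sync_file_list(source_files, snapshot_files):
    """Return (files to drop from the source, files to re-create in it)."""
    to_drop = sorted(set(source_files) - set(snapshot_files))
    to_create = sorted(set(snapshot_files) - set(source_files))
    return to_drop, to_create


def reconcile_size(source_pages, snapshot_page_count, filler=None):
    """Return the source page list resized to the snapshot's page count."""
    if len(source_pages) > snapshot_page_count:
        # Pages were added after the snapshot; they sit at the end, so truncate.
        return source_pages[:snapshot_page_count]
    # Pages were removed after the snapshot; grow the file. The placeholder
    # slots are expected to be overwritten by the normal copy-back step.
    missing = snapshot_page_count - len(source_pages)
    return source_pages + [filler] * missing
```

For example, reconcile_size(list("ABCDE"), 3) keeps only the first three pages, while reconcile_size(["A"], 3) appends two placeholder pages.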

A naive algorithm for copying pages back is to copy them one at a time. According to an aspect of the invention, however, an asynchronous copy operation can be used instead. For example, a single thread and three queues (a read queue, a write queue, and an empty-buffer queue) can be used. During the copy back, whether a page is in the source (meaning it has changed) or is not in the source (meaning it was deleted), it can simply be copied to the source. If the primary source has extra pages, however, they were added at the end of the file, which can be remedied by truncating the file to the size of the replica. Finally, the source can be unlocked to complete the restore.
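The single-thread, three-queue arrangement might be sketched as the following Python simulation. It is not an implementation of real asynchronous I/O: completing a read or a write is modeled simply by moving a buffer from one queue to the next, and the function name copy_back, the two-slot buffer layout, and the dictionary model of the files are assumptions.

```python
from collections import deque


def copy_back(sparse_pages, source_pages, buffer_count=2):
    """Copy {page_no: value} from sparse_pages into source_pages."""
    if buffer_count < 1:
        raise ValueError("at least one buffer is required")
    pending = deque(sorted(sparse_pages))       # page numbers not yet read
    empty = deque([None, None] for _ in range(buffer_count))  # free buffers
    reads = deque()                             # buffers with a read in flight
    writes = deque()                            # buffers with a write in flight
    while pending or reads or writes:
        # Issue as many reads as there are free buffers.
        while pending and empty:
            buf = empty.popleft()
            buf[0] = pending.popleft()          # remember the page number
            reads.append(buf)
        # Complete the oldest read: the buffer now holds the page contents.
        if reads:
            buf = reads.popleft()
            buf[1] = sparse_pages[buf[0]]
            writes.append(buf)
        # Complete the oldest write and recycle its buffer.
        if writes:
            buf = writes.popleft()
            source_pages[buf[0]] = buf[1]
            empty.append(buf)
    return source_pages
```

The number of buffers bounds how many pages are in flight at once, so memory use stays fixed regardless of how many pages the sparse file holds.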

According to another aspect of the invention, the log backup chain associated with the source database can be broken after a restore. Log backups will therefore fail on the restored database until a full or file backup is taken. The restored database can retain the same recovery model it had when it was created. Accordingly, the system of the present invention can support (1) initiating the restore; (2) manipulating the data; (3) rebuilding the log; and (4) restarting the database.

According to another aspect of the invention, a new source database can be created from the snapshot database by copying data from the original source database. Any error during the database creation phase may require the database administrator (DBA) to restart the operation, for example once the server becomes available. Once the database has been created, the DBA can drop the source database and rename the restored database. Such a system can be used, for example, for data mirroring.

Turning now to FIG. 6, an exemplary data mirroring system 600 is shown in accordance with an aspect of the present invention. System 600 includes two databases, source database 110 and mirror database 610, together with database snapshot 130 and restore component 140. Source database 110 is the primary database. Mirror database 610 is a separate database that contains an almost bit-for-bit replica of source database 110. When changes are made to source database 110, they can be sent to mirror database 610, for example over a network. The idea of mirroring is that if primary source database 110 fails or becomes unavailable, for example because of a power failure, mirror database 610 can become the new source database for transactions, thereby promoting high availability of data through redundancy. According to the aspect of the invention described above, source database 110 can have a database snapshot 130 associated with it to facilitate point-in-time restore. A database administrator can thus use database snapshot 130 and restore component 140 to restore source database 110 to an earlier point in time. If a restore is performed on source database 110, mirror database 610 should also reflect the change. Conventionally, a time-consuming backup and full restore are needed to update and resynchronize mirror database 610. According to an aspect of the invention, mirror database 610 can be updated and synchronized with the source automatically. Because changes to the source are reflected in the mirror database automatically, changes made to the source during the restore can also be propagated to the mirror, either concurrently or asynchronously after the restore has completed on the source.

Turning to FIG. 7, an exemplary system 700 for mirroring a database with snapshots is shown in accordance with an aspect of the present invention. As shown, system 700 includes a base or target database 610 named DB1_LS. Application 620 seeks to view data from database 610. In particular, application 620 can interact with snapshots 630 named DB1_001, DB1_002, and DB1_003. A first snapshot DB1_001 can be created with reference to DB1, for example:

CREATE DATABASE DB1_001 AS SNAPSHOT OF DB1_LS
ON (NAME = 'datafile', FILENAME = 'F:\DB1_001.SNP')

Subsequently, a second snapshot DB1_002 can be created. Users who are still using DB1_001 continue to use it:

CREATE DATABASE DB1_002 AS SNAPSHOT OF DB1_LS
ON (NAME = 'datafile', FILENAME = 'F:\DB1_002.SNP')

Thereafter, a third snapshot can be created, again with reference to DB1. Users who are still using DB1_001 or DB1_002 continue to use them:

CREATE DATABASE DB1_003 AS SNAPSHOT OF DB1_LS
ON (NAME = 'datafile', FILENAME = 'F:\DB1_003.SNP')

In addition, according to an aspect of the present invention, a database snapshot can be used to perform consistency checks on a database. For example, the DBCC CHECKDB() command can be executed on a database. As a result, an internal snapshot of the database, with its backing storage in alternate streams of the existing database files, can be created. Pages can then be read to perform the consistency check, and the system reads them from the base database if they have not been modified or from the alternate stream if they have.

It should also be appreciated that, although not shown, the invention can employ one or more graphical user interfaces (GUIs). A GUI can be used to support snapshot management. For example, a number of text and graphical components including, but not limited to, text boxes, buttons, tabs, drop-down menus, and scroll bars can be used to create a snapshot and subsequently restore the database to it.

Furthermore, both a database snapshot and its associated source database can be backed up in the traditional sense. For example, an administrator can back up individual files or filegroups from a snapshot. When such backups are restored, they are restored as regular databases. When backing up the source, the user can specify which snapshots are to be included in the backup. On restore, the user can likewise specify which snapshots are to be restored.

Database Snapshots (also known as Copy-On-Write Databases)

In general, a database contains two types of files: data files and log files. A log file contains a series of log records, each of which can be identified by a log sequence number (LSN). As depicted in FIG. 8, according to an aspect of the invention, primary database 800 includes a set of data files 802 and a log file 810. Data files 802 can be divided into blocks or units of storage called pages.

A database snapshot can be created for a database, providing a transactionally consistent view of the existing database as of an earlier point in time without creating a complete copy of the database. The database snapshot, in combination with the database, contains all the information needed to produce a copy of the database as of that earlier time. The database snapshot by itself does not contain all of that information, however, and can therefore be smaller than a full copy. In addition, the snapshot can be built on the fly as modifications are made to the database, which spreads the cost (time and processing) over time. If a full copy of the database were made at the earlier point in time, the time and processing costs would be concentrated at that moment. Database snapshots also have the advantage that they can be created while update activity continues on the database. The primary database is the database that is in use and of which one or more database snapshots are created.

As noted above, the database snapshot contains all the information needed, together with the primary database, to determine the contents of the primary database at an earlier time. A database snapshot can contain a side file corresponding to each file in the primary database. A side file contains a copy of all data from the corresponding data file that has changed since the database snapshot was created. In one aspect of the invention, to avoid the need for a table mapping pages in the side file to pages in the primary file, the side file is stored as a sparse file. In a sparse file, only the portions of the file that are actually written require storage space; all other regions of the file can remain unallocated. It should be noted, however, that storing side files as sparse files is not required by the present invention, and other storage systems and methods are contemplated within its scope.
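The reason no mapping table is needed can be illustrated with a short Python sketch in which each page is stored in the side file at the same offset it occupies in the primary file. The 8 KB page size and the helper names are assumptions, and whether the unwritten gaps actually consume no disk space depends on the file system, which this sketch does not control.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size for this sketch


def write_side_page(f, page_no, data):
    """Store a page at the same offset it has in the primary file."""
    f.seek(page_no * PAGE_SIZE)
    f.write(data.ljust(PAGE_SIZE, b"\0"))


def read_side_page(f, page_no):
    """Read one page-sized block at the page's own offset."""
    f.seek(page_no * PAGE_SIZE)
    return f.read(PAGE_SIZE)


with tempfile.TemporaryFile() as side:
    write_side_page(side, 5, b"original contents of page 5")
    page5 = read_side_page(side, 5)
    page2 = read_side_page(side, 2)      # never written: reads back as zeros
    size = side.seek(0, os.SEEK_END)     # logical size, not allocated size
```

Because the offset is computed directly from the page number, locating a saved page requires no lookup structure beyond knowing whether the page was written at all.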

According to another aspect of the invention, the sparse file mechanism works with a standard region size. If data within a region is written to the sparse file, space for the whole region can be allocated even if the data does not fill it. Because this space is allocated and can be read from, a distinction can be made between the part of a region that is filled with valid data and the part that exists only because the granularity of the sparse file requires a region of a particular size to be allocated whenever any storage within that region is needed.

Because the database snapshot includes the original values of all data in the primary database that has changed since the snapshot was created, the database data as of the snapshot creation time can be read from the snapshot. In response to a request for data from the database snapshot, the data is read from the snapshot's side file if the side file contains the requested data. Data to be read that is not in the side file has not changed since the database snapshot was created and can be read from the primary database.
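The read path just described reduces to a two-way choice, sketched below in Python. Modeling the side file and the primary database as dictionaries keyed by page number is an assumption made purely for illustration.

```python
def read_snapshot_page(page_no, side_file, primary):
    """Return the page as it was when the snapshot was created."""
    if page_no in side_file:
        # The page changed after the snapshot; its original value was saved.
        return side_file[page_no]
    # The page is unchanged, so the primary still holds the snapshot value.
    return primary[page_no]


primary = {3: "Z", 4: "D"}      # page 3 was overwritten after the snapshot
side_file = {3: "C"}            # original value of page 3
snapshot_view = [read_snapshot_page(n, side_file, primary) for n in (3, 4)]
```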

According to another aspect of the invention, the side file contains data pages from the primary database. When data on any page of the primary database is changed, the original data page is stored in the side file. The invention is described with reference to pages as the unit of data in the primary database; however, it is contemplated that other units of data from the primary database could also be used.

To determine which data has been written to the side file, and therefore which data should be read from the primary database, the presence of valid data in the side file should be ascertained. In one instance, the side file can be read directly to determine whether valid data is present. According to another aspect of the invention, a side page table can be created that stores data indicating whether a given page is present and valid.

For each page in the primary database, the side page table can store information indicating whether the page should be read from the primary database, meaning it has not changed, or from the side file, meaning it has changed. The side page table makes it possible to determine quickly whether a given page is present in the side file.

According to another aspect of the invention, the side file and the sparse file mechanism use the same page/region size. That is, the pages that the side file stores from the primary database are the same size as the regions that the sparse file allocates when anything is written to it. For example, if the sparse file region is 8 KB and the pages stored from the primary database are also 8 KB, the page size and the region size are equal. In this case, any region that is filled is completely filled by a page read from the primary database, and the region cannot contain invalid data.

According to another aspect of the invention, a whole number of side file regions can correspond exactly to each page. For example, if the sparse file region is 8 KB (kilobytes) and the pages stored from the primary database are 16 KB, each page stored in the side file fills two regions. In this case too, any region that is filled is completely filled with content from a page read from the primary database, and the region cannot contain invalid data.

For these aspects of the invention, the side page table comprises an in-memory bitmap that holds one bit of information for each page in the side file. For each page, the corresponding bit indicates whether the page is in the side file.
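A one-bit-per-page in-memory bitmap of this kind might be sketched in Python as follows. The class name, method names, and byte-packed layout are assumptions for illustration and are not taken from the description.

```python
class SidePageBitmap:
    """In-memory bitmap holding one bit per page (illustrative layout only)."""

    def __init__(self, page_count):
        # Eight pages per byte, rounded up; all bits start cleared.
        self.bits = bytearray((page_count + 7) // 8)

    def mark_present(self, page_no):
        """Record that the page has been written to the side file."""
        self.bits[page_no // 8] |= 1 << (page_no % 8)

    def is_present(self, page_no):
        """Return True if the page is recorded as being in the side file."""
        return bool(self.bits[page_no // 8] & (1 << (page_no % 8)))


table = SidePageBitmap(16)
table.mark_present(3)
table.mark_present(10)
```

At one bit per page, a table covering one million pages occupies roughly 122 KB of memory, which is why the lookup can be kept entirely in memory.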

According to another aspect of the invention, the granularity of side file regions is larger than the granularity of the pages stored from the primary database. For example, if each region of the side file is 64 KB and the page size is 8 KB, the presence of a region in the side file does not necessarily indicate that all the information in the region is valid data from the primary database. If only one page has been copied into the side file, then in this example only 8 KB of the 64 KB in the allocated region contain valid data. In another embodiment, some side file pages are spread across regions.

For these aspects, the side page table comprises two in-memory bitmaps that together hold two bits of information for each page in the side file, referred to as bit 1 and bit 2. For each page, the corresponding bits indicate (bit 1) whether the page is definitely in the side file and (bit 2) whether the page is potentially in the side file. Bit 2 can also be regarded as indicating that the region in which the page would be stored in the side file has been allocated. As described below, however, in one embodiment bit 2 is set only when the side page table is reconstructed.

The bitmaps are maintained in memory and therefore may not be persistent. If they are erased, they are reconstructed from the sparse file information. The sparse file is consulted, and for each page, if the side file has allocated space for the region in which the page is located, bit 2 is set to indicate that the page is potentially in the side file. For each page, bit 1 is initially set to indicate that it is not certain that the page is in the side file.
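The reconstruction step might be sketched in Python as follows, using the 64 KB region and 8 KB page sizes of the earlier example. How the set of allocated regions is obtained from the file system is outside this sketch and is simply passed in; the function name and the list-of-booleans representation are assumptions.

```python
REGION_SIZE = 64 * 1024   # sparse-file allocation granularity from the example
PAGE_SIZE = 8 * 1024      # page size from the example
PAGES_PER_REGION = REGION_SIZE // PAGE_SIZE


def rebuild_page_table(page_count, allocated_regions):
    """Rebuild (bit1, bit2) from the set of allocated region numbers."""
    # Bit 1 ("definitely in the side file") cannot be known after a restart.
    bit1 = [False] * page_count
    # Bit 2 ("potentially in the side file") follows region allocation.
    bit2 = [page_no // PAGES_PER_REGION in allocated_regions
            for page_no in range(page_count)]
    return bit1, bit2


# Only region 1 (pages 8 through 15) has been allocated in this example.
bit1, bit2 = rebuild_page_table(24, {1})
```

A page whose bit 2 is set but whose bit 1 is clear still has to be checked against the side file itself before it can be treated as present.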

If the side page table is maintained in a persistent manner, the difference in granularity between regions and pages can be disregarded and a one-bit side page table can be used. In one embodiment, however, a two-bit page table is used so that persistent database views are supported across database server restarts.

According to an aspect of the invention, no page table is created for the side files. In this case, the database snapshot itself is consulted whenever it is necessary to determine whether a page has already been copied into the database snapshot. The invention is described below with reference to an aspect in which a one-bit or two-bit page table is present; however, other embodiments are contemplated in which there is no page table and the database view must be examined to determine whether it contains a page copied from the primary database.

As shown in FIG. 9, database snapshot 920 of primary database 800 is made up of side files 925. Each data file 802 in primary database 800 has a corresponding side file 925 in database snapshot 920. In addition, side page table data 930 is stored in memory for database snapshot 920. According to one aspect of the invention, side page table data 930 is a single side page table covering all side files 925. According to another aspect of the invention, a separate side page table can exist for each side file 925.

In a database, the transaction log is a sequential record of all transactions that have been performed against the database since the transaction log was last backed up. The transaction log is used to recover the database to the point of failure. According to one aspect of the invention, the transaction log is modeled as a circular queue. The transaction log can be truncated by deleting its inactive portion. The inactive portion contains completed transactions that do not need to be recovered, at least because the changes they reflect have already been persisted to the data files. The active portion of the transaction log, in contrast, contains both completed transactions and transactions that are still running and have not completed (active transactions). Truncation can be performed to minimize inactive space in the transaction log, rather than allowing the log to keep growing and consuming more space.

Active transactions can give rise to transactional inconsistency. For an active transaction, some modifications may not yet have been written from the buffer cache to the data files, and the data files may contain some modifications from transactions that have not completed. Log file 810 can be used to ensure that a recovery of the database is transactionally consistent. This can be accomplished with ARIES-style recovery (Algorithms for Recovery and Isolation Exploiting Semantics). Every modification recorded in the log that may not yet have been written to the data files is rolled forward by applying the modification to the database. To ensure the integrity of the database, every incomplete transaction found in the transaction log is then rolled back by undoing its modifications to the database.

To create a database snapshot, the physical structures of the database view (the side files and page tables) must be initialized. First, a side file 925 is created for each data file 802 in primary database 800. As described above, the side file can be a sparse file or, in another embodiment, a non-sparse file of the same size as data file 802. Side file 925 is associated with data file 802 in primary database 800.

Because transactions occur continuously and the database view must be transactionally consistent, the transaction log should be used during creation of the database snapshot. To ensure that information about transactions needed for the database view is not discarded, log truncation (if enabled) is disabled on primary database 800.

According to an aspect of the invention, side page table 930 is initialized for the database snapshot. Initially, side page table 930 is set to indicate that no pages are present in side file 925 and, in the case of a two-bit side page table, that no pages are potentially or definitely in side file 925.

When initialization is complete, the database snapshot is ready to go "online". From this point on, the database snapshot runs alongside primary database 800, and whenever a modification is performed, a copy of the original value of the modified page (i.e., the contents of the page before the update) is stored in the database snapshot. An exemplary method for achieving a transactionally consistent snapshot of a database can include determining a split point on the transaction log. This split point corresponds to the point in time that the database snapshot represents. The LSN at the end of the log on primary database 800 can be obtained when the database snapshot is created; this LSN is the "split point" at which primary database 800 and database snapshot 920 begin to diverge. Primary database 800 can then be marked so that database snapshot processing is required. Database snapshot support in primary database 800 then begins, as described below.

For the database snapshot to be consistent, the log of primary database 800 before the split point must be analyzed to determine which transactions were active at the time of the split. The oldest transaction in the log that was active as of the split point is identified. Log truncation is permitted for the portion of the log before that oldest active transaction.

In a manner similar to ARIES-style recovery (Algorithms for Recovery and Isolation Exploiting Semantics), all operations in the log of primary database 800 from the oldest active transaction up to the split point are applied to the database snapshot. FIG. 10 is a block diagram of an exemplary transaction log, log file 810, in accordance with an aspect of the invention. The log entries in log file 810 include log entries 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, and 1099. Split point 1075 is determined. Transactions continue to be written to the log, but truncation is disabled. Log file 810 is examined, and every modification to the database that results from the transactions between the oldest active transaction and the split point (in the example of FIG. 10, from log entry n 1000 through log entry n+7) is applied to side file 925. The results of the modifications in each of these transactions are stored in side file 925. These transactions are then examined, and the modifications written to the log by any transaction that is still active, such as log entry n 1000, log entry n+2 1020, and log entry n+6, are undone in side file 925.

Some transactions, however, may not yet have committed. Transactions in the log that are still active as of the split point should therefore be located and undone. According to an aspect of the invention, where an incomplete transaction has changed the value at some location in the database, the change that was added to the side file as described above is removed from the side file. Alternatively, the transaction can be undone by modifying the database snapshot as detailed below, setting the data in the side file to match the data in the database as of the split point.
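The redo-then-undo sequence described above might be sketched in Python as follows. This is a deliberate simplification rather than a full ARIES implementation: every in-scope update is reapplied unconditionally (a real system would compare page LSNs), and the record layout, the function name, and the committed set, which is assumed to contain the transactions whose commit record precedes the split point, are all assumptions.

```python
from collections import namedtuple

# One logged page update: the value before and after the change.
Update = namedtuple("Update", "lsn txn page old new")


def recover_snapshot(pages, updates, committed, split_lsn):
    """Return a transactionally consistent copy of pages as of split_lsn."""
    view = dict(pages)
    in_scope = [u for u in updates if u.lsn <= split_lsn]
    for u in in_scope:                     # redo pass: roll forward
        view[u.page] = u.new
    for u in reversed(in_scope):           # undo pass: roll back active work
        if u.txn not in committed:
            view[u.page] = u.old
    return view


log = [
    Update(1, "T1", 1, "a", "a1"),
    Update(2, "T2", 2, "b", "b1"),
    Update(3, "T1", 1, "a1", "a2"),
    Update(4, "T3", 2, "b1", "b2"),        # after the split point: ignored
]
snapshot_pages = recover_snapshot({1: "a", 2: "b"}, log, {"T2"}, split_lsn=3)
```

In this example T2 committed before the split point, so its change to page 2 survives, while both of T1's changes to page 1 are rolled back in reverse order.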

In this way, only completed transactions from the log are reflected in the database snapshot. Once the transactions in the log have been reflected in the database snapshot, with the exception of those that were active at the split point and have been undone, log truncation is permitted again on primary database 800. Because database snapshot processing has been enabled, the database snapshot is updated as changes are made to primary database 800, and the database snapshot can therefore be used to determine the contents of primary database 800 as of the split point.

When the database server restarts after being shut down (whether normally or abnormally), the database snapshot should be reinitialized. To do so, the side page table that was held in memory must be reinitialized.

To reinitialize the side page table in a two-bit implementation, for each region of the side file that has been allocated, the side page table data (bit 2) for every page in that region is set to indicate that the page may have been written to side file 925. The side page table data for all other pages is set to indicate that the page cannot have been written to side file 925. Whether a page has actually been written to side file 925 is not yet known, however, so bit 1 is not initially set.

Alternatively, in either a two-bit or a one-bit side page table implementation, side file 925 can be examined to determine, for each page, whether the page in side file 925 is valid, as described above. The page table is then set to indicate, for each page that is present, that the page actually exists in side file 925. All other pages are marked as not present in side file 925.
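The two-bit reinitialization described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `PageBits` and `reinitialize_side_page_table`, the fixed `pages_per_region`, and the idea that the allocated regions are known as a list of region indexes are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PageBits:
    definitely_in_side_file: bool = False  # "bit 1" in the text
    maybe_in_side_file: bool = False       # "bit 2" in the text


def reinitialize_side_page_table(total_pages, allocated_regions, pages_per_region):
    """Rebuild the in-memory side page table after a server restart."""
    # Every page starts as "definitely not in the side file".
    table = [PageBits() for _ in range(total_pages)]
    for region in allocated_regions:
        start = region * pages_per_region
        for page_no in range(start, min(start + pages_per_region, total_pages)):
            # The region was allocated, so the page MAY have been written.
            # Bit 1 stays clear because that is not yet known for certain.
            table[page_no].maybe_in_side_file = True
    return table


# Hypothetical layout: 8 pages, regions of 4 pages, only region 1 allocated.
table = reinitialize_side_page_table(total_pages=8, allocated_regions=[1], pages_per_region=4)
```

Bit 1 is deliberately left clear here; in this sketch it would only be set later, when a read or write actually confirms that the page is present in the side file.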

For the database snapshot to store information from primary database 800 before that data is overwritten, primary database 800 must support the creation of database snapshots. For each page that primary database 800 modifies, a determination must be made as to whether the page is already in the database snapshot. If the page exists in the database snapshot, it is the correct version of the page. This can occur, for example, when an earlier modification was made to the page in primary database 800. If the page is then changed again in primary database 800, the version in the database view should not change.

When information is received from primary database 800 that a page is being changed, nothing is done if the page is already in side file 925. If the page is not in side file 925, the page should be written to side file 925 and the correct bits should be set in the side page table. With a two-bit page table, there are three possibilities for bit 1 and bit 2 of the page, as shown in Table 1 below:

| | Bit 1 indicates that the page is definitely in the side file | Bit 1 does not indicate that the page is definitely in the side file |
| --- | --- | --- |
| Bit 2 indicates that the page may be in the side file | Case 1: the page is in the side file | Case 2: the page may be in the side file |
| Bit 2 indicates that the page is definitely not in the side file | Case 1: the page is in the side file [or Case 4: invalid] | Case 3: the page is definitely not in the side file |

Table 1: Cases for a two-bit page table

According to an aspect of the invention, when bit 1 indicates that the page is definitely in side file 925, bit 2 is ignored; thus, as shown in Table 1, if bit 1 indicates that the page is definitely in side file 925, the page is assumed to be in side file 925 regardless of what bit 2 indicates. In another embodiment, whenever bit 1 is set to indicate that the page is definitely in side file 925, bit 2 is also set to indicate that the page may be in side file 925. In this alternative embodiment, the combination in which bit 1 indicates that the page is definitely in side file 925 while bit 2 indicates that the page is definitely not in side file 925 is invalid, and an error is raised.
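As a rough illustration of how the two bits map onto the cases of Table 1, the sketch below returns the case number for a page. The function name `classify` and the `strict` flag, which selects the alternative embodiment that treats the inconsistent combination as an error, are assumptions introduced for this example.

```python
def classify(bit1: bool, bit2: bool, strict: bool = False) -> int:
    """Return the Table 1 case number (1, 2, or 3) for a page's two bits."""
    if bit1:
        # Bit 1 wins: the page is definitely in the side file.
        if strict and not bit2:
            # Alternative embodiment: bit 1 set while bit 2 says
            # "definitely not" is the invalid case 4.
            raise ValueError("case 4: inconsistent side page table bits")
        return 1
    # Bit 1 clear: bit 2 separates "maybe" (case 2) from "definitely not" (case 3).
    return 2 if bit2 else 3
```

With the default `strict=False`, bit 2 is simply ignored whenever bit 1 is set, which corresponds to the first embodiment described above.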

When primary database 800 indicates that a page is being changed, the actions to be taken for the cases listed above with a two-bit page table are as follows:

Case 1: Do nothing.

Case 2: Determine whether the page is in side file 925; if it is not, write the page to side file 925.

Case 3: Write the page to side file 925.

In case 2 or case 3, when a page is written to side file 925, it is the old version of the page in primary database 800 (the version that primary database 800 is about to modify) that is written to side file 925. In addition, the page table is set to indicate that the page is now in side file 925, so that any subsequent write to the page is handled according to case 1 and the correct page for the database view remains stored in side file 925.

To determine in case 2 whether the page is in side file 925, the data corresponding to the page is read from side file 925. If the data is valid, a previous version of the page is already in side file 925 and should not be overwritten. In one embodiment, page table bit 1 for the page is then set to indicate that the page is definitely in side file 925, so that future writes to the page are handled as case 1.

Invalid data can be indicated by placing a marker pattern in newly allocated regions to show that no valid data has yet been written there. For example, all zeros can be written to a newly allocated region if it is known that no database page will ever consist entirely of zeros. In that case, the presence of a page in side file 925 is indicated by the corresponding page in side file 925 being part of an allocated region and containing some non-zero data.
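The write-side handling of the three cases, together with the all-zeros validity check, might look like the sketch below. It is an illustration under stated assumptions: pages are `bytes` values, the side file is modelled as a dictionary keyed by page number, `bits` maps a page number to a `(bit1, bit2)` tuple, and the names `save_before_write`, `side_file_has_page`, and `PAGE_SIZE` are hypothetical.

```python
PAGE_SIZE = 4
ZERO_PAGE = bytes(PAGE_SIZE)  # marker for "allocated but never written"


def side_file_has_page(side_file, page_no):
    """A page counts as present only if it holds some non-zero data."""
    data = side_file.get(page_no)
    return data is not None and data != ZERO_PAGE


def save_before_write(primary, side_file, bits, page_no):
    """Call before the primary database overwrites page_no."""
    bit1, bit2 = bits.get(page_no, (False, False))
    if bit1:
        return  # case 1: old version already saved, do nothing
    if bit2 and side_file_has_page(side_file, page_no):
        bits[page_no] = (True, True)  # case 2, page found: promote to case 1
        return
    # Case 2 (page not found) or case 3: save the OLD version first.
    side_file[page_no] = primary[page_no]
    bits[page_no] = (True, True)


primary = [b"AAAA", b"BBBB"]
side_file = {1: ZERO_PAGE}   # page 1 lies in an allocated, still-empty region
bits = {1: (False, True)}    # so page 1 starts in case 2, page 0 in case 3

save_before_write(primary, side_file, bits, 0)
primary[0] = b"XXXX"
save_before_write(primary, side_file, bits, 0)  # second write: case 1, no-op
primary[0] = b"YYYY"
save_before_write(primary, side_file, bits, 1)
primary[1] = b"ZZZZ"
```

Only the first change to a page copies data; every later change finds bit 1 set and returns immediately, which keeps the snapshot pinned to the version that existed when it was created.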

The cases detailed in Table 1 also apply when reading data stored in the database snapshot. When data in a page is read from the database view, the page should be read from side file 925 if it exists there. If it does not exist there, the page should be read from primary database 800. In a two-bit page table system, the actions to be taken for the three cases are as follows:

Case 1: Read the page from side file 925.

Case 2: Determine whether the page is in side file 925. If it is, read the page from side file 925; if it is not, read the page from primary database 800.

Case 3: Read the page from primary database 800.
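The read path for the three cases can be sketched in a few lines. As before, this is only an illustration: the dictionary-based side file, the `(bit1, bit2)` tuples in `bits`, and the function name `read_snapshot_page` are assumptions, and an all-zero page is assumed to mean "not present".

```python
def read_snapshot_page(primary, side_file, bits, page_no):
    """Return the page as it looked when the snapshot was taken."""
    bit1, bit2 = bits.get(page_no, (False, False))
    if bit1:
        return side_file[page_no]          # case 1
    if bit2:
        saved = side_file.get(page_no)     # case 2: look before deciding
        if saved is not None and any(saved):
            return saved
    return primary[page_no]                # case 3, or case 2 with no saved page


primary = [b"new0", b"new1", b"same"]
side_file = {0: b"old0", 1: b"old1"}
bits = {0: (True, True), 1: (False, True), 2: (False, False)}
snapshot_view = [read_snapshot_page(primary, side_file, bits, n) for n in range(3)]
```

Pages that were never modified are served directly from the primary database, which is what lets the snapshot share unchanged data instead of copying it.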

A database snapshot represents the state of the database at an earlier point in time. A user may choose to use the database snapshot as a database. For example, the user may choose to perform an action on the database snapshot in order to produce a snapshot of the database as it would have been had that action been performed at the earlier point in time. In addition, during the initialization phase, as detailed above, transactions can be executed and undone on the database snapshot.

To modify the database snapshot, the modification should be based on the data in the database snapshot, and the resulting page should be stored in the database snapshot. If no data for the page exists in the database snapshot, the modification should be based on the data in primary database 800, and the resulting page should be stored in the database snapshot.

In a two-bit page table system, the actions to be taken for the three cases are as follows:

Case 1: Read the page from side file 925, perform the modification, and write the page back to side file 925.

Case 2: Determine whether the page is in side file 925. If it is, proceed as in case 1; if it is not, proceed as in case 3.

Case 3: Read the page from primary database 800, write the page to side file 925, and set the page table to indicate that the page is in side file 925. Then perform the modification on the page and write the modified page to side file 925 as appropriate.
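A compact sketch of modifying the snapshot itself is shown below. The key property it tries to demonstrate is that the primary database is never touched. The names `modify_snapshot_page` and `change`, the dictionary-based side file, and the rule that an all-zero or missing entry means "not present" are assumptions for this example.

```python
def modify_snapshot_page(primary, side_file, bits, page_no, change):
    """Apply `change` (a bytes -> bytes function) to a snapshot page."""
    bit1, bit2 = bits.get(page_no, (False, False))
    # Case 1, or case 2 with a valid saved page: the side file has the base.
    in_side_file = bit1 or (bit2 and any(side_file.get(page_no, b"")))
    if not in_side_file:
        # Case 3 (or case 2, not found): start from the primary's copy.
        side_file[page_no] = primary[page_no]
    bits[page_no] = (True, True)
    # The modified page is stored in the side file only.
    side_file[page_no] = change(side_file[page_no])


primary = [b"balance=100"]
side_file, bits = {}, {}
modify_snapshot_page(primary, side_file, bits, 0,
                     lambda page: page.replace(b"100", b"090"))
```

After the call, reads of the snapshot see the modified page while the primary database still holds its own version.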

In view of the exemplary systems described above, methods that may be implemented in accordance with the present invention will be better appreciated with reference to the flowcharts of FIGS. 11-13. While, for simplicity of explanation, the methods are shown and described as a series of blocks, it is to be understood and appreciated that the invention is not limited by the order of the blocks, as some blocks may, in accordance with the invention, occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement a method in accordance with the invention.

In addition, it should be further appreciated that the methods disclosed below and throughout this specification can be stored on an article of manufacture to facilitate transporting and transferring such methods to computers. The term article of manufacture, as used here, is intended to encompass a computer program accessible from any computer-readable device, carrier, or medium. By way of illustration and not limitation, an article of manufacture may embody computer-readable instructions, data structures, schemas, program modules, and the like.

FIG. 11 depicts a method 1100 for building a snapshot database in accordance with an aspect of the present invention. A snapshot database maintains data about the changes made to a source database since the point in time at which the snapshot was created. At 1110, a request to alter data in the source or primary database is received. For example, a request may be made to change or alter a data page. At 1120, the data in the source database that is about to be displaced by new data is copied into the snapshot database file and page that correspond to the modification of the source database. At 1130, the new data is copied over, or replaces, the old data in the source database. Finally, at 1140, a catalog can be updated to record the change to the database and the entry in the snapshot database.
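The copy-on-write sequence of FIG. 11 can be sketched as a small class. This is a simplified model rather than the patented design: the class name `SnapshotDatabase`, the use of a Python set as the catalog, and the in-memory list standing in for the source database are all assumptions.

```python
class SnapshotDatabase:
    """Sparse snapshot: holds only pages the source has changed since creation."""

    def __init__(self, source):
        self.source = source   # list of pages, shared with the snapshot
        self.pages = {}        # sparse store: page number -> original page
        self.catalog = set()   # pages that are no longer shared

    def write_source_page(self, page_no, new_data):
        """Handle a request to alter a page of the source database."""
        first_change = page_no not in self.catalog
        if first_change:
            # Copy the data about to be displaced into the snapshot.
            self.pages[page_no] = self.source[page_no]
        # Replace the old data in the source with the new data.
        self.source[page_no] = new_data
        if first_change:
            # Record that this page now lives in the snapshot.
            self.catalog.add(page_no)

    def read(self, page_no):
        """Read a page as of snapshot creation."""
        if page_no in self.catalog:
            return self.pages[page_no]
        return self.source[page_no]  # unchanged pages are shared


source = ["a", "b", "c"]
snap = SnapshotDatabase(source)
snap.write_source_page(1, "b2")
snap.write_source_page(1, "b3")  # second change: snapshot keeps the original "b"
```

The catalog is what distinguishes pages shared with the source from pages that have been copied, so the snapshot costs space only in proportion to how much of the source has changed.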

FIG. 12 illustrates a data recovery method 1200 in accordance with an aspect of the present invention. At 1210, each data page stored in the snapshot database can be written over the data at the corresponding location in the primary database. It should be appreciated that a primary file in the database can be extended, if necessary, so that it can receive the snapshot data. At 1220, the size of the snapshot database, or of files within it, can be identified and compared with the size of the primary database or its corresponding files. If the snapshot database or its files are smaller than the primary database or its files, meaning that data was added to the primary database that should not exist in the restored database, the primary database can be truncated to remove the added data pages. Where, according to an aspect of the invention, newly added data is appended to the end of a file, this can correspond to deleting the last data pages of the file. At 1230, the open transactions that had not been committed when the snapshot was created can be retrieved from memory and applied to the primary database being restored. Next, at 1240, the database log can be retrieved and applied to the restored database to roll the database forward to a point closer to the event that made recovery necessary, so that as many "good" transactions as possible, up to but not including the event, are preserved.
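The four recovery steps of FIG. 12 can be sketched end to end as follows. The data model is an assumption chosen to keep the example short: the primary database is a list of pages, the snapshot is a dictionary of saved pages plus the page count at snapshot time, open transactions are a list of `(page_no, data)` writes, and log records are `(lsn, page_no, data)` tuples. The names `restore`, `write_page`, and `error_lsn` are hypothetical.

```python
def write_page(db, page_no, data):
    """Write a page, extending the file first if it is too short."""
    while len(db) <= page_no:
        db.append(b"")
    db[page_no] = data


def restore(primary, snapshot_pages, snapshot_size, open_txn_writes, log, error_lsn):
    # Step 1: copy every snapshot page over the corresponding primary page.
    for page_no, data in snapshot_pages.items():
        write_page(primary, page_no, data)
    # Step 2: truncate pages that were added after the snapshot was created.
    del primary[snapshot_size:]
    # Step 3: reapply transactions that were open (uncommitted) at snapshot time.
    for page_no, data in open_txn_writes:
        write_page(primary, page_no, data)
    # Step 4: roll forward through the log, stopping just before the bad event.
    for lsn, page_no, data in log:
        if lsn >= error_lsn:
            break
        write_page(primary, page_no, data)
    return primary


primary = [b"a2", b"b", b"added-later"]
restore(
    primary,
    snapshot_pages={0: b"a"},
    snapshot_size=2,
    open_txn_writes=[(1, b"b1")],
    log=[(10, 0, b"a3"), (11, 1, b"oops")],
    error_lsn=11,
)
```

In this run the page appended after the snapshot is dropped, the open transaction's write is restored, and the log is replayed only up to the record before the erroneous one.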

Turning now to FIG. 13, a data recovery method 1300 is described in accordance with an aspect of the present invention. At 1310, a database snapshot is created and maintained. A database snapshot can be created by a user at any time. Moreover, more than one snapshot can be created, providing multiple version points over time. Snapshot databases can also be created automatically. For example, a monitor component can observe actions on the source or primary database and detect and/or infer actions that may significantly alter the database. For instance, a snapshot can be created automatically when the installation of a new application is detected. According to an aspect of the invention, a database snapshot can store changes to the source database; maintaining the database snapshot therefore corresponds to copying those changes to it. According to another aspect of the invention, the snapshot can comprise sparse files that store only the pages corresponding to changes and share all other data with the primary database. At 1320, upon an event including but not limited to a user error (such as a fast-finger delete), the database can be reverted to the earlier point in time marked by the snapshot. In particular, reverting or restoring can include copying pages from the snapshot database over the corresponding pages in the primary database, truncating the primary database, applying to the database the open transactions that were uncommitted when the snapshot was created, and applying database log information to the primary database to converge on the event.

Example operating environment

To provide context for the various aspects of the invention, FIG. 14 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the various aspects of the invention may be implemented. While the invention has been described above in the general context of computer-executable instructions of a computer program that runs on one or more computers, those skilled in the art will recognize that the invention can also be implemented in combination with other program modules. Generally, program modules include routines, programs, components, data structures, and the like that perform particular tasks and/or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the inventive methods may be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputing devices, mainframe computers, as well as personal computers, handheld computing devices, microprocessor-based or programmable consumer electronics, and the like. The illustrated aspects of the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. However, some, if not all, aspects of the invention can be practiced on stand-alone computers. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

With reference to FIG. 14, an exemplary environment 1410 for implementing various aspects of the invention includes a computer 1412. Computer 1412 includes a processing unit 1414, a system memory 1416, and a system bus 1418. System bus 1418 couples system components, including but not limited to system memory 1416, to processing unit 1414. Processing unit 1414 can be any of various available processors. Dual microprocessors and other multiprocessor architectures can also be employed as processing unit 1414.

System bus 1418 can be any of several types of bus structure, including a memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any of a variety of available bus architectures, including but not limited to 11-bit bus, Industry Standard Architecture (ISA), Micro Channel Architecture (MCA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), Advanced Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), and Small Computer System Interface (SCSI).

System memory 1416 includes volatile memory 1420 and nonvolatile memory 1422. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within computer 1412, such as during start-up, is stored in nonvolatile memory 1422. By way of illustration and not limitation, nonvolatile memory 1422 can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory 1420 includes random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

Computer 1412 also includes removable/non-removable, volatile/nonvolatile computer storage media. FIG. 14 illustrates, for example, disk storage 1424. Disk storage 1424 includes, but is not limited to, devices such as a magnetic disk drive, floppy disk drive, tape drive, Jaz drive, Zip drive, LS-100 drive, flash memory card, or memory stick. In addition, disk storage 1424 can include storage media separately or in combination with other storage media, including but not limited to an optical disk drive such as a compact disk ROM device (CD-ROM), CD recordable drive (CD-R drive), CD rewritable drive (CD-RW drive), or digital versatile disk ROM drive (DVD-ROM). To facilitate connection of disk storage 1424 to system bus 1418, a removable or non-removable interface is typically used, such as interface 1426.

It should be appreciated that FIG. 14 describes software that acts as an intermediary between users and the basic computer resources described in suitable operating environment 1410. Such software includes an operating system 1428. Operating system 1428, which can be stored on disk storage 1424, acts to control and allocate the resources of computer system 1412. System applications 1430 take advantage of the management of resources by operating system 1428 through program modules 1432 and program data 1434 stored either in system memory 1416 or on disk storage 1424. It should also be appreciated that the present invention can be implemented with various operating systems or combinations of operating systems.

A user enters commands or information into computer 1412 through input devices 1436. Input devices 1436 include, but are not limited to, a pointing device (such as a mouse, trackball, stylus, touch pad, or touch screen), keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, and the like. These and other input devices connect to processing unit 1414 through system bus 1418 via interface ports 1438. Interface ports 1438 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB). Output devices 1440 use some of the same types of ports as input devices 1436. Thus, for example, a USB port can be used to provide input to computer 1412 and to output information from computer 1412 to an output device 1440. Output adapter 1442 is provided to illustrate that there are some output devices 1440, such as monitors, speakers, and printers, among other output devices 1440, that require special adapters. Output adapters 1442 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between output device 1440 and system bus 1418. It should be noted that other devices and/or systems of devices, such as remote computer 1444, provide both input and output capabilities.

Computer 1412 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer 1444. Remote computer 1444 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor-based appliance, a peer device, or another common network node, and typically includes many or all of the elements described relative to computer 1412. For purposes of brevity, only a memory storage device 1446 is illustrated with remote computer 1444. Remote computer 1444 is logically connected to computer 1412 through network interface 1448 and then physically connected via communication connection 1450. Network interface 1448 encompasses communication networks such as local-area networks (LAN) and wide-area networks (WAN). LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet/IEEE 802.3, Token Ring/IEEE 802.5, and the like. WAN technologies include, but are not limited to, point-to-point links, circuit-switching networks such as Integrated Services Digital Networks (ISDN) and variations thereon, packet-switching networks, and Digital Subscriber Lines (DSL).

Communication connection 1450 refers to the hardware/software employed to connect network interface 1448 to bus 1418. While communication connection 1450 is shown inside computer 1412 for illustrative clarity, it can also be external to computer 1412. The hardware/software necessary for connection to network interface 1448 includes, for exemplary purposes only, internal and external technologies such as modems (including regular telephone-grade modems, cable modems, DSL modems, and power modems), ISDN adapters, and Ethernet cards.

FIG. 15 is a schematic block diagram of a sample computing environment 1500 with which the present invention can interact. System 1500 includes one or more clients 1510. Clients 1510 can be hardware and/or software (for example, threads, processes, computing devices). System 1500 also includes one or more servers 1530. Servers 1530 can likewise be hardware and/or software (for example, threads, processes, computing devices). Servers 1530 can house threads that perform transformations by employing the present invention, for example. One possible communication between a client 1510 and a server 1530 can take the form of a data packet adapted to be transmitted between two or more computer processes. System 1500 includes a communication framework 1550 that can be used to facilitate communications between clients 1510 and servers 1530. Clients 1510 are operatively connected to one or more client data stores 1560 that can be used to store information local to clients 1510. Similarly, servers 1530 are operatively connected to one or more server data stores 1540 that can be used to store information local to servers 1530.

What has been described above includes examples of the present invention. It is, of course, not possible to describe every conceivable combination of components or methods for purposes of describing the present invention, but one of ordinary skill in the art will recognize that many further combinations and permutations of the present invention are possible. Accordingly, the present invention is intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term "includes" or "has" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim.

Claims

1. A data recovery system, comprising:

a snapshot component adapted to generate a snapshot database from a source database, wherein the snapshot database holds a sparse file, and the sparse file stores data that is displaced by modifications to the source database;

a recovery component adapted to use the snapshot database to restore the source database to a point in time prior to an event by copying snapshot database files over the associated source database files; and

a catalog component adapted to track whether a page in the snapshot database is shared with the source database or has been copied to the snapshot database.

2. The system of claim 1, characterized in that the event corresponds to a user error.

3. The system of claim 2, characterized in that the recovery component comprises a restore component that copies the snapshot database data to the source database.

4. The system of claim 3, characterized in that the recovery component comprises an undo component that stores the transactions that are open during startup of the snapshot database and adds them to the restored source database.

5. The system of claim 4, characterized in that the undo component uses a database log file to converge on the error.

6. The system of claim 1, characterized in that the snapshot component comprises a monitor component that observes the source database and initiates snapshot creation when an event occurs that may modify the source database to a specified degree.

7. The system of claim 1, characterized in that the recovery component updates and synchronizes one or more mirror databases automatically and simultaneously with the restoration of the source database.

8. The system of claim 1, characterized in that the sparse file of the snapshot database represents the original values of the source database files before they were changed.

9. The system of claim 8, characterized in that the snapshot database and the source database share the data that has not changed since the snapshot database was created.

10. A data recovery method, comprising:

creating a snapshot of a source database at a point in time;

copying data that is displaced by transactions committed after the snapshot is created to a sparse file of a snapshot database; and

upon occurrence of an event, reverting the source database to its state at the time the snapshot was created by copying pages from the sparse file to the corresponding source database pages, wherein a catalog component is used to determine which data is shared between the source database and the snapshot database and which data is unique to the snapshot database.

11. The method of claim 10, characterized in that the event is a user error.

12. The method of claim 11, characterized in that reverting the source database to its state at the time the snapshot was created is initiated by a database administrator using a graphical user interface.

13. The method of claim 10, characterized by further comprising capturing the transactions that are uncommitted when the snapshot is created.

14. The method of claim 13, characterized by further comprising applying the uncommitted transactions to the source database.

15. The method of claim 14, characterized by further comprising retrieving a database log and applying the log to the source database so as to roll the source database forward in time, thereby reflecting the changes made to the source database before the event occurred.

16. The method of claim 10, characterized by further comprising automatically updating one or more mirror databases when the source database is restored.

17. The method of claim 10, characterized in that the snapshot is created automatically upon detection of an event that may significantly alter the database.

18. The method of claim 17, characterized in that the event that may significantly alter the database corresponds to the installation of a new application.

19. The method of claim 10, characterized in that the sparse file represents the structure of the source database at the time the snapshot was created.