CN1256672C - Remote data flexible replication system and method - Google Patents

Remote data flexible replication system and method

Info

Publication number
CN1256672C
Authority
CN
China
Prior art keywords
data
replication
remote
unit
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB018131956A
Other languages
Chinese (zh)
Other versions
CN1457457A (en
Inventor
罗恩·麦卡彼
特蕾西·坎普
斯图亚特·W·卡德
大卫·J·斯罗德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Miralink Corp
Original Assignee
Miralink Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Miralink Corp
Publication of CN1457457A
Application granted
Publication of CN1256672C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2069 Management of state, configuration or failover
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2058 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using more than 2 mirrored copies
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2064 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring while ensuring consistency

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Methods, systems, and configured storage media are provided for flexible data mirroring. In particular, the invention provides local-remote role reversal (1506), implementation of hot standby server status through a "media not ready" signal (1508), several alternate buffer contents and buffering schemes, transactioning (1516), many-to-one mirroring through use of "virtual" remote mirroring units (1520), identification (1522) of frequently accessed data without application-specific knowledge but based instead on an application's logged and analyzed behavior, and use (1526) of the secondary server in a non-authoritative manner.

Description

Remote data flexible replication system and method

Field of the Invention

The present invention relates to remote replication of digital data by a server or other computer in order to provide better fault tolerance and/or disaster recovery, and in particular to tools and techniques that increase the flexibility of remote data replication.

Technical Background of the Invention

U.S. Patent No. 5,537,533 describes tools and techniques for remotely replicating digital data from a primary network server to a remote network server. A system according to that patent includes a primary data transfer unit having a primary server interface and a primary link interface, and a remote data transfer unit having a remote link interface and a remote server interface. The primary link interface includes a spoof packet generator capable of generating a "pre-acknowledgement" for the primary network server. In other words, the system has a "smart buffer": once the replicated data has been stored in a non-volatile buffer within the primary link interface, and before the acknowledgement stating that the data has been stored at the remote server arrives, the system sends the primary server a pre-acknowledgement, or "spoof".

MiraLink Corporation of Salt Lake City, Utah, is the owner of U.S. Patent No. 5,537,533. The company had put its Off-SiteServer product on the market a year before this application (Off-SiteServer is a trademark of MiraLink Corporation). The Off-SiteServer product includes technology for remotely replicating the disks of a Novell NetWare server (NetWare is a trademark of Novell, Inc.) to another, geographically distant server over a low-bandwidth communication link.

Remote data replication from a primary network server to a remote replacement network server is a powerful data backup method. Remote replication creates a copy of the original data at a safe distance, and it does so essentially in step with the storing of the original data. If the remotely stored data is held on a "warm" standby remote network server, it is available almost immediately after a serious incident. In other words, the remote server can be up and running in the role of the new primary network server within minutes of an actual or simulated disaster.

In a typical installation, the Off-SiteServer product involves a pair of Off-SiteServer boxes: one local and one remote. The boxes are configured with dedicated hardware, firmware, and/or other software, as outlined in U.S. Patent No. 5,537,533. A dedicated serial line connects the local NetWare server to one of the boxes. The NetWare server itself uses a Vinca adapter card (VINCA is a trademark of Vinca Corporation). The card is driven by a NetWare Loadable Module (NLM), which intercepts disk drive requests and sends the data over the serial line to the local Off-SiteServer box.

The local Off-SiteServer box has a 4-gigabyte non-volatile buffer, such as an IDE hard disk drive. Data is pre-acknowledged into this buffer. As far as the local server's operating system is concerned, a second, "mirrored" write has taken place locally. In reality, the Off-SiteServer product has received the data from the NLM and stored it in the local buffer. The local box holds the sector and track changes until it can safely send them to the distant remote Off-SiteServer box. The buffer is also "smart" in that it holds locally any data beyond what the communication link can handle at the moment. The data remains in the local box until the remote box has successfully written it to the remote secondary server and returned an acknowledgement to the local box. On receiving that acknowledgement, the local box frees the non-volatile buffer space occupied by the sector/track/block data that was successfully transmitted.
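The buffering cycle described above (store the change, pre-acknowledge the host, release the space only when the remote side confirms) can be sketched as follows. This is a minimal illustration and not the product's firmware: the class name `PreAckBuffer`, its methods, and the use of an in-memory dictionary in place of a non-volatile disk buffer are all assumptions made for the example.

```python
from collections import OrderedDict


class PreAckBuffer:
    """Toy model of a local box's buffer (hypothetical; held in memory here)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.pending = OrderedDict()  # sequence number -> (block address, data)
        self.next_seq = 0

    def write(self, block_addr, data):
        """Buffer a changed block and return a pre-acknowledgement to the host."""
        if self.used_bytes + len(data) > self.capacity_bytes:
            raise BufferError("buffer full; write cannot be pre-acknowledged")
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = (block_addr, data)
        self.used_bytes += len(data)
        return seq  # from the host's point of view the write is now complete

    def remote_ack(self, seq):
        """Free the buffer space once the remote side confirms the write."""
        _block_addr, data = self.pending.pop(seq)
        self.used_bytes -= len(data)


buf = PreAckBuffer(capacity_bytes=1024)
first = buf.write(block_addr=7, data=b"x" * 512)
second = buf.write(block_addr=8, data=b"y" * 256)
buf.remote_ack(first)  # the remote box confirmed the first write
print(buf.used_bytes, list(buf.pending))
```

In this sketch the host never waits on the remote site; only buffer exhaustion can make a write fail, which is why the buffer size matters when the link is slow or down.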

The Off-SiteServer product uses a V.35 interface for data output at the local site. V.35 is a serial communication standard that connects to a Channel Service Unit/Data Service Unit (CSU/DSU), which in turn interfaces with the communication network. A second CSU/DSU at the remote (secondary) site relays the sector/track/block data to the V.35 input interface of the remote Off-SiteServer box. The remote box outputs the sector/track/block data over a dedicated serial line to another Vinca adapter card in the remote secondary server. The remote server's replication and system software then writes the data to the remote server's disk drives, and a write acknowledgement is returned to the local Off-SiteServer box. The system can handle about 300 megabytes of data changes per hour.
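For a sense of scale, the quoted rate of about 300 megabytes per hour can be converted into a sustained link rate and compared with the 4-gigabyte buffer mentioned earlier. The short calculation below assumes decimal units (1 megabyte = 10^6 bytes, 1 gigabyte = 10^9 bytes), which the text does not specify; the function names are made up for the example.

```python
MEGABYTE = 10**6  # assumption: decimal megabytes
GIGABYTE = 10**9  # assumption: decimal gigabytes


def sustained_rate(megabytes_per_hour):
    """Return (bytes per second, bits per second) for an hourly change volume."""
    bytes_per_second = megabytes_per_hour * MEGABYTE / 3600
    return bytes_per_second, bytes_per_second * 8


def buffer_hours(buffer_gigabytes, megabytes_per_hour):
    """Hours of changes a buffer can absorb if nothing is being drained."""
    return buffer_gigabytes * GIGABYTE / (megabytes_per_hour * MEGABYTE)


byte_rate, bit_rate = sustained_rate(300)
print(f"{byte_rate / 1000:.1f} kB/s, {bit_rate / 1000:.0f} kbit/s")
print(f"{buffer_hours(4, 300):.1f} hours of changes fit in the buffer")
```

Under these assumptions, 300 megabytes per hour is roughly 667 kbit/s of sustained traffic, and a 4-gigabyte buffer could absorb a little over 13 hours of changes at that rate during a link outage.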

The Off-SiteServer product is smart enough to sense whether bandwidth has increased or decreased and/or whether the communication link has gone down. While the link is down, the Off-SiteServer box stores changed data in the local non-volatile smart buffer. When the link comes back up, the box automatically begins transmitting. The box can also change its output rate at any time as more or less bandwidth becomes available. All of these transfers incorporate standard software checksum error detection and correction and/or hardware error correction code (ECC) error handling.
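The store-and-forward behavior and the checksum protection described above might look like the following in outline. This is only a sketch: the text does not say which checksum the product uses, so CRC-32 from Python's standard `zlib` module is an assumption, as are the names `make_frame`, `check_frame`, and `StoreAndForward`. CRC-32 only detects corruption; any correction would have to come from retransmission or from hardware ECC, which is not modeled here.

```python
import zlib
from collections import deque


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes) -> bytes:
    """Return the payload if the checksum matches, otherwise raise ValueError."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != received:
        raise ValueError("checksum mismatch")
    return payload


class StoreAndForward:
    """Queues changed data while the link is down and drains it when it returns."""

    def __init__(self):
        self.backlog = deque()  # stands in for the non-volatile buffer
        self.link_up = False
        self.delivered = []     # stands in for the remote box

    def submit(self, payload: bytes):
        self.backlog.append(make_frame(payload))
        self.flush()

    def set_link(self, up: bool):
        self.link_up = up
        self.flush()            # transmission resumes automatically

    def flush(self):
        while self.link_up and self.backlog:
            self.delivered.append(check_frame(self.backlog.popleft()))


unit = StoreAndForward()
unit.submit(b"sector 12 changes")  # link is down, so this is only queued
unit.submit(b"sector 13 changes")
unit.set_link(True)                # backlog drains in the original order
print(unit.delivered)
```

Keeping the backlog in arrival order matters for a replica: changes applied out of order at the remote site could leave the copied disk in a state the original never had.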

Should a disk or server failure occur at the local (primary) NetWare server, the secondary (remote) server attached to the remote Off-SiteServer box as described above holds a complete mirrored disk copy of all data on the local (primary) server. This remote backup copy can be restored to the local (primary) server. In a disaster, the secondary remote server can also stand in for the local primary server. Such restoration and/or replacement can be carried out very quickly with simple commands.

In short, the Off-SiteServer product and other data replication technologies provide valuable fault tolerance and disaster recovery for important working data and files. However, the flexibility of these existing solutions is unnecessarily limited.

For example, the Off-SiteServer product requires a specific version of Vinca hardware and software. That Vinca version supports no operating system/file system platform other than Novell NetWare. In addition, the hardware components of the required Vinca package do not work with newer, faster servers and larger disk capacities.

The original Off-SiteServer product was also designed to connect one local server to one remote server. At any given time, only a single local server can be replicated to the remote server. Multiple servers at different sites cannot be replicated to a single remote site at the same time. Likewise, if a company has multiple local servers running different operating systems and/or file systems, each server running a given platform must be replicated to its own corresponding remote server.

In addition, the original Off-SiteServer product requires an NLM to be installed on the local server, and it was designed to use a private, dedicated communication link. Conventional replication also requires a remote server in order to keep the replicated information in a bootable format at the remote site.

These limitations and others are noted in patent application Serial No. 09/438,184. The present application provides additional tools and techniques for remote data replication that build on the technology described in that parent application as well as on other advances.

Objects and Summary of the Invention

The present invention provides data replication tools and techniques that can be used in conjunction with the invention of the parent application or in other embodiments. This summary focuses on tools and techniques not previously described in detail. For example, the invention provides tools and techniques such as local-remote role reversal; implementation of hot standby server status through a "media not ready" signal; several alternative buffer contents and buffering schemes; transactioning; many-to-one replication through the use of "virtual" remote replication units; identification of frequently accessed data without application-specific knowledge, based instead on an application's logged and analyzed behavior; and use of the secondary server in a non-authoritative manner. Other features and advantages of the invention will become clearer from the following description.

Brief Description of the Drawings

To illustrate how the invention achieves its advantages and features, the invention is described in detail with reference to the accompanying drawings. The drawings illustrate only selected aspects of the invention, and the invention is not limited to them.

Figure 1 is a diagram illustrating a prior art computer network with replication, which can be adapted for use with the present invention.

Figure 2 is a diagram illustrating a computer system according to the invention, with no remote server but with a remote replication unit that has a relatively large buffer.

Figure 3 is a diagram illustrating a computer system according to the invention, including a remote server with a hot-swappable RAID unit and a remote replication unit with a relatively small buffer.

Figure 4 is a diagram illustrating a computer system according to the invention, with no remote server but with a remote replication unit that has a relatively small buffer and a hot-swappable RAID unit.

Figure 5 is a diagram illustrating a many-to-one replication computer system according to the invention, with no remote server but with multiple local servers running a particular platform, each with its own local replication unit, and a single remote replication unit with a relatively small buffer and multiple hot-swappable RAID units.

Figure 6 is a diagram illustrating another many-to-one replication computer system according to the invention, with no remote server but with multiple local servers running a particular platform, each with its own local replication unit, and a single remote replication unit with a relatively small buffer and multiple individual external storage volumes.

Figure 7 is a diagram illustrating another many-to-one replication computer system according to the invention, with no remote server but with multiple local servers running a particular platform, each with its own local replication unit, and a single remote replication unit with a relatively small buffer, a single external storage volume having multiple partitions, and a hot-swappable RAID unit likewise having multiple partitions.

Figure 8 is a diagram illustrating another many-to-one replication computer system according to the invention, with no remote server but with multiple local servers running different platforms, each with its own local replication unit, and a single remote replication unit with a relatively small buffer and multiple hot-swappable RAID units.

Figure 9 is a diagram illustrating another many-to-one replication computer system according to the invention, with no remote server but with multiple local servers running different platforms, each with its own local replication unit, and a single remote replication unit with a relatively small buffer and multiple external storage volumes.

Figure 10 is a diagram illustrating another many-to-one replication computer system according to the invention, with no remote server but with multiple local servers running different platforms, each with its own local replication unit, and a single remote replication unit with a relatively small buffer, an external storage volume having multiple partitions, and a hot-swappable RAID unit likewise having multiple partitions.

Figure 11 is a diagram illustrating a one-to-many replication computer system according to the invention, in which a local server is connected to multiple local replication units in order to replicate data to multiple remote sites.

Figure 12 is a diagram illustrating another one-to-many replication computer system according to the invention, in which a local server is connected to a multi-ported local replication unit in order to replicate data to multiple remote sites.

Figure 13 is a flowchart illustrating methods of the invention.

Figure 14 is a diagram illustrating a dual-host configuration among a remote replication unit, a remote server, and a RAID unit, in which switching according to the invention can be performed.

Figure 15 is a flowchart further illustrating methods of the invention.

Detailed Description of the Preferred Embodiments

The present invention relates to computer systems and methods for flexible data replication. The invention can provide non-intrusive replication, replication with or without a dedicated private communication link, and replication with or without a dedicated server or other server at the destination to assist the remote replication unit. The invention can also provide many-to-one data replication, including replication from local servers located at two or more geographically dispersed sites and running the same or different operating systems and/or file systems. In addition, the invention provides flexibility by allowing replicated data to be kept using various combinations of one or more external storage units and/or RAID units.

The invention also provides tools and techniques for flexible data replication. These include replication unit role reversal; server hot standby operation; replicated data storage options; storage and re-entry of SCSI commands relating to changed data; transactioning; virtual replication units; application state recovery; and data volume resynchronization. These topics are discussed with reference to Figure 15, with the understanding that pertinent information on a given topic does not necessarily appear only in Figure 15 and the text that refers directly to it.

The invention may be embodied in various methods and systems. Unless clearly indicated otherwise, the discussion of any one type of embodiment also applies to the other types. For example, the description of the inventive systems also helps in understanding the inventive methods for configuring those systems and/or the methods for sending data through those systems to obtain a replicated copy, and vice versa. In particular, although Figure 15 shows a flowchart, it is not limited to methods; it also helps describe media and systems configured according to the invention.

Overview of Computers and Networks

Figure 1 shows a network 100 in which a local server 102 is replicated over a conventional route 104 to a remote server 106. The conventional route 104 is not limited to the communication link; it also includes modems, data transfer units, and other conventional tools and techniques for sending and/or receiving data over the link. In particular, and without limitation, the conventional route 104 may include server interfaces, link interfaces, and DTUs as illustrated in Figure 1 of U.S. Patent No. 5,537,533 and discussed in that patent.

In addition, the conventional route 104 may include Small Computer System Interface (SCSI) performance extenders or standard Storage Access Network (SAN) connectors. These devices require a very high bandwidth link and minimal latency. Because distance introduces latency, they are generally limited to distances of about 10 to 20 miles. For example, in a single-mode fiber configuration, a given SCSI extender allows about 15 kilometers between data source and destination because of latency. With a multimode fiber configuration, the usable distance is about two-thirds of that, again because of latency. Such links tolerate either no delays or interruptions at all, or only ones shorter than a small fraction of a second, or at most processing delays of a few seconds. The same problem arises with mainframe channel extenders.
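The distance figures above can be put into rough numbers. The calculation below is an illustrative estimate only: it assumes a signal speed in optical fiber of about 2 x 10^8 m/s (roughly two-thirds of the speed of light in vacuum), which the text does not state, and it counts propagation delay alone. Real extenders also add protocol handshakes and device processing time, so actual latency budgets are tighter than these figures suggest.

```python
FIBER_SPEED_M_PER_S = 2.0e8  # assumption: typical signal speed in optical fiber


def round_trip_microseconds(distance_km):
    """Propagation delay for a request and its reply over the given distance."""
    return 2 * distance_km * 1000 / FIBER_SPEED_M_PER_S * 1e6


single_mode_km = 15.0                   # figure quoted in the text
multimode_km = single_mode_km * 2 / 3   # "about two-thirds" of that distance

for label, km in (("single-mode", single_mode_km), ("multimode", multimode_km)):
    print(f"{label}: {km:.0f} km, {round_trip_microseconds(km):.0f} us round trip")
```

Under these assumptions the multimode limit works out to about 10 kilometers, and every synchronous exchange over the 15-kilometer single-mode link pays roughly 150 microseconds of propagation delay per round trip.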

Although the illustrated network 100 is configured for replication with conventional tools and techniques, it is one of many networks that can be suitably adapted for use according to the invention. The adaptation involves various steps, depending on the particular embodiment of the invention. For example, it may include disconnecting the remote server 106 if it is no longer needed; supplementing or replacing the conventional route 104 with replication units linked according to the invention; removing the replication NLM or other special-purpose software from the local server 102; adding more local servers to be replicated; and/or adding remote storage in the form of external storage volumes and/or Redundant Array of Independent Disks (RAID) units. In general, however, the adaptation involves adding at least one local replication unit and at least one remote replication unit, interconnected so that they can operate according to the invention.

Before and/or after the adaptation, the network 100 can be connected through a gateway or similar mechanism to other networks 108, including LANs, WANs, or portions of the Internet or an intranet, to form a larger network. In the illustrated network 100, the local server 102 is connected to one or more network clients 112 by communication links or network signal lines 110. Other suitable networks include multi-server networks and peer-to-peer networks. The servers 102 and clients 112 in a given network may be uniprocessor, multiprocessor, or clustered-processor machines. Each of the servers 102 and clients 112 includes an addressable storage medium such as random access memory.

Suitable clients 112 include, without limitation, personal computers; laptop computers 114, personal digital assistants, and other mobile devices; and workstations 116. The signal lines 110 may include twisted pair, coaxial, or fiber optic cables; telephone lines; satellites; microwave relays; modulated AC power lines; RF connections; network connections; dial-up connections; portable links such as infrared links; and/or other data transmission "wires" or communication links known in the art. The links 110 may embody conventional or novel signals, and in particular may embody a series of commands and/or data structures for replicating data as discussed here. The remote server 106 can store the replicated data it obtains over the conventional route 104 on an attached storage device such as an external hard disk or RAID subsystem 118.

Flexible Replication Unit System Embodiments

Figure 2 illustrates a novel system according to the present invention. Unlike the conventional approaches discussed earlier, a system according to this figure does not require a remote server. A local server 200, or some other host 200, is connected through a local link 202 to a local replication unit 204. The local replication unit 204 is connected through a journey link 206 to a remote replication unit 208. Each local replication unit includes a spoof packet generator, which generates pre-acknowledgments for the local server 200, and a non-volatile data buffer 210, which holds replicated data until that data has been stored remotely. The remote replication unit has a destination non-volatile storage device that holds the replicated data received from the local replication unit 204 over the journey link 206. The remote replication unit may be physically separated from the local server 200 by less than 10 miles, by at least 10 miles, or by at least 100 miles. These distances are only examples: because the invention takes advantage of the journey link 206, systems according to the invention have no inherent distance limitation. The individual replication units are discussed in detail below, both in terms of the flexibility of the embodiments shown in Figures 2 to 12 and in an overview of their components and operation.

It may be helpful to note at this point, however, that some embodiments of the local replication unit 204 include SCSI emulation software and/or hardware. This allows the local link 202 to be a SCSI connection, so that the local replication unit 204 appears to the local server 200 or other host 200 as a SCSI disk or other conventional SCSI device. This can be achieved by using a SCSI host adapter card in the local replication unit 204 and running it in target mode rather than in the usual initiator mode. Suitable SCSI host adapters with such a target mode include at least the Adaptec 2940UWQ adapter and the QLogic QLA-1040 adapter. Similarly, the local link 202 may be a Fibre Channel connection, a USB connection, a mainframe channel extender, a V.35 CSU/DSU connection, an IEEE 1394 connection, a memory type (for example, AS/400 replicated memory rather than disk), an IDE bus, a PCMCIA connection, a serial connection, an Ethernet connection, an FDDI connection, or another kind of standard bus that connects disks and/or RAID subsystems to a server. Conventional replication hardware and/or software can therefore be used within the local server 200 as though the replicated data were simply being sent to another local disk, when it is in fact being sent over the journey link 206 to a remote location.

Unlike the long-distance links of the conventional approaches discussed earlier, the journey link 206 need not be a dedicated private communication link. Although such links may still be used in some embodiments, the invention also allows the replication units 204, 208 to be connected over a network, or over a series of networks such as the Internet, using Ethernet, FDDI, V.35, or other data link protocols; IP or other network protocols; and/or UDP, TCP, or other transport protocols, regardless of whether the protocol is routable. The two replication units 204, 208 may therefore be tens or hundreds of miles apart if necessary.

The journey link 206 may be formed from a conventional link 104 together with a spoofing local replication unit 204 that serves as the data acquisition point. However, the journey link 206 does not require the high bandwidth and low latency that a conventional link 104 generally requires. Unlike a SAN, for example, a system using the journey link 206 can send replicated data from a source to a destination at an unlimited distance. The journey link 206 can also share bandwidth, as happens on the Internet or another wide area network. In addition, the journey link 206 and/or the replication units give the novel system a relatively high tolerance for interruptions and disconnections.

The remote replication unit 208 has a large buffer 212, so it can buffer the entire volume of the local server 200 or other host 200. In some embodiments the local replication unit 204 also has a large buffer. In one embodiment, for example, the local server 200 volume and the large buffers (local and remote) can each hold one terabyte of data in non-volatile storage. This buffering can be accomplished, for example, by using QLogic QLA-1040 adapters in the local replication unit 204 or the remote replication unit 208 to control up to one terabyte of data, largely without further modification. An image of the complete local server 200 volume can therefore be stored in the buffers of these replication units.

For added data recovery capability, an optional local replica 230 may also be created. It is commonly called a "full" local replica because it is consistent and usable, although not necessarily current. Local replication can be accomplished in various ways, including without limitation: using a second local replication unit 204, or a second port of a multi-port local replication unit 204, to replicate data to a "remote" disk subsystem that is in fact geographically close to the local host 200; forking the data inside the local replication unit 204, below the unit's disk emulation layer, to create a second copy that is sent over a SCSI or similar bus to a locally attached disk subsystem (the first copy being sent over the journey link 206 to the remote replication unit); or using conventional tools and techniques together with the local replication unit 204 to create and maintain the local replica 230.

The local replica 230 contains a copy of the server 200 volume, so that recovery is possible after a hardware or software error. However, because the local replica 230 is local rather than remote, it cannot give the server 200 substantial protection against natural disasters, war, terrorist attacks, physical destruction, and other hazards tied to a geographic location. Therefore, whether or not the local replica 230 is implemented with an additional replication unit 204 according to the invention, it does not provide the same degree of data protection as remote replication. The local replica 230 is connected to the replication unit 204 by a path 232, which may include a conventional link such as path 104 or a novel path according to the invention. Although the local replica 230 is not shown explicitly in the remaining figures, one or more local replicas may be used with the systems in those figures and with other systems according to the invention.

One approach, for example, replicates two servers using technology from Nonstop Networks Limited or another vendor, with the local replication unit serving as the sole (primary) disk subsystem of the second server. Another approach makes the local replication unit the sole disk subsystem of the host 200, so that all replication is internal to the two replication units; the local replica 230 becomes the primary disk and the remote replica becomes the only true replica. The latter is a lower-assurance configuration, but it can provide higher performance at lower cost.

Figure 3 illustrates a system in which a local server 200 communicates with a local replication unit 204 over a local link. The local replication unit 204 communicates with a remote replication unit 308 over the journey link 206. Unlike the remote replication unit 208, whose large non-volatile buffer 212 can hold the data of the entire local server 200 volume, the remote replication unit 308 has only a relatively small non-volatile buffer 310, which holds just a few (for example, four) megabytes of data.

However, a system according to Figure 3 includes a remote server 300 with an associated internal or external non-volatile storage device. To illustrate this, Figure 3 shows a RAID unit 312 that is controlled at some point by the remote server 300. The RAID unit 312 is hot-swappable: a failed disk in the RAID unit 312 can be removed and replaced while the computer 300 is running, and the file system structures and other data are rebuilt automatically on the replacement disk. In some cases, as the arrow from the RAID unit 312 to the server 300 in Figure 3 indicates, the RAID unit 312 may be regarded as part of the server 300, or as connected to it, by conventional means such as dedicated replication software on the server 300.

However, the RAID unit 312 may also be connected to both the remote replication unit 308 and the server 300 through a dual-host connection, in the configuration 1400 shown in Figure 14 and described in detail below. The dual-host connection allows a switch between two states. In the first, "normal replication" state, the remote server 300 is passive, the remote RAID unit 312 or other remote disk subsystem is used only for replication, and read requests are served by a local replica and/or the local host 200 disk. In the second state, the remote server 300 actively serves requests to read data from the remote RAID unit 312 or other remote disk subsystem.

In the first, "normal replication" state, the remote replication unit 308 receives data from the local replication unit 204 over a journey link 206 such as an Ethernet and/or TCP/IP connection. As noted in connection with Figure 2, the local link 202 may be a SCSI bus, USB, Fibre Channel, or a similar connection. The remote replication unit 308 passes the data over the remote link 302 to the remote server 300 for subsequent storage on the hot-swappable RAID unit 312, or, if the dual-host connection 1400 is used, sends it directly to the RAID unit 312. The remote link 302 may be, for example, a SCSI bus connection, so that the remote replication unit 308 appears to the remote server 300 as a SCSI disk that the remote server 300 can replicate to another "disk", namely the RAID unit 312. The remote link 302 may also be a serial, Ethernet, FDDI, USB, Fibre Channel, or other non-proprietary connection.

The local replication unit 204 has a non-volatile buffer that is similar or identical (apart from the specific data stored in it) to the small buffer 310 of the remote replication unit. Data from the local server 200 is pre-acknowledged and placed in the buffer of the local replication unit 204. To the primary server 200, the second, "replica" write appears to take place locally. In fact, the local replication unit 204 has received the data and stored it in its local buffer. The local replication unit 204 holds this sector and track change data (or similar block-level data) until it can safely send the data over the journey link 206 to the remote replication unit 308. The smart buffer of the local replication unit 204 stores locally any data in excess of what the journey link 206 can handle. The data remains on the local replication unit 204 until the remote replication unit 308 has successfully written it to the remote server 300 and returned an acknowledgment to the local replication unit 204. On receiving that acknowledgment, the local replication unit 204 removes the successfully transmitted sector/track/block data from its local non-volatile buffer. Unlike conventional systems, neither server 200 nor server 300 needs an NLM or other software designed specifically for data replication; each needs only its standard file system and operating system software.
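The pre-acknowledge, hold, and release behavior described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the patented implementation: the class name `LocalUnitBuffer`, its methods, and the use of sequence numbers are all hypothetical, and an in-memory dictionary stands in for the non-volatile buffer.

```python
from collections import OrderedDict


class LocalUnitBuffer:
    """Hypothetical model of the local replication unit's buffer.

    An in-memory OrderedDict stands in for non-volatile storage.
    """

    def __init__(self):
        self._pending = OrderedDict()  # seq -> (block_no, data), oldest first
        self._next_seq = 0

    def host_write(self, block_no, data):
        """Store a block and return at once, modelling the pre-acknowledgment."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = (block_no, bytes(data))
        return seq  # the host treats the write as complete here

    def next_to_send(self):
        """Return the oldest unacknowledged block, or None if the buffer is empty."""
        for seq, (block_no, data) in self._pending.items():
            return seq, block_no, data
        return None

    def remote_ack(self, seq):
        """Drop a block only after the remote unit confirms it is stored."""
        return self._pending.pop(seq, None) is not None

    def pending_count(self):
        return len(self._pending)
```

In this sketch a block stays available for retransmission until `remote_ack` is called for it, which mirrors the rule that data is removed from the local buffer only after the remote acknowledgment arrives.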

Figure 4 illustrates a system with several components that were described above and carry the same reference numbers as in the earlier figures. In the system of Figure 4, however, a remote replication unit 408 includes both the small non-volatile buffer 310 and a large non-volatile buffer. The large buffer is implemented by a hot-swappable RAID unit 312 connected directly to the remote replication unit 408. The small buffer 310 buffers the data received over the journey link 206 so that the data can be acknowledged back to the local replication unit 204, and holds that data until the remote replication unit 408 has stored it in the large buffer 312. No remote server is needed.

Figure 5 illustrates systems in which two or more local servers 200 write to a remote replication unit 508. In this figure and elsewhere, references to the local server 200 should be understood to include, more generally, hosts 200 that are not used as servers. In other words, the invention can be used to replicate any host computer system 200 that can be connected to a local replication unit 204. Servers are the most common example of a host 200, but other hosts 200 include clusters of computers that are not servers, mainframes, and Storage Area Network (SAN) or Network Attached Storage (NAS) data sources. The local servers 200 or other hosts 200 may be physically separated from one another by less than 10 miles, by at least 10 miles, by as much as a hundred miles, and so on. In the systems shown in this figure, every local server 200 in a given system runs on the same operating system and file system platform, but different systems according to Figure 5 may use different platforms. For example, every server 200 in one system might be a Novell NetWare server, while in another system every server 200 might be a Microsoft Windows NT server using the NT file system.

Each host 200 in the system is connected to its own local replication unit 204 by a SCSI, Fibre Channel, USB, serial, or other standard storage subsystem or peripheral connection 202. The local replication units 204 are connected over journey links 206 to a single remote replication unit 508. The remote replication unit 508 has a SCSI, Fibre Channel, USB, or similar controller card for each local replication unit 204.

Data from the local replication units 204 can be transferred directly (that is, without passing through a remote server) over SCSI, Fibre Channel, USB, or similar connections inside the remote replication unit 508 to individual hot-swappable RAID storage units 312 in a RAID unit group 512. A RAID storage unit 312 may be physically external to at least part of the remote replication unit 508, for example the part containing the Ethernet card that connects to the journey link 206. However, the remote replication unit 508 is defined by function rather than by packaging. In particular, unless stated otherwise (as in the discussion of Figure 14), the RAID storage units 312 are considered part of the remote replication unit 508. Each RAID storage unit 312 holds a remote bootable volume, and data is written to it by sector/track or by block. The remote replication unit 508 also includes a small buffer 310 for acknowledging and buffering the data received over the journey links 206.

Figure 6 illustrates a system similar to that of Figure 5, except that the remote replication unit 608 writes data to external bootable storage volumes in a group 616. Local servers 200 running on the same platform write to their local replication units 204, which in turn write the data to the remote replication unit 608. The remote replication unit 608 has a SCSI, Fibre Channel, USB, or similar controller card and a bootable storage volume 614 for each local replication unit 204. Data from each local replication unit 204 is transferred directly from the remote replication unit 608, over a SCSI bus or other data line, to the corresponding storage volume 614. Each volume 614 is a remote bootable volume, and data is written to it by sector/track or by block.

In alternative embodiments of this system and of the others, separate partitions may hold the replicated data of the respective local servers 200, instead of that data being held on corresponding separate disks 614 (as in Figure 6) or separate RAID storage units 312 (as in Figure 5). In the various many-to-one systems it may be necessary to spawn a process that forks itself for each new connection and that uses IPC or another mechanism to lock the replicated volume against multiple simultaneous replication attempts.
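The locking requirement can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the text speaks of forked processes and IPC, whereas this example uses in-process `threading.Lock` objects as a stand-in, and the class name `VolumeLocks` and the partition names are hypothetical.

```python
import threading


class VolumeLocks:
    """Hypothetical registry holding one lock per replicated partition.

    threading.Lock is an in-process stand-in for the IPC-based locking
    that separate forked processes would need.
    """

    def __init__(self, partitions):
        self._locks = {name: threading.Lock() for name in partitions}

    def try_begin(self, partition):
        """Claim a partition for one replication attempt; False if already claimed."""
        return self._locks[partition].acquire(blocking=False)

    def end(self, partition):
        """Release the partition once the replication attempt has finished."""
        self._locks[partition].release()
```

A connection handler would call `try_begin` before writing to a partition and refuse or queue the connection when it returns `False`, so that two connections never write the same replicated volume at once.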

Figure 7 illustrates a system in which the remote replication unit 708 includes both individual external storage volumes 614 and RAID units 312. The remote replication unit 708 stores the replicated data in both storage subsystems 312, 614, which gives additional assurance that the data will be available when needed.

Figure 7 also illustrates a system in which two or more local replication units 204 write to one remote replication unit 708, with all the data replicated from the respective local servers 200 going to a single large storage volume (312, 614, or both, depending on the embodiment) that is attached directly to the remote replication unit 708, rather than being divided among several remote storage units 312 or 614 as in Figures 5 and 6. The volume used by the remote replication unit 708 has a partition for each local replication unit 204. Each partition provides a remote bootable "volume", and data is written by sector/track or by block in the usual way.

Figure 7 also illustrates another system, in which the replicated data is divided among two or more storage units connected directly to the remote replication unit 708, with a particular storage unit holding the data replicated from a given local replication unit 204. Here, however, a combination of external disks 614 and RAID units 312 is used, rather than RAID units alone (as in Figure 5) or external disks alone (as in Figure 6). For example, an external disk 614 might hold the data from a first local replication unit 204 while a RAID unit 312 holds the data from a second local replication unit 204. In such a system the remote replication unit 708 has a SCSI, Fibre Channel, USB, or similar controller card for each local replication unit 204, and data from a local replication unit 204 is transferred directly (that is, without passing through a server such as server 300) over SCSI, Fibre Channel, USB, or a similar communication line to an individual external hot-swappable RAID storage unit 312 or to an external bootable disk drive 614.

Figure 8 illustrates systems like those discussed in connection with Figure 5. In the systems of Figure 8, however, the local servers 200 run on different platforms, as indicated by reference numbers 822, 824, and 826. A system according to this or the other figures need not have exactly three local servers 200 and corresponding local replication units 204; counting a local server 200 and its local replication unit 204 as a pair, there may be two or more pairs. For example, one system according to Figure 8 includes a Novell NetWare server 822 and a Microsoft Windows NT server 824, while another includes two Novell NetWare servers 822, 826 and one Microsoft Windows NT server 824.

Figure 9 illustrates systems like those discussed in connection with Figures 6 and 8. Unlike the systems of Figure 6, the local servers 200 run on different platforms. Unlike the systems of Figure 8, the remote replication unit is a unit 608 that uses a group 616 of external disk drives 614 rather than a group 512 of RAID units 312.

Figure 10 illustrates systems like those discussed in connection with Figure 7, except that the local servers 200 run on different platforms. As with Figure 7, in some systems the local replication units 204 are mapped to partitions and in others to storage units. When mapped to partitions, a local replication unit 204 may be mapped to a partition in a RAID unit 312, to a partition on an external disk drive 614, or to a partition in a RAID unit 312 that is also replicated to an external disk drive 614. When the local replication units 204 are mapped to storage units, one or more local replication units 204 send their data through the remote replication unit 708 to corresponding external disk drives 614, while one or more other local replication units 204 send their data through the remote replication unit 708 to corresponding RAID units 312.

Figure 11 illustrates a system in which the data is replicated to two or more remote locations. Such a system can be viewed as the converse of the systems of Figures 5 to 10: those figures show "many-to-one" replication (more than one local server replicated to a remote location), whereas Figure 11 shows "one-to-many" replication (one local server replicated to more than one remote location). In general the local replication units 204 all replicate the same data, but using multiple local replication units 204 allows replication to continue without interruption over at least one journey link 206 even if a particular local replication unit 204 becomes unavailable. The local links 202 may all use the same connection type or different types; for example, one local link 202 may be a SCSI connection and another a USB connection. The journey links 206 may likewise be uniform or varied. Similarly, the remote replication units may all have the same components (for example, each using a RAID unit 312), or different components may be used at different locations.

Figure 12 illustrates a system which, like the system discussed in connection with Figure 11, replicates data to two or more remote locations. However, the local replication unit 204 of Figure 12 is a multi-port replication unit. That is, it can be connected to more than one journey link 206 at the same time, in the way that a conventional multi-port server has several simultaneous connections. The multi-port replication unit 204 sends the data from the host 200 over each active link, which helps replicate the host 200 to several remote locations that may be miles apart from one another. The multi-port local replication unit 204 needs only one local buffer and, like the replication units 204 in other systems, may optionally include a full local replica 230.
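One way a single local buffer could serve several journey links is sketched below in Python. The text states only that one buffer suffices; the rule used here, that a block is released once every active link has acknowledged it, is an assumption made for illustration, and the names `MultiportBuffer`, `host_write`, and `ack` are hypothetical.

```python
class MultiportBuffer:
    """Hypothetical single buffer shared by several journey links.

    Assumption: a block is released only when every link has acknowledged it.
    """

    def __init__(self, links):
        self.links = set(links)
        self._waiting = {}  # seq -> set of links that have not yet acknowledged
        self._data = {}     # seq -> buffered block contents
        self._next_seq = 0

    def host_write(self, data):
        """Buffer one block once, no matter how many links will carry it."""
        seq = self._next_seq
        self._next_seq += 1
        self._data[seq] = bytes(data)
        self._waiting[seq] = set(self.links)
        return seq

    def ack(self, seq, link):
        """Record an acknowledgment; return True when the block is released."""
        waiting = self._waiting.get(seq)
        if waiting is None:
            return False
        waiting.discard(link)
        if not waiting:
            del self._waiting[seq]
            del self._data[seq]
            return True
        return False

    def pending(self):
        """Sequence numbers of blocks still held in the buffer."""
        return sorted(self._data)
```

With this design a slow or disconnected remote site only delays the release of buffered blocks; it does not require a second copy of the data in the local unit.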

More on Replication Units

The components and operation of the replication units were discussed above in connection with Figures 2 to 12. The additional details given below do not necessarily apply to every replication unit in every system according to the invention, but they help show how replication units give more flexibility to the people and businesses responsible for making sure that data is properly replicated.

At least some replication units can reliably emulate disk drives attached by SCSI, Fibre Channel, USB, or similar links through standard server drivers running on Novell NetWare and/or Microsoft Windows NT platforms. SCSI, Fibre Channel, USB, or similar emulation under other operating systems may also be provided.

Each unit, local or remote, is preferably configured to support I/O through a monitor, keyboard, and mouse plugged into it. Some replication units have a network address and otherwise allow a network administrator to reach a specific replication unit on the adapted network 100 through a browser on a remote workstation 116 or by other means.

The replication units preferably support the Simple Network Management Protocol (SNMP). Network administrators have remote access to both local and remote replication units. The replication unit 204 software provides an interface through which monitoring utilities can check status. In particular, each local replication unit 204 acts as a network agent: it keeps track of the number of reads and writes to the local server 200, the status of each local server 200, the number of restarts and warm boots of each local server 200, and so on, and it generates SNMP traps when necessary. The local replication unit 204 may also provide the following to network administrators: the number of blocks currently in the buffer 210, an alert when the buffer 210 fills or passes a specified threshold, the number of blocks sent since the server 200 started, and the number of blocks received since the server 200 started.
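The counters and buffer alerts listed above can be modelled with a short Python sketch. It makes several assumptions: the class `UnitStats` and its field names are hypothetical, the threshold is expressed as a block count, and alerts are simply appended to a list where a real unit would raise an SNMP trap. No SNMP library is used.

```python
from dataclasses import dataclass, field


@dataclass
class UnitStats:
    """Hypothetical counters a local replication unit might expose."""

    buffer_capacity: int          # blocks the buffer can hold
    alert_at: int                 # threshold, in blocks, for an early warning
    blocks_in_buffer: int = 0
    blocks_sent: int = 0
    blocks_received: int = 0
    alerts: list = field(default_factory=list)  # stand-in for SNMP traps

    def on_receive(self, n=1):
        """The host wrote n blocks; they are now waiting in the buffer."""
        self.blocks_received += n
        self.blocks_in_buffer += n
        self._check()

    def on_sent_and_acked(self, n=1):
        """n blocks were sent and acknowledged, so they leave the buffer."""
        self.blocks_sent += n
        self.blocks_in_buffer -= n

    def _check(self):
        if self.blocks_in_buffer >= self.buffer_capacity:
            self.alerts.append("buffer-full")
        elif self.blocks_in_buffer >= self.alert_at:
            self.alerts.append("buffer-threshold")
```

For example, with `UnitStats(buffer_capacity=10, alert_at=8)` the eighth buffered block records a threshold alert and the tenth records a buffer-full alert.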

Some local replication units 204 may also provide an optional dial-up feature. If a customer uses the replication unit 204 over a dial-up connection and does not want to stay connected at all times, the unit 204 offers an option to send data over the journey link 206 only at specified times. The local replication unit 204 can also be configured not to send data to the adapted network 100, or to another portion of the journey link 206, during high-traffic periods. The buffer 210 in the local replication unit 204 should be large enough to hold the data received from the local server 200 during these non-transmission periods.

In general, the local replication unit 204 preferably matches the performance of a high-speed RAID disk subsystem in data transfer speed, reliability, and compatibility with existing server 200 platforms. Because an implementation that is mostly software is unlikely to meet these performance goals, the local replication unit 204 preferably includes special-purpose hardware, together with any necessary firmware. Suitable software and hardware can be designed and built by those of skill in the art with particular attention to the following: the conventional replication path 104; SCSI, Fibre Channel, USB, or similar controllers; individually known subsystems such as the buffers 210, 212, 310, the disks 614, and the RAID unit 312, and their interfaces; software such as FreeBSD drivers; Ethernet and individually known network interface cards (NICs); network protocols such as Ethernet and TCP/IP; the descriptions and examples provided herein; and other tools and techniques now or subsequently available to such persons.

In general, a write to the local replication unit 204 is acknowledged and written to the local buffer 210. It may also be written, over the conventional path 104 or another path, to a complete local replica volume 230, even though such local replication is not shown explicitly in Figures 3 through 12. For performance, it is generally acceptable to buffer writes in a RAM cache in the local replication unit 204, in the local server 200, or in both. In particular, an implementation may take advantage of an existing hardware RAID unit 312 cache or another SCSI, Fibre Channel, USB, or similar cache. Reads from the local replication unit 204 are generally served with the appropriate data from the local replica 230.

When the local replication unit 204 comes back online after a crash, reboot, or other interruption, it automatically begins sending data from the local buffer 210 to the remote replication unit 208, 308, 408, 508, 608, or 708. The local replication unit 204 should not issue a SCSI, Fibre Channel, USB, or similar device reset, so as not to disrupt operation of the host 200. Data written to the local replication unit buffer 210 is sent in first-in, first-out order over the network or other journey link 206 to the remote replication unit. This can be done using TCP/IP or another journey link protocol. The remote replication unit preferably maintains a complete, consistent replica, so that the remote volume is usable and can be mounted by an operating system at any time, regardless of the synchronization state of the replica.

At least when FreeBSD-based software is used, a kernel panic on the local replication unit 204 preferably cannot occur unless the underlying replication hardware or software fails. Misconfiguration of the local replication unit 204, or any behavior of the host server 200, should not cause the system to shut down. Where possible, the replication unit software should be reconfigurable without a reboot, and each software change should carry a unique version number. The software therefore preferably rereads all of its initialization information and configuration in response to a system call that an administrator can issue, without the replication unit interrupting data processing. The host server 200 should not be interrupted. The local replication unit 204 preferably accepts writes from the host server 200 regardless of whether the remote replication unit is online, and regardless of whether bandwidth is available on the network or other journey link 206, unless the local buffer 210 is full.

If the local buffer 210 is full, the local replication unit 204 preferably continues to maintain the local replica 230 (if present) and preferably continues to dequeue entries from the circular queue in the local buffer 210. However, the local replication unit 204 preferably stops adding entries to the queue until a user-level (typically administrator) program tells it to start accepting entries again. Preferably a system call, rather than a reboot, allows a user-level program to disable or re-enable the local buffer 210 queue.
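The overflow behavior described above can be sketched in a few lines. This is a minimal illustration only, not the patented implementation: the class name `LocalBuffer`, its methods, and the `dropped` counter are hypothetical names chosen for the example, and an in-memory deque stands in for the buffer 210.

```python
from collections import deque


class LocalBuffer:
    """Bounded FIFO that stops accepting entries once it fills up (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()
        self.accepting = True   # cleared on overflow, set again by resume()
        self.dropped = 0        # writes that were not queued for the remote site

    def enqueue(self, block):
        """Queue a block for transmission; return False if it was not queued."""
        if self.accepting and len(self.entries) >= self.capacity:
            self.accepting = False   # overflow: suspend until an operator resumes
        if not self.accepting:
            self.dropped += 1
            return False
        self.entries.append(block)
        return True

    def dequeue(self):
        """Remove the oldest entry; draining continues even while suspended."""
        return self.entries.popleft() if self.entries else None

    def resume(self):
        """Operator request (no reboot needed) to start accepting entries again."""
        self.accepting = True
```

In this sketch the queue stays suspended after it drains, which mirrors the text: entries are accepted again only once a user-level program explicitly calls `resume()`.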

The replication units preferably detect the loss and restoration of network or other journey link 206 bandwidth automatically. For example, if the local replication unit's Ethernet connection is unplugged and then plugged back in the next day, no data should be lost and no intervention by the network administrator should be needed, as long as the local buffer 210 has enough space to hold the data changes that accumulated while the local replication unit 204 was disconnected.

The replication unit, or monitoring software associated with it, can preferably determine whether the system was shut down cleanly after the previous boot, so that the monitoring software can judge how likely it is that the remote replica is out of sync. When power fails, the local replication unit 204 preferably loses as little data as possible. Some replication units may therefore include an uninterruptible power supply (UPS). It may be assumed that, when power fails, there is sometimes time to flush writes buffered in RAM to the local replica (if present) and/or to the local buffer 210.

In one embodiment, the replication unit operating system (FreeBSD) boots from a hard disk mounted read-only, to avoid problems with FreeBSD's own file system. Configuration data is written to a small partition, and its loss can be handled in one of two ways: the data can be restored from information held for the same replication unit node, or the unit can send an SNMP alert stating that it has lost its configuration data and will remain offline until the configuration is reinstalled. The alert can be used when the replication unit node cannot be reached. Some embodiments also avoid initializing the controller card at a point where the disk drives cannot yet operate, which prevents problems such as bus resets. In addition, if the replication unit buffer is full, the unit preferably just acknowledges writes and replicates them locally, while sending an alert that the buffer has overflowed and that the remote replica is no longer in sync with the local replica.

As noted above, a cold start of the local replication unit 204 preferably does not affect the host system 200, particularly with respect to SCSI, Fibre Channel, USB, or similar handshaking. The local replication unit buffer 210 preserves the order of write requests, and the local replication unit 204 sends them to the remote replication unit in the order received, so that data consistency is preserved at all times.

The remote replication unit receives TCP protocol data units (also called TCP packets) from the local replication unit 204 and writes them to the disk subsystem (for example, the external disk subsystem 614 or the RAID unit 312), so that the drive is, at least logically, block-for-block identical to the local replica 230, if any, and to the host 200 volume as it stood at some earlier time. The replicated data may be out of date, but it must still be consistent.

For data recovery, the remote replication unit software preferably has a user-level interface through which a program can disable or re-enable the replication unit software's reading, writing, and/or seeking on the remote replica, so that the remote disk subsystem, and hence the replicated data, can be accessed by a second SCSI host on the same chain. At the remote site, the remote replication unit and a backup host server are both attached to the shared disk subsystem. For example, the remote replication unit may use SCSI ID 6 while the remote server used for recovery uses SCSI ID 7. While the remote replication unit is replicating, the remote host does not mount the shared drive. To recover data, as part of the switchover, the remote replication unit stops accessing the shared drive and the backup host server mounts it.

The remote replication unit can preferably report to a user-level program the total number of blocks received from the local replication unit 204. The remote replication unit writes the replica to the disk subsystem so that the volume can be mounted by a host system running the same operating system as the local server 200 that created the local volume. If the remote replication unit receives from the local replication unit 204 a request to write to logical block number N, it writes the data to logical block number N on the remote replication unit disk subsystem 312 or 614. To keep the data consistent, write requests from the local replication unit 204 are written to the remote disk subsystem 312 or 614 in the order in which the local replication unit 204 received them.

Communication over the journey link 206 between the local replication unit 204 and the remote replication unit may use TCP, because it provides error recovery and guaranteed delivery. The remote replication unit software can act as a TCP server, with the local replication unit 204 as its client. Loss of network bandwidth or connectivity preferably interrupts neither the local replication unit 204 nor the remote replication unit. Likewise, data recovery at the remote site preferably does not interrupt the local replication unit 204. If the connection between the local replication unit 204 and the remote replication unit times out or is otherwise broken, the local replication unit 204 preferably keeps trying to reconnect until the connection is re-established. The local replication unit 204 then preferably continues sending replicated data from where it left off, or resumes normal operation.
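The "continue from where it left off" behavior can be illustrated without real sockets. The sketch below is an assumption-laden simplification: `send_with_resume` and the `transport` callable are hypothetical names, and the `max_attempts` limit is added only so the example terminates (the text describes retrying until the connection is re-established).

```python
def send_with_resume(blocks, transport, max_attempts=10):
    """Send blocks in order, resuming after the last acknowledged block.

    `transport(block)` is a hypothetical callable that raises ConnectionError
    while the link is down and returns normally once the block is acknowledged.
    Returns the number of failed attempts that had to be retried.
    """
    next_index = 0      # first block that has not yet been acknowledged
    retries = 0
    while next_index < len(blocks):
        try:
            transport(blocks[next_index])
            next_index += 1         # advance only after a successful send
        except ConnectionError:
            retries += 1
            if retries > max_attempts:
                raise
            # A real unit would wait here and re-establish the TCP session.
    return retries
```

Because the index advances only after a successful send, an outage never causes a block to be skipped, and the send order is unchanged.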

The new replication units are more intelligent than the original Off-SiteServer product in that they run a modified operating system based on FreeBSD UNIX. One modification changes the driver for a Qlogic SCSI controller so that the card behaves as a SCSI target rather than a host, which lets it emulate a disk drive; other controllers with suitable drivers can also be used. The boot process was also modified to display a replication unit configuration utility on the console instead of a prompt, and the kernel was recompiled. At the source, each replication unit 204 runs an operating system that lets it be completely independent of the host server 200. Thus one feature of the flexible replication provided here is that the replication unit 204 requires no initialization or connection software on the host server 200 (in the original Off-SiteServer product, that software took the form of a Vinca NLM).

Another difference is that the replication unit 204 operating system emulates a SCSI or other standard disk or data acquisition point. The replication unit 204 can therefore be mounted, for example as a SCSI disk, under any operating system that supports SCSI, including at least Microsoft Windows 95, Microsoft Windows 98, Microsoft Windows NT, Novell NetWare, FreeBSD, and Linux. The disk emulation preferably covers whatever standard disk operations a node may perform (at least from the server 200 point of view). In addition to disk reads and disk writes, this includes handling server 200 requests for disk formatting, disk partitioning, and disk integrity checks such as ScanDisk.

A system according to the invention can also maintain a complete replica volume 230 locally for fault tolerance. Because this replication is done by forking the data (that is, writing it twice) beneath the software emulation layer of the replication unit 204, the replication unit 204 can maintain the local volume 230 while also using a sequential data change buffer. This lets the replication unit 204 serve local read requests from the server 200 without undue delay, so the system can run without triggering disk failures and without split-seek software, which eliminates potential software compatibility problems. It also lets the new system copy the replicated data back to the local disk of the server 200 in a local disk copy operation, without going over the journey link 206. Moreover, if the local replica 230 is maintained, the local replication unit 204 need not include a spoof packet generator to pre-acknowledge writes to the host 200, because the local replica 230 is not subject to the delays and risks associated with sending replicated data over the journey link 206.

A replication unit according to the invention generally includes operating system software. At least some replication units can therefore run multiple "host" applications that operate on the replicated data they acquire. The system can also be scaled up or down through drivers and/or suitable software and/or hardware to suit a particular environment. Processing can be spread across multiple processors, SCSI cards, and/or other "intelligent" devices to handle more operations and load. Likewise, the system can be scaled down to reduce cost while still meeting the needs of lower-performance environments. With suitable software, the local replication unit 204 can run as a standalone intelligent disk subsystem, or can stand in for a failed host 200 operating system to provide local fault tolerance. Should the host 200 intelligent disk subsystem fail, the local replica volume 230 provides local fault tolerance by serving as a local replacement.

The system maintains consistency and availability at the remote site in part through an intelligent buffer 210 that holds and sends data in first-in, first-out order. Data is thus sent to the remote site in the same order in which it was received through the emulation layer of the local replication unit 204. Because packetized data does not necessarily arrive at its destination in the order sent, sequence numbers and/or timestamps may also be used.
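One way sequence numbers could restore the send order on the receiving side is sketched below. This is an illustration under stated assumptions, not the method of the patent: the class name `Reorderer`, numbering from zero, and the in-memory `pending` map are all hypothetical.

```python
class Reorderer:
    """Release blocks strictly in sequence-number order (illustrative)."""

    def __init__(self):
        self.next_seq = 0    # sequence number expected next
        self.pending = {}    # seq -> block, for blocks that arrived early

    def receive(self, seq, block):
        """Accept one block; return the blocks that are now ready, in order."""
        if seq < self.next_seq:
            return []                    # duplicate of a block already released
        self.pending[seq] = block
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

A block that arrives early is held until every lower-numbered block has been released, so writes reach the remote disk in the order they were sent.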

To protect data when a shutdown occurs, some embodiments use the circular buffer and other measures described below. In addition to a Qlogic card acting as a disk target emulator, the local replication unit has two disk systems attached through a local SCSI disk controller. One disk holds the host operating system (FreeBSD) together with the associated utilities and replication unit management software; this disk also serves as the buffer 210 disk. The other disk system attached to the replication unit is at least as large as the host 210 disk being replicated and serves as the local replica of the host 210 disk.

SCSI data is read from the Qlogic card and evaluated in the kernel as read and write requests. Read requests from the Qlogic card are preferably served from the local replica disk 230, without crossing the network 206. Write requests are copied directly to the local replica disk 230 as quickly as possible and acknowledged to the host system 200 (though not necessarily pre-acknowledged), and are also added to a circular buffer on the buffer disk or in non-volatile RAM.
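The read and write handling just described can be modeled in a few lines. The sketch is purely illustrative: the class `LocalReplicationUnit`, the dictionary standing in for the local replica disk 230, and the deque standing in for the circular buffer are assumptions made for the example, not structures named in the source.

```python
from collections import deque


class LocalReplicationUnit:
    """Toy model of request dispatch in the local unit (illustrative)."""

    def __init__(self):
        self.local_copy = {}     # LBN -> data, stands in for the local replica 230
        self.buffer = deque()    # FIFO of pending writes, stands in for buffer 210

    def read(self, lbn):
        """Serve reads from the local copy; nothing crosses the link."""
        return self.local_copy.get(lbn)

    def write(self, lbn, data):
        """Fork the write: update the local copy and queue it for the remote site."""
        self.local_copy[lbn] = data
        self.buffer.append((lbn, data))
        return "ACK"             # acknowledged once the local write is done
```

The acknowledgment here follows the local write rather than preceding it, matching the remark that writes are acknowledged but not necessarily pre-acknowledged.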

Each time a block is written to the circular buffer, two blocks are actually written in sequence: one is the data block that will be transmitted, and the other holds a timestamp and the current queue tail pointer, and possibly other data such as the LBN (logical block number). The latter is the so-called "meta-data" block. This approach is not space-efficient, but it reduces the number of disk reads and writes needed to maintain the queue pointers. If RAM is available, the queue pointers can also be maintained by keeping at least one copy in non-volatile RAM, or by keeping the entire circular buffer there. One way to save both space and time is to write to the circular buffer in large chunks, buffering blocks in memory until enough have accumulated to perform a write. This lets one meta-data block serve many data blocks, which reduces the number of disk writes and saves disk space.
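A minimal in-memory sketch of such a buffer follows, with each slot pairing a data block with its meta-data record. Everything here is an assumption made for illustration: the class `RingBuffer`, the dictionary layout of the meta-data, and the convention of leaving one slot unused to tell a full buffer from an empty one. A real unit would write these records to disk or non-volatile RAM.

```python
import time


class RingBuffer:
    """Fixed-size circular queue whose slots pair data with a meta-data record."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0    # next slot to write
        self.tail = 0    # next slot to transmit

    def is_full(self):
        # One slot stays unused so that "full" and "empty" can be told apart.
        return (self.head + 1) % len(self.slots) == self.tail

    def is_empty(self):
        return self.head == self.tail

    def put(self, lbn, data, now=None):
        """Store a data block plus its meta-data; return False when full."""
        if self.is_full():
            return False
        meta = {
            "timestamp": time.time() if now is None else now,
            "tail": self.tail,   # queue tail pointer at the time of the write
            "lbn": lbn,
        }
        self.slots[self.head] = (meta, data)
        self.head = (self.head + 1) % len(self.slots)
        return True

    def get(self):
        """Return the oldest (lbn, data); the slot is skipped, not erased."""
        if self.is_empty():
            return None
        meta, data = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        return meta["lbn"], data
```

Storing the tail pointer in every meta-data record is what allows the pointers to be reconstructed after a restart without a separate pointer file.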

After a shutdown and reboot, the head of the queue is found by searching the meta-data segments for the most recent timestamp, and that meta-data segment then gives the position of the tail pointer. The search can be done, for example, with a binary search. Because the buffer is circular, transmitted data need not actually be removed from the buffer (that is, deleted or zeroed); incrementing the tail pointer is enough. A full buffer is detected when the head pointer is one less than the tail pointer. The pointers refer to positions in the circular buffer rather than to the data values in it (that is, the buffer is an array rather than a linked list).
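The binary search for the most recent stamp can be sketched as follows. The function `find_newest` is a hypothetical helper, and the sketch assumes the simplest case: every slot has been written and every stamp is distinct (for example, a monotonically increasing counter), so the stamps form an ascending sequence that wraps around once.

```python
def find_newest(stamps):
    """Return the index of the largest stamp in a rotated, ascending list.

    Assumes every slot has been written and all stamps are distinct.
    """
    if not stamps:
        raise ValueError("buffer has no slots")
    lo, hi = 0, len(stamps) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if stamps[mid] > stamps[hi]:
            lo = mid + 1     # the wrap point lies to the right of mid
        else:
            hi = mid         # the wrap point is at mid or to its left
    # lo is now the oldest entry; the newest sits just before it in the ring.
    return (lo - 1) % len(stamps)
```

The search inspects a logarithmic number of meta-data records rather than scanning the whole buffer, which matters when each probe is a disk read. Handling repeated stamps or unwritten slots would need extra care that this sketch omits.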

It may not be necessary to keep the full 64-bit timestamp, since the time in seconds just before shutdown is enough to determine the position of the last block written. For example, suppose four blocks are written within the same second and so carry the same timestamp. Because the queue is ordered, the last block bearing that timestamp is the last one written. If timestamps cost too much computation, a simpler counter may suffice, although it could roll over before the year 2038. The size of the queue buffer depends on the end user's data change rate and on how long a network 206 outage the customer needs to ride out. The queue buffer can be as small as a few hundred megabytes or as large as the host volume being replicated.

The buffer size has no inherent minimum or maximum. Where a high rate of data change and frequent, lengthy outages of the journey link 206 are expected, the buffer may need to be larger than the host volume being replicated.

A separate process, which may run at user level or system level, reads blocks from the circular buffer and sends them over the network 206 to the remote replication unit. The transmit process can tell the queueing process at any time where its current pointer is, and it can examine the timestamps to determine when the queue is empty. It is acceptable for the tail pointer stored in the meta-data to be slightly out of date: in the worst case the system, on restart, resends some blocks that it had already sent, which is harmless as long as the number of resent blocks does not become excessive. The transmit process can preferably also determine the number of blocks sent since the server started. In some cases it may be assumed that the buffer can hold the entire capacity of the host disk. Under a "do no harm" philosophy, it is preferable not to risk slowing the SCSI bus; data that does not fit into a full buffer is simply discarded, and a user-level process that monitors such events is notified.

To reduce the number of blocks resent, the system can check each write against the local replica and add it to the circular buffer only if the data differs, which avoids queueing redundant writes. This can be done by maintaining a hash table of checksums for each LBN on the disk; the trade-off is the processor time spent computing checksums, plus memory or additional disk operations.
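The checksum table can be sketched with a dictionary keyed by LBN. This is an assumption-based illustration: the class `WriteFilter` is hypothetical, and CRC-32 from Python's standard `zlib` module is used only because it is readily available; the source does not name a checksum algorithm.

```python
import zlib


class WriteFilter:
    """Skip writes whose data matches what the block already holds (illustrative)."""

    def __init__(self):
        self.checksums = {}    # LBN -> CRC-32 of the data last written there

    def should_queue(self, lbn, data):
        """Return True if the write changes the block and must be queued."""
        crc = zlib.crc32(data)
        if self.checksums.get(lbn) == crc:
            return False       # same checksum: treat as a redundant write
        self.checksums[lbn] = crc
        return True
```

Two different blocks can share a checksum, so a design that relies on checksums alone could wrongly skip a real change. Comparing against the local replica when checksums match, or using a stronger hash, would close that gap at additional cost.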

Method Overview

Figures 13 and 15 illustrate methods of the invention for remote data replication. Some methods include steps for installing replication units; for simplicity, these are grouped together as installation step 1300. For example, system integrators, replication equipment vendors, and administrators may be authorized to perform some or all of step 1300 when installing any of the systems shown in Figures 2 through 12. Other methods of the invention include steps for transmitting data to one or more replication units; for simplicity, these are grouped together as transmission step 1302. Transmission may be performed with test data, under the authority of the installer, as part of installation step 1300, but it may also be performed routinely with mission-critical data at the direction of ordinary users of a system according to the invention.

In connecting step 1304, at least one server 200 is connected to at least one local replication unit 204. As noted above, the connection may take the form of SCSI, Fibre Channel, USB, or another standard disk subsystem bus. Because the local replication unit 204 emulates a disk subsystem, making the connection in step 1304 is essentially the same as connecting a disk subsystem to the server 200, at least from the server's point of view. In particular, no special NLM or other replication software needs to be installed.

In connecting step 1306, at least one local replication unit 204 is connected to at least one corresponding journey link 206. Depending on the circumstances, this may involve several operations. For example, if the journey link 206 includes a local area network, the local replication unit 204 can be connected to that network like any other node, and SNMP support can be installed. If the journey link 206 includes a dial-up connection originating from the local replication unit 204, the dial-up parameters can be configured. Likewise, if the journey link 206 includes a dedicated private line such as a T1 line, similar operations can be used to make the connection.

In connecting step 1308, at least one remote replication unit 208, 308, 408, 508, 608, or 708 is connected to at least one corresponding journey link 206. This is generally done in the same way as connecting the local replication unit 204 in step 1306. In particular embodiments, however, the remote replication unit acts as a TCP server and the local replication unit 204 is its client. In such embodiments, connecting step 1306 connects a TCP client and connecting step 1308 connects a TCP server.

In testing step 1310, tests are run on the replication units. These tests may include, for example: comparing the throughput of the local replication unit 204 with that of a RAID unit; copying data from the remote site back to the local site; loading incorrect configuration information onto the local replication unit 204 and then correcting it; rebooting the local replication unit 204; disconnecting the journey link 206; interrupting power to the local replication unit 204; interrupting power to the remote replication unit; overflowing the buffer 210 of the local replication unit 204; and other tests. In particular and without limitation, testing step 1310 may involve running one or more of the tests in the "Test Suite" section herein. Testing step 1310 also involves the data transmission discussed below in connection with step 1302, but for simplicity testing is shown as a single step in Figure 13.

Transmitting step 1302 may include a transmitting step 1312, in which the server 200 sends data to the local replication unit 204 over a standard bus. This is possible because, unlike the conventional path 104, the present invention provides replication units that can emulate a disk or RAID subsystem.

In transmitting step 1314, the replicated data is sent over the journey link 206. As noted above, this may be done over a dedicated link as in the conventional path 104, but it may also be done using Ethernet and/or TCP and/or other open standard protocols, over conventional networking infrastructure such as a local area network and/or the Internet.

In some embodiments, the replicated data is time-stamped by the local replication unit 204 to maintain a sequential record of previously replicated data blocks and to tie the data to a particular point in time. This is coupled with remote and/or local data storage large enough to hold one or more replicated disk volumes plus snapshots of incremental changes to those volumes at the sector/track/block level, rather than only the current replicated volume. In a preferred embodiment, only one snapshot is needed. That single snapshot provides a baseline, and subsequent changes are journaled so that the state of the volume at any chosen point in time (subject to the time granularity of the journal) can be recovered. The journal may be arbitrarily large, with storage added as needed to hold it, or it may be kept in a fixed-size FIFO circular buffer in which older entries are overwritten by newer ones once the buffer is full. More generally, suitable re-replication software combined with the snapshot and, if necessary, the incremental changes can later be used to recover the replicated disk volume as it existed at a particular earlier point in time.
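The baseline-plus-journal idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Journal` class, its method names, and the in-memory dictionaries are hypothetical and do not come from the patent. In the fixed-size variant, the sketch folds the oldest entry into the baseline before discarding it, which is one possible way to keep later restores correct; the text itself only says that old entries are overwritten.

```python
from collections import deque


class Journal:
    """Timestamped log of block writes made after a baseline snapshot.

    Hypothetical illustration; names and structure are assumptions.
    """

    def __init__(self, baseline, capacity=None):
        self.baseline = dict(baseline)   # block number -> bytes at snapshot time
        self.capacity = capacity         # None means the journal may grow freely
        self.entries = deque()           # (timestamp, block, data) in time order

    def record(self, timestamp, block, data):
        """Append one change; in ring mode, retire the oldest entry first."""
        if self.capacity is not None and len(self.entries) == self.capacity:
            _, old_block, old_data = self.entries.popleft()
            # Assumption: fold the retired change into the baseline so that
            # restores from the remaining entries stay correct.
            self.baseline[old_block] = old_data
        self.entries.append((timestamp, block, data))

    def restore(self, as_of):
        """Return the volume image as it stood at time `as_of`."""
        volume = dict(self.baseline)
        for timestamp, block, data in self.entries:
            if timestamp > as_of:
                break                    # later changes are not applied
            volume[block] = data
        return volume
```

With a bounded journal, the earliest recoverable point in time moves forward as entries are retired, which matches the trade-off between journal size and recovery window described above.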

In transmitting step 1316, the replicated data is sent to a serverless replication unit. Such a configuration is illustrated in FIG. 2. The remote replication unit is not a conventional server, although it has hardware and functionality in common with one. A server provides more general functionality than a replication unit, whereas replication units focus on efficiently providing essentially continuous, nearly real-time remote data replication. The remote replication unit behaves like a remote replication server with respect to obtaining data over the journey link 206, but otherwise behaves much like a mounted disk. In particular, the remote replication unit appears as a disk or RAID unit to a second server, if one is attached. If re-replication becomes necessary, the remote replication unit does not need a second server in order to re-replicate the data back over the journey link 206 to the local server 200.

After data is transferred from the local replication unit 204 to the destination remote replication unit, the remote replication unit may process it in any of several ways. For example, the remote replication unit may simply convert the received data packets into data blocks that are written to a single external disk drive 614. It may also convert the received packets into data blocks and write them to an internal disk subsystem and/or disk partition. It may also receive packets, convert them into disk data blocks, stripe the data across multiple disks of a "non-intelligent" disk subsystem using internal striping (RAID) software, and write to a RAID unit 312 in the form of an external data subsystem. The same process of converting packets to disk block data and then to striped (RAID) data, with storage on an external non-intelligent disk subsystem, may instead be handled by a hardware controller card and associated drivers. The remote replication unit may also write to an external intelligent RAID subsystem 312, in which case the disk blocks are written to the disk subsystem as a data stream and striped by the intelligent RAID subsystem.

Rather than writing received data immediately to the replica storage 312 or 614, the remote replication unit may first write the data to a remote buffer and then send back to the local replication unit an acknowledgment (ACK) carrying some form of data "signature", such as a checksum or CRC value. The local replication unit then checks the signature and responds with either an ACK-ACK or an ACK-NAK. Only upon receiving an ACK-ACK from the local replication unit does the remote replication unit take the data from the remote buffer and write it to the remote replica. In embodiments in which the remote replication unit receives not only the data but also an original signature from the local replication unit, the remote replication unit rejects the original data transmission if the original signature does not verify correctly.
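The ACK, ACK-ACK, and ACK-NAK exchange can be sketched as follows. This is a simplified in-process model, not the patent's implementation: the `RemoteUnit` class, the `send` function, the `corrupt` flag used to simulate link damage, and the choice of CRC32 from Python's `zlib` module as the signature are all assumptions made for illustration.

```python
import zlib


class RemoteUnit:
    """Hypothetical remote side: stage first, commit only after ACK-ACK."""

    def __init__(self):
        self.buffer = {}    # sequence number -> staged data
        self.replica = {}   # sequence number -> committed data

    def receive(self, seq, data):
        """Stage the data and return an ACK carrying its CRC32 signature."""
        self.buffer[seq] = data
        return zlib.crc32(data)

    def confirm(self, seq, verdict):
        """Commit the staged data on ACK-ACK; discard it on ACK-NAK."""
        data = self.buffer.pop(seq)
        if verdict == "ACK-ACK":
            self.replica[seq] = data


def send(remote, seq, data, corrupt=False):
    """Local side: send, compare the returned signature, then answer."""
    # Optionally alter the last byte to simulate damage on the link.
    wire = (data[:-1] + b"?") if corrupt else data
    signature = remote.receive(seq, wire)
    verdict = "ACK-ACK" if signature == zlib.crc32(data) else "ACK-NAK"
    remote.confirm(seq, verdict)
    return verdict
```

A real system would carry these messages over the journey link and would retransmit after an ACK-NAK; the sketch omits both to keep the commit rule visible.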

Data may also be acknowledged in other ways. For example, the remote and local replication units may be treated as endpoints rather than as subsystems of one another. In that case, on the remote side the ACK is issued by the remote replication unit itself (perhaps from its cache), and on the local side the ACK is issued by the local replication unit itself (preferably from its cache). The local replication unit does not need an ACK from the remote replication unit before acknowledging to the host; an ACK from the local end of the journey link is sufficient. The local replication unit should still wait for an ACK from the remote replication unit before deleting the corresponding data blocks from its local buffer, but this can happen after the acknowledgment to the host.
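A minimal sketch of this endpoint-style acknowledgment is shown below. The `LocalUnit` class and its method names are hypothetical; the point is only the ordering: the host is acknowledged as soon as the block is buffered locally, while the buffered copy is released only after the remote unit acknowledges it.

```python
class LocalUnit:
    """Hypothetical local endpoint with a buffer of unacknowledged blocks."""

    def __init__(self):
        self.pending = {}   # sequence number -> block awaiting remote ACK
        self.next_seq = 0

    def write_from_host(self, block):
        """Buffer the block; returning stands in for the ACK to the host."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = block
        return seq

    def unacknowledged(self):
        """Sequence numbers still held because the remote has not ACKed."""
        return sorted(self.pending)

    def on_remote_ack(self, seq):
        """Only now is it safe to drop the buffered copy."""
        self.pending.pop(seq, None)
```

Decoupling the two acknowledgments keeps host write latency independent of the journey link, at the cost of holding blocks locally until the remote side catches up.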

If the system includes at least one second server, additional steps may be performed. For example, the remote replication unit may relay data directly to the remote server 300 through the server's network operating system. That operating system may be active or passive. In either case, data received over connection 302 can be written by the operating system of server 300 to an internal local disk subsystem. This approach requires specific software for each operating system at the remote site. The remote replication unit may also use an Internet-based data window to send and receive data between itself and the second server 300. The data window may be a plug-in extension to a browser interface or an Internet component extension to the core operating system, such as a Microsoft ActiveX extension.

In any of the above scenarios, the local replication unit may be "intelligent" enough to relay replicated data to a single remote replication unit or to multiple remote replication units. FIG. 12 shows a "one-to-many" system with three remote replication units connected by respective journey links 206 to a single multi-port local replication unit 204. Multi-port replication units may likewise be used, alone or together with single-port replication units, in other systems according to the invention. There is no hard limit on the number of remote replication units in a given system.

A remote replication unit may also relay replicated data to a nearby replication unit and/or, for fault tolerance, to another more distant remote replication unit. A remote replication unit may act as a head end for two or more subsequent remote replication units, supervising the sequential consistency and integrity of the replicated data in order to balance load and provide fault tolerance. N remote replication units may be connected to one another and share the same network address or Domain Name System (DNS) name to provide further fault tolerance. These approaches may of course be combined.

In embodiments having one or more separate, fully independent remote disk subsystems connected to the remote replication unit, the remote replication unit acts as, for example, a SCSI master and writes data to the remote disks. If a second server 300 is present, that server 300 follows the remote replication unit and the remote disk subsystem in the SCSI chain. During replication, the second server 300 is generally a slave and/or passive. If the replicated local server 200 fails, the remote server 300 mounts the external disk volume and becomes the SCSI master. At the same time, the remote replication unit dismounts its remote disk subsystem drives and becomes passive.

In particular, this may be implemented using a configuration like that of FIG. 14, which includes a "dual host" connection 1400. In many conventional configurations only one host adapter is active on a SCSI chain, typically set to LUN 7. At power-on or reset, the host polls all other LUNs to determine which devices are attached. If the system uses adapters that support dual hosting, the second host is typically set to LUN 6 and resets or polls only LUNs 0 through 5. LUN 7 can thus be regarded as the primary and LUN 6 as the secondary. In either case, when connected as shown in FIG. 14, both hosts can access the lower-numbered target devices.

Dual host connections are not new in themselves. In particular, dual host connections using BusLogic EISA cards and Novell NetWare servers are known. However, because a Novell server cannot refresh its file allocation table on demand, the functionality offered by a dual host connection cannot be used in that setting. General information about dual host connections is available from public sources, including an online SCSI FAQ. If a dual host connection is not used, the remote server 300 needs a driver, an NLM, and/or other replication-specific software so that it can receive replicated data directly from the remote replication unit and store it for later use.

In embodiments of the invention that use the dual host configuration 1400, the remote replication unit 208, 308, 408, 508, 608, or 708 controls the RAID unit 312 or other remote disk subsystem until it is commanded to stop so that a switchover can be performed. Until then, the remote replication unit performs remote data replication and, as described herein, acts as SCSI master and sends data to the RAID unit 312. Meanwhile the Novell or other second server 300 remains passive. This prevents the damage that could result if both the server 300 and the remote replication unit wrote concurrently, in the "two-to-one" arrangement of FIG. 14, to the RAID unit 312 or other remote disk subsystem.

To perform the switchover, the remote replication unit dismounts the RAID unit 312 drives, and the server 300 mounts them. The server 300 then becomes the SCSI master. Because the second server's choice of SCSI adapter generally cannot be predetermined or forced, it is preferable for the remote replication unit to take the secondary host position (LUN 6). When both machines are started, the remote replication unit may experience a second reset as the drives power up. This is normal, but the remote replication unit should be able to recover at the device driver level. Note that with the dual host (not dual channel) approach, the cabling is an ordinary terminated SCSI chain, and no additional hardware is required. The switchover can be performed entirely in software through dismounting, mounting, and related operations on the storage subsystem and/or drives.

The foregoing discussion assumes a one-to-one relationship between the remote replication unit and the second server 300. However, a software or mechanical SCSI switch, for example, can provide connections between a remote replication unit and several potential host servers 300. In protocols such as Fibre Channel and/or in SAN architectures, the traditional SCSI master/slave relationship does not exist. Instead there is an addressing relationship expressed through DNS and/or numeric addresses. In such systems a switchover can be performed by changing addresses, while the remote replication unit remains passive.

The remote replication unit may be configured to run a full network operating system. In the event of a disaster, the remote replication unit becomes active and serves as a fully functioning server for the information on the disk subsystem to which the replicated data was sent. The remote replication unit may also run an emulation program that emulates a server under the specific host operating system used at the local site. The remote replication unit may also run a program that shuts down the operating system and any related programs used for replication, and then restarts under the specific host operating system from a separate internal disk or partition.

The remote replication unit could also be enhanced to serve continuously as a second server rather than being dedicated to data replication. However, doing so would significantly degrade replication performance and increase the risk of replication failure.

A remote replication unit can serve as a local replication unit 204 if its software is substantially the same as that of the local replication unit 204. For example, when data is replicated from site A to site B and then to site C, the replication unit at site B is a remote replication unit with respect to A and a local replication unit with respect to C. A remote replication unit can also serve as a local replication unit 204 during a restore from the remote site back to the source. That is, when data flows from A to B, the unit at A is local and the unit at B is remote; when data flows from B to A, the unit at A is remote and the unit at B is local.

Finally, some newer systems can accommodate multiple user sessions, where a user session is a session in which replicated data is relayed or stored. Multiple combinations and instances of the scenarios above may occur concurrently or separately as appropriate. More processors, disks, memory, and so on may be needed to support a particular combination.

These tools and techniques can also be applied in one-to-many or many-to-one replication systems according to the invention. The same holds for the discussion of packet, IP, Ethernet, token ring, and other packet-based data environments, and it should be understood that other supported environments may write data as a stream rather than as packets.

Except where one step requires the result of another as input, the steps above and others may be performed in different orders and/or concurrently. For example, connecting steps 1304, 1306, and 1308 may be performed in different orders and/or concurrently, but testing step 1310 assumes that some or all of the specified connections are present, at least nominally. Step 1312, which transfers data to the local replication unit, necessarily precedes step 1314, which transfers that data over the journey link 206 or to a local replica 230. On the other hand, when data is sent to a serverless remote replication unit, transmitting step 1316 may be accomplished by performing transmitting step 1314. Steps may also be omitted unless called for in the claims, regardless of whether they are expressly described as optional in this detailed description. Steps may be repeated, combined, or named differently.

Refer now to FIG. 15 and the discussion below, which refers to that figure while describing additional tools and techniques that may be used, alone or in various combinations, in embodiments of the invention: local-remote role reversal, implementation of a hot standby server state, several alternative buffer contents and buffering schemes, transactions, many-to-one replication (outlined above in connection with FIGS. 5-10), identification of frequently accessed data, and use of the second server in a non-authoritative manner.

Role Reversal

When a primary server such as server 200 becomes non-operational, and the changed data has already been fully sent to the remote site, replication units such as units 204 and 208 can change roles so that a remote server on the WAN can provide functions such as disaster recovery to its network nodes. The first patent of the assignee MiraLink, U.S. Patent No. 5,537,533, discusses a continuously available, remotely replicated replacement network server. However, it does not appear to discuss role reversal. In a role reversal, the entire replication architecture is reversed in direction. If both the local and remote replication units survive whatever event made disaster recovery necessary, then after the local-remote role reversal the former remote unit is treated as the local unit, and data changes recorded there are replicated back to the original local unit, which now acts in the remote role.

In one embodiment, role reversal step 1506 is implemented as follows. First, the pair of "boxes" (replication units such as units 204 and 208) is preferably configured identically to facilitate the changeover. Second, the kernel module that handles SCSI emulation is active in the local box and dormant in the remote box. It is this software state that actually produces the "media not ready" behavior described below. Once the local box has delivered all changed data to the remote box, the user can issue a command to reverse roles. This shuts down replication on the local box and activates the SCSI emulation layer on the remote box, so that the remote server can now be directed to mount the remote replication unit. In this way the replication unit at each site changes its role, and the servers are allowed to participate so that the change takes effect. The current role of a replication unit may be represented internally by a bit flag or other variable.
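The role flag and the precondition for reversal can be sketched as below. This is an illustrative model only: the `Role` enum, the `ReplicationUnit` class, the `unsent_changes` counter, and the `reverse_roles` function are hypothetical names, and a real unit would also stop and start the replication and SCSI emulation services rather than just flipping a variable.

```python
from enum import Enum


class Role(Enum):
    LOCAL = "local"
    REMOTE = "remote"


class ReplicationUnit:
    """Hypothetical unit whose current role is held in a single variable."""

    def __init__(self, role):
        self.role = role
        self.unsent_changes = 0     # changed blocks not yet delivered

    @property
    def scsi_emulation_active(self):
        # Emulation is active in the local role and dormant in the remote role.
        return self.role is Role.LOCAL


def reverse_roles(local, remote):
    """Swap roles, but only after the local unit has delivered every change."""
    if local.role is not Role.LOCAL or remote.role is not Role.REMOTE:
        raise ValueError("units are not in the expected roles")
    if local.unsent_changes:
        raise RuntimeError("local unit still holds undelivered changes")
    local.role, remote.role = Role.REMOTE, Role.LOCAL
```

Configuring both boxes identically, as the text recommends, is what makes a swap this simple: the same software runs on each side and only the flag differs.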

When replication units reverse roles 1506 and a unit begins operating in the remote role, the physical disk that served as a transmit medium while the unit operated in the local role can be used as a receive buffer. In a local replication unit such as unit 204, this disk is a transmit disk that stores changed data destined for the journey link 206. In a remote replication unit, the same disk is a receive buffer that holds received 1504 changed data until it has been verified and committed to the remote replication unit's disk or other non-volatile storage. In some embodiments, the level of verification and the commit delay are programmable.

Media Not Ready State for the Second Server

With 1508, a "media not ready" state allows the second server 300 to remain in "hot" standby mode. Without it, the second server might have to be brought up only after the remote replication unit 308 was online, so that the second server could query the SCSI chain and find the remote replication unit 308. In step 1508, the SCSI emulation layer of the remote replication unit answers requests from the remote server 300 with data characteristics such as data size and data availability, but denies the remote server 300 access to the data contents. These limited responses are provided by unit 308 to the server 300 using standard SCSI response formats.
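The behavior of the emulation layer in the media not ready state can be modeled roughly as follows. The `EmulatedVolume` class and the status strings `"NOT_READY"` and `"GOOD"` are placeholders chosen for this sketch; they are not actual SCSI response encodings, and the patent does not specify how the responses are built beyond saying that standard SCSI formats are used.

```python
class EmulatedVolume:
    """Hypothetical emulated disk that can report size but withhold data."""

    def __init__(self, blocks, block_size=512):
        self.blocks = blocks            # list of bytes objects, one per block
        self.block_size = block_size
        self.ready = False              # False models "media not ready"

    def capacity(self):
        """Size queries are answered even while the media is not ready."""
        return len(self.blocks), self.block_size

    def read(self, lbn):
        """Return (status, data); data is withheld until the unit is ready."""
        if not self.ready:
            return ("NOT_READY", None)
        return ("GOOD", self.blocks[lbn])
```

Because the standby server can already see the device and its size, failover reduces to setting `ready` and mounting, with no bus rescan.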

Alternatively, the second server 300 can be brought up without the remote replication unit 308 being cabled to it. After a failure, the cable is connected and a SCSI device chain scan must then be performed to detect the new hardware. The server 300 then mounts the device 308. By contrast, in the preferred approach using 1508, the media not ready mode makes the volume 308 "enabled" and "detected" while it remains unmounted until failover is required.

Circular Buffer

Two additional modes of operation extend the usefulness of the circular data queue in the buffer by allowing a "non-consistent" replication mode (that is, one that is no longer a fully dependable time-delayed replica), from which recovery is possible given time and/or bandwidth. This circular queue is also called the "extensible intelligent buffer", the "circular buffer queue", or the "CBQ". In a normal mode it uses disk space as a FIFO (first in, first out); it can instead store changed logical block numbers (LBNs) rather than the actual changed data. This reduces the storage consumed in the CBQ (128 LBNs at 4 bytes each occupy the same space as a single 512-byte changed data block), which slows the rate at which the CBQ fills and gives the journey link 206 more time to recover. If the journey link 206 stays down long enough for the CBQ to fill, a complete re-replication would be required. However, each changed block only needs to be restored once, so the CBQ can be collapsed into a virtual file allocation table (FAT) or similar block (for example, cluster or sector) allocation structure, with a checksum or cyclic redundancy check (CRC) value stored in the CBQ for each block.

Once the journey link 206 is restored, the local replication unit notifies 1302 the remote replication unit that re-replication is needed, and the two units exchange per-block CRCs to determine which clusters (for example) of the disk must be sent. For example, more than 90% of a hard disk may be unchanged and so need not be sent over the journey link 206. This differs from earlier replication approaches, which assume that 100% of the data differs between the local and remote drives.
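The CRC exchange used to limit re-replication to changed clusters can be sketched as follows. The function names, the in-memory volume images, and the use of CRC32 from Python's `zlib` module are assumptions for illustration; the patent does not name a particular checksum. Matching CRCs make it very likely, not certain, that two clusters are identical.

```python
import zlib


def cluster_crcs(volume, cluster_size):
    """Return the CRC32 of each fixed-size cluster of a bytes-like image."""
    return [zlib.crc32(volume[i:i + cluster_size])
            for i in range(0, len(volume), cluster_size)]


def clusters_to_send(local, remote_crcs, cluster_size):
    """Indices of clusters whose local CRC differs from the remote CRC."""
    local_crcs = cluster_crcs(local, cluster_size)
    return [i for i, crc in enumerate(local_crcs)
            if i >= len(remote_crcs) or crc != remote_crcs[i]]
```

Only the CRC list crosses the journey link before the transfer starts. At 4 bytes per 512-byte cluster, that list is 1/128 of the volume size, the same ratio the text gives for LBNs versus data blocks.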

SCSI Snoop Buffering

In some embodiments, the extensible intelligent buffer (that is, the circular buffer queue) in normal mode stores changed blocks until a threshold is reached, at which point the replication unit stores 1510 changed logical block numbers (LBNs) rather than the actual changed data. In a variation that uses "SCSI snoop buffering", the data replication system buffers the actual SCSI commands instead of extracting the block data. This can be done as follows. Note that, as FIG. 15 suggests, different embodiments of step 1512 may include or omit one or more of the specific operations collectively labeled here with reference number 1512.

A target adapter in the replication device 204 listens 1512 passively to the SCSI bus. "Passive" here means that the physical device 204 does not participate electrically on the bus but does record 1512 what it observes there. The target adapter can use existing physical hardware similar in nature to that used in SCSI analyzers, although for a different purpose. A SCSI analyzer is a diagnostic tool that lets a user monitor activity on a SCSI bus without actually taking part in it. The data collected 1512 from the SCSI bus by the target adapter is then interpreted 1512 with respect to activity originating from or directed to a particular real participant, or "target", on the SCSI bus. This data comprises a set of encapsulated SCSI commands as seen 1512 on the SCSI bus.

Commands that match 1512 filter criteria, for example only commands related to the desired SCSI bus participant, are queued 1512 in the order observed, using a suitable buffering algorithm. The data collected 1512 from the SCSI bus need not be analyzed or interpreted 1512 beyond identifying 1512 commands or responses from a particular participant on the SCSI bus. However, steps may be taken to separate 1512 (a) requests from a host controller on the bus from (b) commands of a write nature from a host controller on the bus. By buffering 1512 only commands of a write nature, the buffer holds only transactions relevant to changing data on the target participant on the SCSI bus.

The buffered SCSI command data is then sent 1502 over a communication link, such as the journey link 206, to a second replication unit 208, 308, and so on. After receipt 1504, the commands are "played back" 1514 by repeating them on a second, physically separate SCSI bus whose participants are identical or similar to those on the first bus and start in the same state as their counterparts. In this way, as commands are read 1512 from the original SCSI bus, the replica target participant on the second SCSI bus can be brought to the same state as the original target participant and made to contain the same data. Buses other than SCSI can be used in a similar manner for command capture and playback, as well as for other features of the invention.

When implementing this replication system, it is important to watch for subtle, unintended interactions between read requests and write requests. This matters particularly if the SCSI bus participant of interest maintains an implicit internal state, not readily visible, that changes how it handles a later write operation depending on an earlier read operation.

In addition, errors reported by participants for commands captured on the monitored SCSI bus must be handled 1514 in a consistent manner on the second SCSI bus, which will not necessarily produce the same errors. Likewise, error conditions arising on the second SCSI bus may leave it inconsistent with the first SCSI bus in both state and data.

Temporary Transactions

Temporary transaction processing 1516 uses the buffer of a replication unit 204, 208, etc. to provide transactional file system functionality. Note that different embodiments of step 1516 may include or omit one or more specific operations, which are collectively labeled here with reference number 1516. By means of operating system agents and/or kernel wedges, file opens and closes can be tracked 1516 together with timestamps of file operations, in order to support roll-back 1516 of operations on file systems that do not already support transactional operations.

In this context, a "kernel wedge" is a binary patch or a source code patch that is wedged into existing binary or source code to modify the operating system. It differs from a device driver or an agent in that the wedge is inserted at a place in the operating system that was not specifically designed to let additional software be linked in or otherwise inserted. By inserting 1516 code at the points in the operating system where operations such as opening and closing files occur, actions can be taken in response to those events.

This approach can be viewed as a hybrid of two copying styles: one copies a file when the file is closed, and the other copies data as it is written to the file. The approach attaches 1516 a timestamp or other marker to the replicated data according to when the file was opened or closed. As a result, all changes made by a program after it opens the file are associated 1516 with that open/close cycle, while any later changes made after the file is reopened are not associated with the current cycle.

After an open/close cycle completes, lack of space or other factors may make it difficult to track 1516 the specific blocks associated with a file. It is straightforward, however, to keep track 1516 of the exact time at which a particular open or close event occurred, and also to track 1516 the exact time at which a block entered the buffer. At a later time, a system administrator can therefore review the open/close log provided by the wedge and selectively remove the changed data blocks that fall within a particular time period.

Note that this approach offers relatively little benefit for applications, such as databases, that hold files open for long periods and write to them over long periods. It works well, however, for keeping a file system safe or for restoring 1516 a word processor file that was accidentally overwritten, because these operations take place within a short period, usually as quickly as possible. A file system change, such as a word processor's file save, is tracked 1516 to a reasonably precise point in time as it occurs. The replicated data-change operations corresponding to those points in time are then identified 1516, and the stream of data-change operations being replicated is edited 1516 by picking out the selected operations.

Transactions 1516 can be accomplished by a remote system agent, or by another program that keeps 1516 a log of data changes in a buffer and is able to roll back 1516 those changes over a period of time. The remote system agent resides in a remote replication unit such as unit 208 and receives 1504, 1506 data change information from the local replication unit 204 over the communication link 206.

In some embodiments, the system has a replica disk and a buffer disk at both the local and the remote site. The remote buffer disk, such as buffer 310, is not actually used unless the remote system for some reason ceases to be remote and becomes local, for example when the remote/local roles are swapped 1506 so that the remotely replicated data can be restored from the location to which it was replicated. The remote buffer disk can therefore be used to hold 1516 the transaction log.

These logs can be organized in a structure similar to the transaction queue, storing 1516 each data block together with information about it (its LBN and a timestamp) in an ordered fashion. Rather than writing data to disk immediately, the invention stores 1516 it in a buffer for a period of time whose length is determined by buffer space availability and/or administrator preference. When that time expires, the data is removed 1516 from the buffer and written to the replica image. From that point on, the administrator no longer has the option to undo the write. If the remote unit 208 needs to become 1506 the local unit 204, the entire remote buffer 310 must first be flushed to a disk, such as the RAID unit 312, before the same buffer space 310 can be assigned to data transfer operations.

More generally, by using the buffer and its timestamp information, it is possible to roll back 1516 transactions that have already occurred on the replicated server 200 and have reached the remote system buffer 310 receiving the replicated data, but that have not yet left the buffer 310 for the replica image on, for example, the RAID unit 312. An administrator can perform this roll-back with a management utility simply by removing 1516 the blocks in question from the remote queue.
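The hold-then-commit behavior of the remote buffer can be sketched as below. This is a minimal illustration, not the patented implementation: the `HoldBuffer` class, its method names, and the use of a dictionary as a stand-in for the replica image are all assumptions made for the example.

```python
from collections import deque


class HoldBuffer:
    """Holds timestamped block writes before they reach the replica image."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self.pending = deque()   # (timestamp, lbn, data), oldest first
        self.image = {}          # stand-in for the replica image: lbn -> data

    def record(self, timestamp, lbn, data):
        """Buffer a block write instead of writing it to the image at once."""
        self.pending.append((timestamp, lbn, data))

    def flush_expired(self, now):
        """Write entries older than the hold time to the image; no undo after this."""
        while self.pending and now - self.pending[0][0] >= self.hold_seconds:
            _, lbn, data = self.pending.popleft()
            self.image[lbn] = data

    def roll_back(self, start, end):
        """Drop still-buffered writes whose timestamp lies in [start, end]."""
        kept = [e for e in self.pending if not (start <= e[0] <= end)]
        removed = len(self.pending) - len(kept)
        self.pending = deque(kept)
        return removed
```

A write can be rolled back only while it is still in `pending`; once `flush_expired` has moved it into the image, it is permanent, which mirrors the loss of the undo option described above.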

Alternative Buffering Algorithm

Some replication units 204 may use a different buffering algorithm that saves buffer space and time compared with a simple circular buffer. Assume that each block is written to the local replica 230 as it is received, and that only its LBN is stored in an ordered queue. As used herein, an "ordered queue" is any queue, list, FIFO (first in, first out) structure, table, or other collection of one or more data structures from which items can be retrieved in the same order in which they were inserted. A circular queue is one example of an ordered queue.

When a block being replicated would overwrite a block that is already in the queue and has not yet been replicated 1302 to the remote site, the pre-existing block is copied into the buffer space in the same manner as in the embodiments described earlier. That is, only a pointer to the block is placed in the actual queue, while the block itself is kept in a swap space. This alternative buffering algorithm lets the entire buffer run in "compact" mode while still remaining safe. Only the changed portion of each change is buffered.

"Compact mode" and "normal mode" refer to buffering modes. Compact mode implements a "best effort" strategy that is used when the buffer fills up. Normal mode is the buffering mode ordinarily used until an administrator-defined or other threshold of free buffer space is reached. This threshold is sometimes called the "high water mark", because when the water is high it is best to act. Once the threshold is reached, the buffer operates in compact mode, which can no longer guarantee data integrity in all cases because it tracks 1510 only the LBNs that changed, rather than each LBN together with its data. Data is written to the local replica 230 in the normal way, and when an LBN is read from the queue, the data to be sent 1500 is read from the local replica 230. In many cases this works well and all data is replicated.

In some cases, however, a file is written and then written again with further changes. Both changes are placed in the queue, but when the first change is removed from the queue, the data actually sent 1500 comes from the second (later) change and therefore appears on the remote replica disk 310/312 ahead of its time. This can be a significant problem, because what gets overwritten is often a file system object. Still, this is a "try it and it may work" approach that provides some degree of protection, so it is better than simply running out of buffer.
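The hazard just described can be reproduced with a small simulation. The sketch makes a simplifying assumption that all writes arrive before any block is sent; the function name `compact_mode_send` and the dictionary used as the local replica are hypothetical.

```python
from collections import deque


def compact_mode_send(writes):
    """Simulate compact mode: queue LBNs only, read data from the replica at send time."""
    local_replica = {}
    lbn_queue = deque()
    for lbn, data in writes:
        local_replica[lbn] = data   # data goes to the local replica as usual
        lbn_queue.append(lbn)       # only the LBN is buffered
    sent = []
    while lbn_queue:
        lbn = lbn_queue.popleft()
        sent.append((lbn, local_replica[lbn]))  # reads whatever is there now
    return sent
```

For the writes `(5, "v1")`, `(9, "x")`, `(5, "v2")`, the first item sent for block 5 already carries `"v2"`, so the remote site sees the later contents before the intervening write to block 9 has arrived.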

The alternative buffering algorithm of the invention improves on this approach while proceeding in almost the same way. When a given data block is written again, however, the block on the local replica 230 is first copied and stored somewhere else in the buffer. Inserting this block back into the queue is not feasible, since in general too many queue elements would have to be moved to make room. Instead, the individual queue entry for that particular LBN can be changed in place so that it refers to a data block at some other location on the system. For example, the local replication unit 204 may use a second storage area to hold these blocks.

One advantage of this alternative buffering algorithm is that most of the time only a single write operation is needed. Occasionally a read/write/write operation 1518 is required: read the block from the local replica 230; write it to temporary storage; update the LBN entry in the queue to point to the block in temporary storage rather than the block in the replica; write the new block to the replica 230, where the previous copy of the data was held; and then add the LBN entry for the new block to the queue.
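The read/write/write sequence can be sketched as follows. The class `PreservingLbnQueue`, the slot numbering, and the dictionaries standing in for the local replica and temporary storage are assumptions made for illustration; a real unit would operate on disk blocks.

```python
from collections import deque


class PreservingLbnQueue:
    """LBN-only queue that preserves a pending block before it is overwritten."""

    def __init__(self):
        self.replica = {}      # local replica: lbn -> current data
        self.temp = {}         # temporary storage: slot id -> preserved data
        self.queue = deque()   # entries [lbn, slot]; slot None means "read from replica"
        self._next_slot = 0

    def write(self, lbn, data):
        for entry in self.queue:
            if entry[0] == lbn and entry[1] is None:
                # A pending entry still points at the replica: preserve the old block.
                slot = self._next_slot
                self._next_slot += 1
                self.temp[slot] = self.replica[lbn]  # read, then write to temporary storage
                entry[1] = slot                      # repoint the queue entry in place
        self.replica[lbn] = data                     # write the new block to the replica
        self.queue.append([lbn, None])               # queue the new block's LBN

    def send_next(self):
        """Return the next (lbn, data) pair in the order the writes were made."""
        lbn, slot = self.queue.popleft()
        data = self.replica[lbn] if slot is None else self.temp.pop(slot)
        return lbn, data
```

When no pending entry exists for the block, `write` performs a single replica write plus a queue append; the extra read and write happen only on an overwrite of a block that has not yet been sent.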

Remote Many-to-One Replication

This new approach includes the technology described herein, which can be further adapted to provide a hardware/software platform for a many-to-one solution with a central backup site or service provider, as described above. The local system operates largely as described above. The local replication unit 204 connects to the host server system 200 through the SCSI bus and appears as a fixed disk drive, which in turn may be used, for example, as part of a RAID-1 mirror. Data is then sent 1500 from the local buffer 210 to the remote site using the transport protocol of the local replication unit 204, whose operating states may be as described elsewhere herein. A management interface can support a one-to-one view (from the standpoint of the local replication system) between the local system and the remote many-to-one solution in a replication unit such as unit 508, 608, or 708.

The remote many-to-one solution can run 1520 multiple instances of the replication system's transport and buffer management software, for example multiple instances of the software of the remote replication units 208, 308, 408 described above. In these embodiments, however, the kernel module is replaced by a user-space control module that emulates 1520 the kernel interface of the system described above. Multiple "virtual remote replication units" (also called "virtual systems" or "virtual 1.1 systems" herein) can be hosted 1520 on the hardware platform of a server 300 or on modified replication units 208, 308, 408. The hardware platform may be any high-end server system that provides a shared, available POSIX/Unix/SVR4 environment. Examples include, without limitation, Sun or IBM servers running Solaris/Linux or AIX/Linux, respectively.

To make it easier to implement virtual system transport software that can run 1520 as needed, the software should be written in a modular fashion and should make no assumptions about how data flows between devices, which include, for example, a local buffer, a remote buffer, a local replica, a remote replica, and the kernel. Control over where data comes from and where it goes is exercised through the kernel interface, which maintains state information about replication conditions and user-initiated state changes.

In some embodiments, the hardware platform runs SAN management software that interfaces with a replication unit management layer to provide functions such as routing devices on SAN storage to local devices as needed, in order to implement the various operating states (for the buffer device, replica device, change replica device, and so on). The management interface of the many-to-one system may be derived from the replication unit management interface described earlier, using SNMP through MIB extensions and the World Wide Web. Within this management layer, a one-to-one relationship can be presented to the primary (local) replication system while still permitting the state operations required on the remote system. A SAN management suite can serve as a model for similar interfaces, such as those for setting checkpoints, making multiple copies of replicated data, and/or changing which devices are replicated.

Identifying Frequently Accessed Data Elements Without Application-Specific Knowledge

In this section and the two sections that follow, a data block is an example of a "data element", and a disk sector is an example of a "storage unit". A "current set" can be viewed as an abstraction of a disk drive.

A common problem in fault-tolerant systems is that an application takes no steps to recover when only part of a set of data storage operations has completed before the application terminates. Applications designed to be fault tolerant usually have some means of performing a set of data storage operations without treating them as valid until a certain final operation has been performed, so that if any operation fails, the whole set is treated as invalid. Many applications, however, are not designed this way.

One way to provide fault tolerance for applications that were not specifically designed for it is to use application-specific information, containing detailed knowledge of the operations to be performed, and to track the application's state from outside the application. If an external agent monitoring the application finds that an entire transaction has not been committed, the transaction can be removed from the active data set. This is problematic because the monitoring agent needs specific knowledge of the application's behavior and is therefore highly sensitive to data changes made outside the application itself.

One approach described herein uses a monitoring agent that does not possess such application-specific information to identify 1522 frequently accessed data. The agent assumes 1522 that an application's storage transactions occur in temporally related clusters; that a transaction usually comprises a set of operations on a first group of contiguous data elements; that other storage operations occur before and/or after that set of operations; and that those other storage operations fall at or near a second group of contiguous data elements, which are located somewhere other than the first group and are shared by different transactions. These shared elements are referred to herein as "state blocks".

As an example, consider a file system write. The data file is updated by a set of one or more operations, usually involving a group of contiguous storage units located near one another on the physical storage medium. The file system tables are then updated; these are kept in different but consistently referenced locations, within a limited number of physically related storage units. Each sector or cluster holding the file's user data corresponds to a contiguous data element of the first group, while each sector or cluster holding a file system table, bitmap, or similar file system data structure corresponds to a contiguous data element of the second group.

Many applications use a write strategy similar to this. To improve write performance, a given operating system may attempt to cluster unrelated write operations into a single write operation. The data file update therefore takes place at a time determined by the operating system.

With the invention, one way to identify 1522 a transaction is to keep track of the storage write operations that occur between updates to these special state blocks. A transaction consists of all the data written to the data file(s) between two state block updates. State blocks can be identified 1522 by running 1522 an application across its normal range of operation and keeping track 1522 of which storage units are written, how often, and in what order. Neural networks, statistical analysis, or other similar techniques and tools can be used to extract 1522 the state block identification from the resulting log. A log accumulated over time should show that certain storage units are accessed or written more frequently than others, and these should be treated 1522 as state blocks. If no such clear statistical pattern emerges, the method does not apply to the application in question. The method of the invention will not necessarily work with every application that uses storage.
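As a rough illustration of this idea, the sketch below flags storage units by their share of logged writes and then groups the remaining writes into transactions. A plain frequency threshold is used here as a stand-in for the statistical analysis mentioned above; the function names and the `min_share` parameter are assumptions made for the example.

```python
from collections import Counter


def find_state_blocks(write_log, min_share=0.2):
    """Return units whose share of all logged writes is at least min_share."""
    counts = Counter(write_log)
    total = len(write_log)
    return {unit for unit, n in counts.items() if n / total >= min_share}


def split_transactions(write_log, state_blocks):
    """Group data writes that fall between consecutive state-block updates."""
    transactions, current = [], []
    for unit in write_log:
        if unit in state_blocks:
            if current:
                transactions.append(current)
            current = []
        else:
            current.append(unit)
    # Writes after the last state-block update are left out: that
    # transaction has not yet been closed by a state-block update.
    return transactions
```

With a log in which unit 0 is rewritten after every burst of data writes, `find_state_blocks` singles out unit 0, and `split_transactions` returns one list of data units per completed burst.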

When the method is properly applied and the application fails and cannot recover, recovery can be assisted by uncommitting 1524 successive data blocks, together with the state block updates written between state block update operations, until the application is able to recover its state. To support this uncommit capability, the invention stores the data units that are overwritten between state block updates in some form of non-volatile storage. Alternatively, the invention may buffer storage operations before committing them to disk, releasing the corresponding buffer space after the next set of state block storage operations has been detected and processed. Read operations should then be served from the buffered storage rather than from the committed copy. A table can be maintained to indicate whether a given data unit is located in a buffer or on committed storage.

Resynchronizing a Non-Authoritative Secondary Data Volume from a Primary Data Volume

The invention also provides tools and techniques for resynchronizing a non-authoritative secondary data volume, such as the remote replica disk subsystem 312 or 614, from a primary data volume, such as the local replica 210, so that disaster recovery is possible after the secondary data volume has been used as the primary for a period of time.

Under normal operation, data units are written to a primary data volume and then, by some means such as the replication units 204, 208, to a secondary data volume. The data on the primary data volume is regarded as authoritative and is therefore the data consulted when data units need to be accessed. In the event of a non-destructive failure of the primary data volume (for example, a power failure, or the using application being temporarily cut off from the existing data units), the using application turns to the secondary data volume to store new data units and to read data units. A list (for example a table or other data structure) is maintained 1526 of the data units changed on the secondary data volume while the primary data volume was unavailable. When the primary data volume becomes available again, this list is consulted in order to resynchronize 1526 the contents of the secondary data volume with those of the primary volume. The resynchronization 1526 reads the corresponding data units from the primary volume and writes them to the secondary data volume.

In this scenario, changes made to the secondary data volume are assumed to be non-authoritative and are normally overwritten by the resynchronization 1526. The justification for this may, for example, be specific to the using application.

Thus, in appropriate situations, the invention provides a simple way to re-establish the primary-secondary relationship between two data volumes. This resynchronization 1526 differs from a role swap 1506: in a role swap, the secondary volume becomes the authoritative primary volume, whereas in resynchronization 1526 the primary volume remains authoritative.
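The resynchronization described in this section can be sketched with two dictionaries standing in for the volumes and a set standing in for the change list. The function names are hypothetical, and the handling of a unit that exists only on the secondary (it is removed) is an assumption, since the text does not address that case.

```python
def run_on_secondary(secondary, changed, unit, data):
    """Store to the secondary while the primary is unavailable, noting the unit."""
    secondary[unit] = data
    changed.add(unit)


def resynchronize(primary, secondary, changed):
    """Overwrite every changed unit on the secondary with the primary's content."""
    for unit in sorted(changed):
        if unit in primary:
            secondary[unit] = primary[unit]
        else:
            # Assumption: a unit that never existed on the primary is dropped.
            secondary.pop(unit, None)
    changed.clear()
```

Only the units named in the change list are copied, so the cost of resynchronizing is proportional to what changed during the outage rather than to the size of the volume, and the primary volume is never modified.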

Maintaining an Ordered Queue and a Current Copy on the Same Physical Storage System

As described herein, in some embodiments the replication unit 204 stores data unit writes in an ordered queue, in the order in which they are received, so that they can be read back in sequence. In some embodiments, a set of data storage units is defined as the "current copy", and these data storage units are read back 1528 from the current copy as a whole. A new store operation on a given data unit of the storage device updates 1528 that data unit in the current copy, while the earlier data units remain available 1528 to be read in order to restore an earlier system state.

This is managed by maintaining 1528 a table (or other data structure) of the locations of the storage units in the current copy. The table identifies the address of the most recent data unit for a given storage unit in the current copy. When a request is processed, the data unit is looked up 1528 in the table and read from the ordered queue at the location the table refers to. Ordered read requests 1528 are handled by reading forward through the queue from a known position in the ordered queue.

Accordingly, there is no compelling reason to keep two physically separate copies of the same data unit. The invention avoids having to write the same data unit to the storage system twice, as a physically separate arrangement would require. Note that different embodiments of step 1528 may include or omit one or more specific operations, which are collectively labeled with reference number 1528.

When the physical storage system fills with ordered queue data, the oldest ordered queue units are aged out 1528 and their storage space is freed for new ordered queue units. If an old ordered queue unit that belongs to the current set must be aged out, it can be copied 1528 to a second storage device and the ordered set updated 1528 to point to the new location. Whether this is a common situation depends on the application, but in many cases this feature 1528 of the invention tends to reduce the number of write operations needed to maintain both the current set and an ordered queue view of a set of data units.

One consequence of keeping an ordered queue is that earlier current sets are available for reconstruction 1528. An earlier current set is rebuilt by selecting 1528 a point in time in the ordered queue as the new current set, scanning 1528 the reference table for entries that refer to ordered queue units newer than the selected point in time, and then updating 1528 those entries to refer to older ordered queue units that hold the correct part of the current set.
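The combination of a single ordered queue and a reference table can be sketched as follows. The class `OrderedStore` and its method names are hypothetical, a Python list stands in for the on-disk queue, and timestamps are assumed to be non-decreasing in arrival order.

```python
class OrderedStore:
    """One ordered log of writes plus a table giving the current copy."""

    def __init__(self):
        self.log = []       # ordered queue: (timestamp, unit, data), in arrival order
        self.current = {}   # reference table: unit -> index of newest log entry

    def write(self, timestamp, unit, data):
        self.log.append((timestamp, unit, data))   # a single write serves both views
        self.current[unit] = len(self.log) - 1

    def read(self, unit):
        """Read a unit of the current copy through the reference table."""
        return self.log[self.current[unit]][2]

    def rebuild_as_of(self, timestamp):
        """Repoint table entries newer than timestamp to older queue entries."""
        for unit, idx in list(self.current.items()):
            # Walk backward to the newest entry for this unit at or before timestamp.
            while idx >= 0 and (self.log[idx][1] != unit or self.log[idx][0] > timestamp):
                idx -= 1
            if idx >= 0:
                self.current[unit] = idx
            else:
                del self.current[unit]   # the unit did not exist at that time
```

Each data unit is written once, to the log; the current copy exists only as table entries pointing into that log, which is why rebuilding an earlier state amounts to updating pointers rather than copying data.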

In many cases this embodiment 1528 of the invention pays a performance penalty on read operations, because in some situations the reads do not fall on contiguous storage units. Store operations, however, should be efficient in any order, because they always fall on contiguous storage units laid out as the ordered queue, for example when the ordered queue is implemented as a linear array spanning the storage units of a storage system.

Configured Storage Media and Signals

Articles of manufacture within the scope of the invention include a computer-readable storage medium in combination with a specific physical configuration of the substrate of that medium. The substrate configuration represents data and instructions that cause computers to operate in a specific and predefined manner as described herein. Suitable storage devices include floppy disks, hard disks, tape, CD-ROMs, RAM, flash memory, and other media readable by one or more computers. Each such medium can embody programs, functions, and/or instructions executable by a machine to perform the flexible replication method steps substantially as discussed herein, including without limitation methods that perform some or all of the steps shown in Figure 13 and methods for installing and/or using the systems shown in Figures 2 through 12. The invention also provides novel signals that are used by or with such programs. These signals may be embodied in "wires", RAM, disk, or other storage media or data carriers.

Additional Information

To further assist individuals and enterprises in understanding and properly practicing the invention, additional information and details are provided below. These discussions build on the preceding ones, and unless otherwise stated, the discussion of any one type of embodiment (method, system, or configured storage medium) also applies to the other types.

Specific Examples of Improvements Provided by the Invention

Many other solutions to data protection problems (tape backup, local clustering, replication, shadowing, remote mainframe channel extension, and so on) are to some degree directly connected to, and dependent on, the host 200 operating system. That dependency creates problems for customers that the present approach can avoid. For example, reliance on dedicated software can cause compatibility problems and bugs when that software does not work fully with the current host operating system or with an upgrade to it. Solutions that rely on dedicated host replication software can also cause performance problems, because they impose additional work on the host. Dependent software solutions can also be a source of instability. These problems grow for solutions that rely on dependent software as disk volumes become larger and as software and operating systems become more complex. In addition, if the host 200 operating system crashes, solutions that depend on that operating system also stop working.

In contrast, in at least some embodiments the present invention does not use software that adds load to the host computer (i.e., the local server 200), and it thereby reduces or avoids the problems described above. If the host operating system crashes, the replication unit can continue to operate and the replicated data remains available, because the replication unit runs its own operating system. Unlike solutions that require substantial modification at their core, the invention scales readily as disk volumes grow and software becomes more complex. If more disk space is needed, larger disks can be placed in the replication unit. If the rate of data change exceeds the current ability to write to disk, a caching controller can be used and system memory can be increased. Some other solutions require the cooperation of operating system vendors in order to integrate and operate without bugs. Because all operating systems are expected to support (for example) SCSI and Fibre Channel for the foreseeable future, no such cooperation is needed to install and use the invention.

When other solutions fail, they can take the host 200 down with them because of the close interaction noted above. Because the present system operates independently of the host 200, its failure need not seriously affect the host computer. Traditional disk mirroring was originally designed to provide local fault tolerance. Two disks are written in parallel, and if one disk fails the computer keeps running. The failed disk is dismounted by the operating system in the background. The operating system and the computer continue to run without interruption. Because the replication unit of the invention can appear to the host as a SCSI disk and be mounted as a mirrored disk, it provides similar advantages. If the replication unit crashes, it is simply dismounted. For example, if the operating system or other software on the replication unit fails, the replication unit stops emulating a disk drive. As a result, the host 200 operating system no longer recognizes the replication unit. In response, the host 200 operating system simply dismounts the replication unit 204 and continues running.

At least some previously described replication system embodiments use a single-disk IDE buffer. Even with spoofed packets, such a smart buffer cannot keep up with high-speed SCSI RAID units that perform hardware striping. The most critical data, namely data awaiting transmission to the remote location, was held on a single disk with no fault tolerance at the smart buffer. By contrast, with the present invention both the local and the remote replication units can mirror the single-disk buffer for fault tolerance and can perform hardware RAID striping across multiple disks. This provides both the ability to keep up with the server's high-speed storage subsystem and better fault tolerance. It also reduces the risk of losing buffered data if a disk in the server 200 volume or one of the replication unit disks 210, 310 fails.

The limited data input options of earlier approaches made it very difficult to offer new technology that the market would accept. For example, at least some of the previously described approaches did not support storage area networks (SAN) or network-attached storage (NAS). The need for a standard remote server such as server 300 made it very difficult or impossible to provide backup and replication for the increasingly popular SAN and NAS disk subsystems. However, all of these subsystems can perform local replication over Ethernet, Fibre Channel, and/or SCSI. The new replication unit accepts several input types, including SCSI, Ethernet, and Fibre Channel.

The invention also supports larger storage subsystems. Many earlier fault-tolerance solutions were designed for environments in which even a 6-gigabyte storage volume was considered large. As storage costs have fallen, disk subsystem capacities have grown rapidly. Server disk capacities of 100 gigabytes are now common. The invention accommodates these larger volumes, in part by performing synchronization of the host server 200 in the background, for example within the replication unit. Offloading this work from the host server to the replication unit allows the central host server 200 to be fully replicated without significantly degrading its performance. By contrast, "clustering" and other replication schemes that require a local server to handle the synchronization needed for replication can degrade or even cripple the performance of the primary server.

Although the embodiments try to avoid resynchronizing the replicated disk over the communication link (re-replication), at least some of the previously described embodiments require intervention by the local server 200 when the local buffer cannot hold the entire local volume. Re-replication slows the central/primary/host server 200 to a crawl and can take several days. It is therefore generally done on weekends, when there are fewer users and a slower network is tolerable. As disk subsystems grow, however, this becomes unacceptable. The invention supports nonvolatile storage, not only at the remote location but also in the local replication unit 204, that is large enough to hold the entire volume to be replicated to the remote location. This allows the local replication unit 204 to preacknowledge the complete local disk volume into the local smart buffer and to perform re-replication in the background from the point of view of the server 200.

In at least some of the earlier approaches, the maximum throughput of a T1 link, whether to a local or a remote site, slowed re-replication even when frame relay, ATM, and/or VSAT networks were available. By contrast, the invention can flexibly provide a larger I/O pipe, which improves performance because re-replication becomes faster and data placement becomes more efficient. If the replicated data cannot be used at the remote site where it is stored, the data held at that site can be transferred at high speed to another facility over a high-speed private data network. Such networks typically support bandwidths up to OC-48 (2.488 gigabits per second). One example is a customer who normally replicates data to Chicago but now needs to perform recovery using equipment in New York. This kind of need arises more often than originally expected.

The earlier Off-SiteServer product did not provide an open application programming interface (API). Instead, it relied entirely on closed proprietary hardware (MiraLink) and closed proprietary software (Vinca). If an enterprise customer had needs outside the scope of the product, there was generally no easy way to customize or adapt it. By contrast, the invention can provide an open API so that such modifications can be made by client programs for particular customers or emerging markets. In particular, and without limitation, the invention may provide an API with one or more calls that configure the replication unit without interrupting the server 200, and a call that restarts the replication unit, also without interrupting the server 200.

Configuration Data

System configuration data is preferably stored in a distributed manner, so that if a replication unit loses its configuration data, that data can be recovered from its peer unit. Basic configuration data such as network information is preferably kept in nonvolatile storage (e.g., on disk or in battery-backed semiconductor memory), so that even if the configuration data on disk is lost, it can still be recovered from the peer replication unit.
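A minimal sketch of this peer-recovery idea follows. It assumes, for illustration only, that each unit keeps a copy of its peer's configuration in memory; the class, the method names, and the configuration keys are hypothetical, and a real unit would keep the copy in nonvolatile storage and exchange it over the journey link.

```python
import copy


class ReplicationUnit:
    """Holds its own configuration plus a safe copy of its peer's (illustrative)."""

    def __init__(self, name, config):
        self.name = name
        self.config = dict(config)      # this unit's own settings
        self.peer = None
        self.peer_config_copy = None    # copy of the peer's settings

    def pair_with(self, other):
        """Link two units and exchange configuration copies."""
        self.peer, other.peer = other, self
        self.sync_to_peer()
        other.sync_to_peer()

    def sync_to_peer(self):
        """Push a copy of this unit's configuration to the peer."""
        self.peer.peer_config_copy = copy.deepcopy(self.config)

    def recover_config(self):
        """Restore lost configuration from the copy held by the peer."""
        if self.peer is None or self.peer.peer_config_copy is None:
            raise RuntimeError("no peer copy available")
        self.config = copy.deepcopy(self.peer.peer_config_copy)


local = ReplicationUnit("local", {"ip": "10.0.0.5", "netmask": "255.255.255.0"})
remote = ReplicationUnit("remote", {"ip": "10.1.0.5", "netmask": "255.255.255.0"})
local.pair_with(remote)
local.config = None          # simulate loss of the local configuration
local.recover_config()
```

After `recover_config()` the local unit again holds the settings it had when it last synchronized with its peer.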

The web interface preferably provides at least the following configuration options or their equivalents: IP address (remote/local), netmask (remote/local), administrator password (shared), buffer size (local), buffer high-water mark (the level beyond which the buffer is considered filled past an acceptable limit), volume size (configurable up to a maximum set by the manufacturer), SCSI target logical unit number (LUN), and SNMP configuration (remote/local).

The SNMP configuration itself preferably includes the following items: add/delete SNMP replication hosts (remote/local), event polling interval, buffer filled past the acceptable limit, network connection failure, buffer full, remote unit out of synchronization, and add/delete e-mail recipients.

The web interface preferably provides at least the following status information: number of blocks in the buffer, number of blocks sent, number of blocks received, replication unit version, replication unit serial number, volume size, and whether this unit is remote or local. The web interface preferably provides a utility for unmounting the remote unit. The web interface preferably also provides a log dump report. SNMP and SMTP traps are generally used for the following events: buffer filled past the acceptable limit, buffer full, network connection failure, and remote unit out of synchronization.
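The four trap events can be derived from the status and configuration values listed above. The sketch below is one hedged way to do that; the function name, parameter names, and event strings are assumptions made for illustration, and the patent does not specify how the comparison is performed.

```python
def pending_traps(blocks_in_buffer, buffer_size, high_water_mark,
                  link_up=True, remote_in_sync=True):
    """Return the names of the events that would warrant an SNMP or SMTP trap."""
    events = []
    if blocks_in_buffer > high_water_mark:
        events.append("buffer filled past acceptable limit")
    if blocks_in_buffer >= buffer_size:
        events.append("buffer full")
    if not link_up:
        events.append("network connection failure")
    if not remote_in_sync:
        events.append("remote out of synchronization")
    return events
```

A polling loop running at the configured event polling interval could call this function and hand any returned events to the notification mechanism.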

Management tools can provide notification by e-mail, pager, or other means. Notification may be real-time and/or combined with automatic logging or automatically generated reports. Notifications can also be sent to the system administrator and/or the vendor. Embodiments whose interface runs a web server and e-mail package can also take advantage of many features of the web. For example, users can access and manage the replication unit locally or remotely. Depending on their permissions, users can reach the replication unit from inside the company and/or from anywhere in the world. The replication unit can notify users (and the replication unit vendor) by e-mail and SNMP of problems and significant events on the unit. Custom scripts can also be written for the e-mail so that different users or user groups are notified. Report output is not a required feature. If a customer requires special management reports, then instead of copying the required data every month and rewriting it again and again, the customer or an informed developer can use HTML, Java, and/or other familiar tools and techniques to have the replication unit generate the report in the desired format and send it by e-mail.

Basic Hardware

In general, a system according to the invention includes basic hardware such as a standard Pentium II, Pentium III, AMD K6-3, or AMD K7 class PC-compatible computer (the marks belong to their respective owners). In various configurations the machine preferably has at least 64, 128, or 256 megabytes of RAM and a rack-mount case. It also preferably contains a 100 Mb Ethernet card, an FDDI adapter, or the like. For the disk drive interfaces, the machine preferably has a QLogic SCSI adapter for disk drive emulation and an Adaptec 2940UW adapter for buffer and replication control, or a DPT-brand RAID adapter supported by FreeBSD. Caching may also be used, including RAID or SCSI controller cache, a volatile RAM cache in the replication unit, a nonvolatile RAM cache in the replication unit (e.g., static RAM or battery-backed RAM), and so on. Those familiar with caching tools and techniques can readily adapt them for use according to the invention.

In some embodiments, if N is the size of the volume to be replicated, a local replication unit 204 that contains a local replica 230 needs at least N of storage capacity for that local replica. In some embodiments, the disk system serving as the local buffer 210 (with or without a local replica) needs a capacity of at least six-fifths of N, that is, 1.2 N. The remote replication unit has at least one disk system with a capacity of at least N to hold the remote replica. In all cases, the local replication unit buffer 210 may need data capacity equal to that of the remote replication unit, including buffer and hot-swappable RAID subsystem, to support local re-replication.
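The sizing rules in this paragraph reduce to simple arithmetic. The following sketch works them through for a given volume size N; the function name and dictionary keys are illustrative, and exact fractions are used only to avoid floating-point rounding in the six-fifths factor.

```python
from fractions import Fraction


def minimum_capacities(volume_size, has_local_replica):
    """Minimum capacities, in the same unit as volume_size, per the rules above."""
    n = Fraction(volume_size)
    return {
        "local_replica": n if has_local_replica else Fraction(0),  # at least N
        "local_buffer": Fraction(6, 5) * n,                        # at least 1.2 N
        "remote_replica": n,                                       # at least N
    }


sizes = minimum_capacities(100, has_local_replica=True)
```

For a 100-gigabyte volume this gives at least 100 gigabytes for the local replica, 120 gigabytes for the local buffer, and 100 gigabytes for the remote replica.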

Test Suite

The tests used to measure the performance of a system according to the invention preferably include analytic tests that gauge relative performance and Boolean (pass/fail) tests that cover compliance with key functional specifications. The Boolean tests are passed if the specified answers to all of the questions match the test results. The Boolean tests can be used to determine suitability for delivery.

Testing is preferably done both in a local network configuration (in which the journey link 206 lies within a single local area network) and in a local-and-remote configuration (in which the local replication unit 204 and the remote replication unit are geographically distant from each other). For example, a remote network configuration may consist of two sites connected by a T1 link 206 or by an equivalent amount of public Internet bandwidth serving as the journey link 206.

Analytic testing preferably uses a standard disk hardware test suite such as Bonnie (for UNIX) or PCTools (for Windows NT and Novell clients). The tests compare the performance of the raw disk drive (noting its type, size, and characteristics) with that of the flexible replication unit 204. The performance results are recorded for future reference.

The following questions are preferably asked, and any necessary corrections made, until the listed answers are obtained.

Does the host 200 operating system recognize the replication unit 204 as a disk drive of the correct size? (Yes)

Can data be read from and written to the replication unit 204 without loss? (Yes)

Can the host system 200 perform arbitrary file operations on data on the replication unit 204 for 48 hours without loss? (Yes)

Can a local replication unit 204 with a 100-megabyte host volume and a remote network configuration successfully replicate data to the remote replication unit at 300 megabytes per hour, or at a higher rate if FDDI or another faster link is supported? (Yes) Note that 300 megabytes per hour is less than 50% of the maximum carrying capacity of a T1 connection, which is about 617 megabytes per hour.
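The figures in this test can be checked with a few lines of arithmetic. The sketch below takes the 300 and 617 megabytes-per-hour values directly from the text rather than deriving the T1 capacity independently; the variable names are illustrative.

```python
REQUIRED_RATE_MB_PER_HOUR = 300   # rate required by the test
T1_CAPACITY_MB_PER_HOUR = 617     # approximate T1 capacity stated in the text
HOST_VOLUME_MB = 100              # host volume size used in the test

# Share of the stated T1 capacity consumed by the required rate.
utilization = REQUIRED_RATE_MB_PER_HOUR / T1_CAPACITY_MB_PER_HOUR

# Time needed to replicate the whole volume once at the required rate.
minutes_for_full_volume = HOST_VOLUME_MB * 60 / REQUIRED_RATE_MB_PER_HOUR

print(f"utilization: {utilization:.1%}")
print(f"full volume: {minutes_for_full_volume:.0f} minutes")
```

The required rate comes to roughly 48.6% of the stated T1 capacity, which is consistent with the "less than 50%" remark, and at that rate the 100-megabyte volume is replicated in about 20 minutes.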

Can the local replication unit 204 be rebooted without in any way preventing the attached host system 200 from operating normally, that is, can the host 200 continue to serve its intended purpose with no noticeable degradation in performance? (Yes)

When the local replication unit 204 comes back online, does it automatically begin transmitting the data remaining in its queue over the network or other journey link 206 (e.g., using the TCP socket protocol) to the remote replication unit, without data loss? (Yes) Note that this should be confirmed by mounting the remote replication unit's disk drive on the host system 200 before and after the local replication unit 204 is rebooted while attached to the host system 200. After such an event, the remote replica should still be mountable without significant file system repair. No data should be lost, and the data should be acceptable to the application that created it. After the replication unit is physically attached to the local host system 200, can the host system 200 mount the replica, and can applications on the host system 200 and their clients successfully use the data in the replica? (Yes)

Does the replication system crash or hang when given incorrect input, such as a wrong remote IP address or an invalid SCSI ID (less than zero or greater than 15)? (No) Can the user correct the information and restart the software so that it runs normally, without restarting the replication unit? (Yes) Does all of the software display the correct version number and copyright notice? (Yes)
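One way to meet this test is to validate input up front and report problems instead of failing. The sketch below is an illustration under stated assumptions: it checks only that the address is syntactically valid, using Python's standard `ipaddress` module, and that the SCSI ID lies in the 0 to 15 range named in the test. The function name is hypothetical, and a well-formed but wrong address would need a separate reachability check.

```python
import ipaddress


def validate_settings(remote_ip, scsi_id):
    """Return a list of problems; an empty list means the input is acceptable."""
    problems = []
    try:
        ipaddress.ip_address(remote_ip)   # raises ValueError if malformed
    except ValueError:
        problems.append(f"invalid remote IP address: {remote_ip!r}")
    if not isinstance(scsi_id, int) or not 0 <= scsi_id <= 15:
        problems.append(f"invalid SCSI ID: {scsi_id!r} (must be 0 through 15)")
    return problems
```

Because the function returns the problems rather than raising, the caller can show them to the user, accept corrected values, and continue without restarting the unit.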

If the network cable 206 is disconnected for about 30 minutes or longer while the host system 200 is performing copy operations or other disk I/O, does the local replication unit 204 continue to operate? (Yes) Is it still recognized by the host operating system as a disk drive of the correctly configured size? (Yes) Can data be read from and written to the local replication unit 204 without data loss? (Yes)

After initial replication has been established, disconnect the network cable for about 24 hours and then rerun the tests. Is the local replication unit 204 still recognized by the host operating system as a disk drive of the correctly configured size? (Yes) Can data still be read from and written to the local replication unit 204 without data loss? (Yes)

Likewise, after forcing the buffer 210 to overflow with data from the host system 200 (e.g., by repeated copying), confirm that the local replication unit 204 still operates as normally as possible. Is the local replication unit 204 still recognized by the host operating system as a disk drive of the correctly configured size? (Yes) Can data still be read from and written to the local replication unit 204 without data loss? (Yes) Can the user stop and restart the process that places data in the queue without requiring the local replication unit 204 to restart? (Yes) Can the user stop and restart the process that removes data from the queue without requiring the local replication unit 204 to restart? (Yes) If at least some data has been copied more than once, can the user selectively flush particular portions of the buffer, that is, flush an aborted copy operation, without flushing all copy operations? (Yes)

While the host system 200 is performing copy operations or other disk-I/O-intensive work, disconnect the network cable or other journey link 206 for about 30 minutes. After the physical network link is re-established, does the local replication unit 204 begin sending data from its queue to the remote replication unit? (Yes) Are valid buffer status statistics still available from the local replication unit 204 (e.g., full or not full, number of blocks in the buffer, number of blocks sent from the buffer and received by the remote unit)? (Yes)

Disconnect the UPS from the local replication unit 204, shut down the local replication unit 204, and wait for the unit to lose power. Restore power first to the local replication unit 204 and then to the host system 200. Does the host system operate normally? (Yes) Does the local replication unit 204 restart fully without preventing the attached host system 200 from operating normally? (Yes) When the local replication unit 204 comes back online, does it automatically begin transmitting the data remaining in the local replication unit buffer 210 over the network or other journey link 206, without data loss? (Yes) Note that the last two remote replica mount tests should be performed both before and after this simulated power failure. Do they pass? (Yes)

In addition, do all of the foregoing tests pass when the host volume size is 200 gigabytes? (Yes)

Can the remote replication unit be shut down and the remote replica mounted by another server that runs the same operating system and acts as a standby for the first host system 200? (Yes)

Does that remote host then operate normally, with no effect on its performance? (Yes) Note that the two preceding tests are supported by a remote backup host attached to the same SCSI chain as the remote replication unit and its remote replication disk subsystem 312 or 614.

Conclusion

The present invention provides tools and techniques for local and/or remote data replication. In particular, a remote data replication computer system according to the invention includes one or more flexible replication features. Local replication systems (i.e., those in which the source and destination are less than ten miles apart) can also have these flexible replication features.

For example, the system may have a serverless destination: in one embodiment the system replicates data from the local server 200 as source, through the local replication unit 204, to a remote replication unit 208, 408, 508, 608, or 708 as destination, without requiring a remote server attached to the remote replication unit.

For example, the system may also be non-invasive, so that no software designed specifically for remote data replication need be installed on the local server 200. Likewise, in systems that include a second server 300, no such software need be installed on the second server 300. Instead, each replication unit runs its own operating system and one or more remote data replication applications (including threads, processes, tasks, and so on). For example, the replication unit rather than the server buffers the data to be replicated, creates and monitors connections over the journey link 206, and sends or receives replicated data over the journey link 206, thereby relieving the server of those tasks. The system may also have a disk emulation feature, whereby data is replicated from the local server 200 to the local replication unit 204 over a standard storage subsystem bus. Suitable standard storage subsystem buses include SCSI, Fibre Channel, USB, and other nonproprietary buses. Such buses are also referred to herein as "connections" to the local replication unit 204.

The system may also feature a TCP journey link 206 and/or an Ethernet journey link. For example, the system replicates data from the local server 200 through the local replication unit 204, which acts as a TCP client on the journey link 206; the remote replication unit 208, 308, 408, 508, 608, or 708 acts as the TCP server. More generally, the journey link feature means that the high-bandwidth, low-latency requirements of SCSI, of the original Off-SiteServer serial connection, of SAN connections, and the like are not imposed on the connection 206 between the local replication unit 204 and the remote replication unit.
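The client and server roles on a TCP journey link can be sketched with Python's standard `socket` module. This is a loopback illustration only: the length-prefixed framing, the zero-length end marker, and all function names are assumptions made for the example, not a protocol described in the patent.

```python
import socket
import struct
import threading


def recv_exact(conn, n):
    """Read exactly n bytes from the connection."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            raise ConnectionError("link closed mid-block")
        data += chunk
    return data


def remote_unit(server_sock, received):
    """TCP server role: accept one connection and collect blocks."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            (length,) = struct.unpack("!I", recv_exact(conn, 4))
            if length == 0:          # assumed end-of-stream marker
                break
            received.append(recv_exact(conn, length))


def local_unit(address, blocks):
    """TCP client role: connect and send each block with a length prefix."""
    with socket.create_connection(address) as sock:
        for block in blocks:
            sock.sendall(struct.pack("!I", len(block)) + block)
        sock.sendall(struct.pack("!I", 0))


def demo():
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))   # any free loopback port
    server_sock.listen(1)
    received = []
    worker = threading.Thread(target=remote_unit, args=(server_sock, received))
    worker.start()
    local_unit(server_sock.getsockname(), [b"block-1", b"block-2"])
    worker.join()
    server_sock.close()
    return received
```

Calling `demo()` runs both roles on one machine and returns the blocks in the order the remote side received them. Because TCP tolerates latency and limited bandwidth, the same roles could sit at opposite ends of a wide-area link.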

The system may also be characterized by multiplicity features. That is, the system can provide many-to-one replication from two or more local (primary) servers 200 to a single remote replication unit 208, 308, 408, 508, 608, or 708. The nonvolatile data store of the remote replication unit may then include one disk partition per primary network server 200, each partition holding the replicated data of its corresponding server 200; one external hard disk 614 per server 200; one RAID unit 312 per server 200; or a combination of these. The various primary (local) servers 200 may all use the same operating system or a mix of different operating systems. In some cases the destination nonvolatile storage is large enough to hold the combined nonvolatile data of all of the primary servers 200. As another multiplicity feature, the system can provide one-to-many replication from a given local (primary) server 200 to two or more remote replication units 208, 308, 408, 508, 608, or 708.
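The many-to-one arrangement amounts to keeping each primary server's replicated blocks in its own partition on the remote side. A minimal sketch, assuming an in-memory dictionary per server in place of a real disk partition, external disk, or RAID unit, and using hypothetical names throughout:

```python
class RemoteStore:
    """Remote-side store with one partition per primary server (illustrative)."""

    def __init__(self):
        self.partitions = {}   # server id -> {block number: data}

    def write(self, server_id, block_number, data):
        """Store a replicated block in the partition for its source server."""
        self.partitions.setdefault(server_id, {})[block_number] = data

    def read(self, server_id, block_number):
        """Return a block from the given server's partition."""
        return self.partitions[server_id][block_number]


store = RemoteStore()
store.write("server-A", 0, b"payroll")
store.write("server-B", 0, b"inventory")
```

Both servers write block 0, but the two blocks never collide because each lands in its own partition.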

The system can also provide methods that include installing flexible replication units, using them, or both. For example, one method for facilitating flexible data replication includes at least two of the installing steps in group 1300. Another method of flexible data replication includes one or more of the transmitting steps 1302.

One of the installing steps involves connecting the local server 200 to the local replication unit 204 with a standard disk subsystem bus 202, thereby allowing the local replication unit 204 to emulate a disk subsystem in communications over the link 202. Step 1306 involves connecting the local replication unit 204 to the journey link 206 for transmission of data over at least one of an Ethernet connection and a TCP connection. Step 1308 involves connecting the remote replication unit 208, 308, 408, 508, 608 or 708 to the journey link 206 for reception of data over at least one of an Ethernet connection and a TCP connection. Once at least part of at least one of these connecting steps has been completed, testing step 1310 tests at least one of the remote replication units 208, 308, 408, 508, 608 or 708.

One of the transmitting steps 1302 is step 1312, which transmits data from the local server 200 to the local replication unit 204 over the standard disk subsystem bus 202 while the local replication unit 204 emulates a disk subsystem. Step 1314 transmits the data from the local replication unit 204 over the journey link 206 to the remote replication unit 208, 308, 408, 508, 608 or 708. Step 1316, which may be performed as the data transmission of step 1314, transmits the data from the local replication unit 204 over the journey link 206 to the remote replication unit 208, 308, 408, 508, 608 or 708 when the remote replication unit is serverless, that is, when it is not attached to a secondary server 300.

In these and other embodiments, the invention may have additional features, such as features directed to role swapping 1506; a hot standby server implementation 1508; various buffering and other storage features 1510, 1518, 1528; command capture 1512 and playback 1514 on a SCSI or other bus; transactions 1516; embodiments 1520 that run multiple instances of remote replication unit software on a single hardware platform; identification 1522 of frequently accessed data, based on observation over time rather than on detailed prior knowledge of a given application's storage operations, to support application state recovery 1524; and use 1526 of a non-authoritative secondary server.
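Among the buffering features listed above, the claims describe storing changed logical block numbers in a ring buffer rather than the changed data, and changing an entry in place when its block is overwritten. The sketch below illustrates that idea only; the class name `LbnRingBuffer`, the `location` values, and the lookup index are assumptions made for the example, not details taken from the patent.

```python
class LbnRingBuffer:
    """Fixed-size ring of [lbn, location] entries; block data lives elsewhere."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0        # position of the oldest pending entry
        self.count = 0       # number of pending entries
        self.index = {}      # lbn -> slot position of its pending entry

    def record_write(self, lbn, location):
        """Note that block `lbn` changed and its new data is at `location`."""
        if lbn in self.index:
            # Block overwritten while still pending: repoint the existing
            # entry in place instead of queuing a second entry.
            self.slots[self.index[lbn]][1] = location
            return
        if self.count == self.capacity:
            raise BufferError("ring buffer full")
        pos = (self.head + self.count) % self.capacity
        self.slots[pos] = [lbn, location]
        self.index[lbn] = pos
        self.count += 1

    def drain_one(self):
        """Remove and return the oldest (lbn, location), or None if empty."""
        if self.count == 0:
            return None
        lbn, location = self.slots[self.head]
        self.slots[self.head] = None
        del self.index[lbn]
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return lbn, location
```

A design consequence worth noting: because only block numbers and locations are queued, a block that is rewritten many times before the link drains occupies a single entry, and only its latest contents need to cross the journey link.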

Embodiments of the invention can mask the latency of the journey link 206 even over a relatively low-bandwidth connection to the remote replication unit. This helps provide long-distance off-site replication in situations where replication was previously unavailable even over dedicated fiber, and it helps replication run over low-cost network connections. Such a low-cost connection can be used without error even if it has only enough bandwidth to support the average disk data exchange rate rather than the peak rate. Embodiments of the invention are suitable not only for backup and recovery but also as a high-availability primary storage system. In remote many-to-one embodiments, the kernel module, or a software interface to the buffer and to SCSI or another transport protocol, can be replaced with a more general user-space control module that emulates the system's interfaces without a real SCSI or other transport protocol processing layer. The emulated components may include, for example, local buffers, remote buffers, local replicas, remote replicas, and SCSI or other transport protocol layers. The hardware platform that runs the SAN management software may be centralized.
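The claim that a link sized for the average write rate, not the peak rate, can suffice follows from buffering: bursts accumulate in the local buffer and drain during quiet periods. The following sketch makes that arithmetic concrete. The function name `peak_backlog`, the tick-based model, and the sample numbers are assumptions chosen for illustration.

```python
def peak_backlog(writes_per_tick, link_rate):
    """Simulate a local buffer feeding a fixed-rate link.

    writes_per_tick: blocks written by the server in each time tick.
    link_rate: blocks the link can carry per tick.
    Returns (largest backlog seen, backlog left at the end).
    """
    backlog = 0
    peak = 0
    for written in writes_per_tick:
        backlog += written                      # burst lands in the buffer
        peak = max(peak, backlog)
        backlog = max(0, backlog - link_rate)   # link drains what it can
    return peak, backlog
```

For the bursty pattern `[10, 0, 0, 10, 0, 0]` the average rate is about 3.3 blocks per tick. With `link_rate=4` the function returns `(10, 0)`: the buffer must hold a full burst, but it empties between bursts. With `link_rate=3`, below the average, it returns `(11, 2)`: the backlog carries over and would keep growing, which is why the average rate is the practical floor.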

Particular embodiments of the invention (methods, configured storage media, and systems) are illustrated and described herein. To avoid unnecessary repetition, concepts and details that apply to one embodiment are not restated for the other embodiments. Unless otherwise indicated, however, the descriptions of particular embodiments herein extend to the other embodiments. For example, the discussion of the invention's systems also applies to its methods, and vice versa; and the descriptions of the inventive methods also apply to the corresponding configured storage media, and vice versa.

As used herein, terms such as "a" and "the", and item designations such as "replication unit", generally include one or more of the indicated item. The invention may be embodied in other specific forms without departing from its essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. Headings are for convenience only. The scope of the invention is therefore indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of the claim language are embraced within their scope.

Claims (13)

1. A method for flexible replication of remote data, comprising the steps of:
replicating data between local and remote replication units;
performing a local-remote role swap of the replication units;
placing a secondary server in hot standby mode;
storing changed logical block numbers in a ring buffer, rather than storing the changed data in the ring buffer;
snooping a bus and buffering at least one command obtained by the snooping step;
using a kernel insert to provide transactional file system functionality;
performing a read/write/write copy operation that reads a data block from a local replication unit, writes the data block as a new block to temporary storage, updates a logical block number entry in a queue, writes the new block to a set of replicated data, and appends a logical block number entry for the new block to the queue;
reading data and providing a virtual remote replication unit;
reading data and identifying frequently accessed data units without prior application-specific knowledge of the order and frequency of an application's storage operations;
reading data and resynchronizing a non-authoritative second data volume from a first data volume after the second data volume has been used in place of the first data volume; and
reading data and maintaining, on the same physical storage system, an ordered queue of replicated data units and a current replica of the replicated data units, thereby making it unnecessary to write the same data unit to the storage system twice in order to implement a physically partitioned system.

2. The replication method of claim 1, wherein the step of storing changed logical block numbers further comprises, when a block corresponding to a logical block number is overwritten, changing that logical block number in place in the buffer so that it refers to data at a second location rather than to data at a first location, the first location holding the data for the block from before the block was overwritten and the second location holding the data for the block from after the block was overwritten.

3. The replication method of claim 1, wherein the step of placing a secondary server in hot standby mode comprises booting the secondary server and then providing a "media not ready" signal to the secondary server from a unit that is currently the remote replication unit, whereby the emulation layer of that replication unit responds to requests from the secondary server about size and availability but rejects access by the secondary server to the data contents until the role of the replication unit has changed.

4. The replication method of claim 1, wherein the step of snooping a bus and buffering at least one command obtained by the snooping step further comprises separating read-type commands from write-type commands, the read-type commands arising from read-type requests of the host controller of the snooped bus and the write-type commands arising from write-type requests of the host controller of the snooped bus, and wherein the buffering step buffers the write-type commands.

5. The replication method of claim 1, wherein the step of snooping a bus and buffering at least one command obtained by the snooping step further comprises transmitting the buffered commands from a local replication unit across a communication link to a remote replication unit.

6. The replication method of claim 1, wherein the step of snooping a bus and buffering at least one command obtained by the snooping step further comprises replaying commands from a remote replication unit, the commands having been buffered by a local replication unit.

7. A system for flexible replication of remote data, characterized in that the system comprises:
at least two replication units, each replication unit comprising a buffer and storage means for storing changed logical block numbers in the buffer rather than storing the changed data in the buffer, the replication unit further comprising a data storage medium and an emulation layer, the emulation layer having means for responding to requests from the secondary server by providing characteristics of the data stored in the storage medium and by rejecting requests from the secondary server to access the contents of that data;
means configured within the system for performing a local-remote role swap of a replication unit;
a SCSI bus, and at least one means for replicating data, snooping the SCSI bus, and buffering at least one SCSI command obtained by the snooping;
a kernel insert for providing transactional file system functionality during data replication;
read/write/write copy execution means for reading a data block from a local replica, writing the data block as a new block to temporary storage, updating a logical block number entry in a data structure, writing the new block to a set of replicated data, and appending a logical block number entry for the new block to the data structure;
means for identifying frequently accessed data units without application-specific knowledge;
means for resynchronizing a non-authoritative second data volume from a first data volume after the second data volume has been used in place of the first data volume; and
means for maintaining, on the same physical storage system, an ordered queue of replicated data units and a current replica of the replicated data units.

8. The replication system of claim 7, wherein the storage means comprises a virtual block allocation structure.

9. The replication system of claim 8, wherein the virtual block allocation structure includes block checksums rather than block data.

10. The replication system of claim 9, wherein, during resynchronization of two replication units, the replication system transmits block checksums across a journey link to the other replication unit rather than transmitting block data across the journey link.

11. The replication system of claim 7, comprising a local system and a remote system, wherein the remote system comprises means for receiving data change information from the local system, for keeping a log of the data changes buffered on the remote system, and for supporting a role swap at an administrator's request.

12. The replication system of claim 7, comprising at least two virtual remote replication units on a single hardware platform.

13. The replication system of claim 7, wherein the means for identifying frequently accessed data units without application-specific knowledge cooperates with means for assisting in restoring the application state by not delivering subsequent data blocks and not delivering state block updates, between state block update operations, until the application state has actually been restored.
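Claims 9 and 10 describe exchanging block checksums, rather than block data, when two replication units resynchronize. A minimal sketch of that comparison follows. It assumes CRC-32 from Python's standard `zlib` module as the checksum, which the patent does not specify, and the function names are hypothetical. CRC-32 can collide, so a real system might choose a stronger checksum.

```python
import zlib

def block_checksums(blocks):
    """Compute one checksum per block; only these cross the journey link."""
    return [zlib.crc32(block) for block in blocks]

def blocks_needing_resync(local_blocks, remote_checksums):
    """Return logical block numbers whose local data differs from the remote copy.

    A block is also reported when the remote side has no checksum for it.
    """
    stale = []
    for lbn, block in enumerate(local_blocks):
        missing = lbn >= len(remote_checksums)
        if missing or zlib.crc32(block) != remote_checksums[lbn]:
            stale.append(lbn)
    return stale
```

For example, if the remote unit reports `block_checksums([b"a", b"x", b"c"])` and the local unit holds `[b"a", b"b", b"c"]`, only block 1 is flagged, so only that block's data has to be sent after the checksum exchange.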
CNB018131956A 2000-06-05 2001-06-02 Remote data flexible replication system and method Expired - Fee Related CN1256672C (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US20946900P 2000-06-05 2000-06-05
US60/209,469 2000-06-05
US22393400P 2000-08-09 2000-08-09
US60/223,934 2000-08-09
US26214301P 2001-01-16 2001-01-16
US60/262,143 2001-01-16

Publications (2)

Publication Number Publication Date
CN1457457A CN1457457A (en) 2003-11-19
CN1256672C true CN1256672C (en) 2006-05-17

Family

ID=27395378

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB018131956A Expired - Fee Related CN1256672C (en) 2000-06-05 2001-06-02 Remote data flexible replication system and method

Country Status (10)

Country Link
EP (1) EP1305711A4 (en)
JP (1) JP4945047B2 (en)
KR (1) KR20030066331A (en)
CN (1) CN1256672C (en)
AU (2) AU2001265335B2 (en)
BR (1) BR0111422A (en)
CA (1) CA2449984A1 (en)
IL (2) IL153163A0 (en)
MX (1) MXPA02012065A (en)
WO (1) WO2001097030A1 (en)

Also Published As

Publication number Publication date
BR0111422A (en) 2004-02-10
CA2449984A1 (en) 2001-12-20
EP1305711A4 (en) 2007-05-02
WO2001097030A1 (en) 2001-12-20
IL153163A (en) 2008-11-26
MXPA02012065A (en) 2003-04-25
AU2001265335B2 (en) 2007-01-25
JP2004523017A (en) 2004-07-29
IL153163A0 (en) 2003-06-24
JP4945047B2 (en) 2012-06-06
AU6533501A (en) 2001-12-24
EP1305711A1 (en) 2003-05-02
KR20030066331A (en) 2003-08-09
CN1457457A (en) 2003-11-19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee