CN102652423B - Method and system for cluster selection and cooperative replication - Google Patents

Info

Publication number
CN102652423B
Authority
CN
China
Prior art keywords
cluster
family
clusters
family member
members
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201080055666.7A
Other languages
Chinese (zh)
Other versions
CN102652423A (en)
Inventor
J·斯文格勒
T·W·毕施
R-J·Y·特维托
能田毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN102652423A
Application granted
Publication of CN102652423B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2082 Data synchronisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0617 Improving the reliability of storage systems in relation to availability
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065 Replication mechanisms
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G06F3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0686 Libraries, e.g. tape libraries, jukebox
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1016 Performance improvement
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0682 Tape device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)
  • Developing Agents For Electrophotography (AREA)

Abstract

Apparatus, systems, and methods are disclosed for creating cluster families for cluster selection and cooperative replication. Clusters are grouped into family members of a cluster family based on their relationships and roles. Members of the cluster family determine which family member is in the best position to obtain the replicated information, and the members become cumulatively consistent within their cluster family. Once the cluster family becomes cumulatively consistent, the data is shared within the cluster family so that all copies within the cluster family are consistent.

Description

Method and system for cluster selection and cooperative replication

Technical Field

The present invention relates to data storage with respect to data storage systems, and more particularly to clusters within storage systems.

Background

A storage system may include a plurality of tape drives that are used to access a plurality of magnetic tapes using a library manager. The magnetic tapes may be housed in cartridges. A controller may direct an actuator to transfer a tape cartridge from a storage area to a tape drive in order to access data written on the magnetic tape and/or to write data to the magnetic tape.

Storage systems may be located at multiple sites, including multiple geographically distinct sites. The storage systems may communicate over one or more networks. Each storage system may include a plurality of clusters. Each cluster may include a plurality of tape drives. Magnetic tapes are mounted to the tape drives in order to read data from and write data to the magnetic tapes.

Each magnetic tape may be organized as one or more logical volumes, referred to herein as volumes. A volume may appear to a host as a distinct storage device. A volume may be logically "mounted" on a virtual tape drive. As used herein, a virtual tape drive is a logical construct that appears to a host as a tape drive.

US 2009/0132657 (Sutani, M R, et al.) discloses data partitioning across a cluster in a distributed structure, in which dynamic replication of cache nodes is based on the concept of buddy replication. Buddy replication allows data to be replicated by a limited number of nodes within the cluster and reduces network replication traffic.

US 2009/0030986A1 (Bstes, J W) discloses a remote asynchronous data replication process, implemented within a replication cluster, that performs point-to-point data replication. By replicating data bidirectionally between sites on the network, the point-to-point topology allows primary storage at a local site to distribute data to remote sites.

In a multi-cluster configuration, each cluster is equally independent of all other clusters. Without some form of relationship awareness, the clusters cannot operate in the most efficient manner based on their roles and/or their distance from other clusters. In a typical grid configuration, this lack of relationship awareness greatly affects how a cluster is selected as the source of a volume during mount processing, and the ability of clusters to honor volume replication. For example, a grid may select a globally remote source cluster over a metro-remote cluster for mount and/or copy processing. The globally remote cluster is far less efficient because of the network distance between the clusters. Although real-time latency checks could be introduced to detect this distance, the irregularity and randomness of wide area networks (WANs) make it very difficult to measure relative distance reliably. Furthermore, replication across a distance to two or more clusters in a group could be much more efficient if the group worked together, cumulatively replicated the data into the group, and then replicated among its own members.

Therefore, there is a need in the art to address the aforementioned problems.

Summary of the Invention

Methods, apparatus, and systems are provided for creating cluster families, selecting a cluster family member or multiple cluster families, and performing cooperative replication among family members and among different families. For example, clusters are grouped into family members of a cluster family based on their relationships. Members of the cluster family determine which family member is in the best position to obtain external data objects, and the members become cumulatively consistent with the external data objects within their cluster family. Once the cluster family becomes cumulatively consistent, the data objects are shared within the cluster family so that all clusters within the cluster family have a known copy of each external data object.
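The flow summarized above (spread the long-distance copies across the family, reach cumulative consistency, then share locally) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Cluster` class, the `cost_to_source` field, and the least-loaded assignment rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    family: str
    # Hypothetical cost of pulling one object from the remote source.
    cost_to_source: int
    volumes: set = field(default_factory=set)


def replicate_into_family(members, external_objects):
    """Pull each external object once into the family, then share it locally."""
    # Step 1: assign each object to the member with the lowest projected
    # load (ties broken by name), so remote copies are spread out.
    load = {m.name: 0 for m in members}
    for obj in sorted(external_objects):
        best = min(members, key=lambda m: (load[m.name] + m.cost_to_source, m.name))
        best.volumes.add(obj)
        load[best.name] += best.cost_to_source

    # Step 2: the family is cumulatively consistent when the union of its
    # members' volumes covers every external object.
    family_copy = set().union(*(m.volumes for m in members))
    if family_copy != set(external_objects):
        raise RuntimeError("family is not cumulatively consistent")

    # Step 3: share within the family so every member holds every object.
    for m in members:
        m.volumes |= family_copy
    return load
```

In this sketch each external object crosses the long-distance link only once per family; the copies that bring every member up to date in step 3 stay inside the family.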

Viewed from one aspect, the invention provides a computer program product comprising a computer-usable medium that includes a computer-readable program. The computer-readable program, when executed on a computer, causes the computer to: group a plurality of clusters into family members of a cluster family; determine which family member is in the best position to obtain external data from a source; select one or more family members of the cluster family to obtain the data; replicate the data into the cluster family; achieve cumulative consistency within the cluster family for at least two external data objects; and share the data within the cluster family so that all clusters within the cluster family have a consistent copy of each external data object.

Viewed from another aspect, the invention provides a method for cooperative replication of multiple clusters. The method includes arranging a plurality of clusters into family members of a cluster family; negotiating among the family members to determine which family member is in the best position to obtain data from a source; selecting one or more family members of the cluster family to obtain the data; cooperatively replicating the data into the cluster family; achieving cumulative consistency within the cluster family; and sharing the data within the cluster family so that all copies of the data within the cluster family are consistent.

Viewed from another aspect, the invention provides an apparatus that creates cluster families and family members to perform cooperative replication. The apparatus includes a plurality of modules configured to functionally perform the steps of creating cluster families and family members, applying cooperative replication, and selecting cluster families and family members based on cluster relationships. In the described embodiments, these modules may include a relationship module, a creation module, a cooperative replication module, a mount processing module, a communication module, and a policy module, or any combination thereof.

The relationship module comprises a computer-readable program executing on a processor and maintains factors that define the roles, rules, and relationships between cluster families and family members. The clusters communicate over a network. Each cluster includes at least one cache, for example a virtual volume cache.

The creation module comprises a computer-readable program executing on the processor and creates cluster families for cluster selection and cooperative replication. In a preferred embodiment of the invention, the creation module creates a cluster family by grouping clusters into family members based on their relationships and roles. In an alternative embodiment, the creation module assigns clusters to families at configuration. In another alternative embodiment, the creation module creates relationships among different clusters and groups the clusters into families.

The cooperative replication module comprises a computer-readable program executing on the processor and cooperatively replicates data across the family members of a cluster family and across different cluster families.

The mount processing module comprises a computer-readable program executing on the processor and favors family members within a cluster family over other cluster families for production purposes.
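One way to picture family-aware source selection during mount processing is the short Python sketch below. It only illustrates the idea of preferring a holder in the requester's own family; the function name, the `holders` mapping, and the alphabetical tie-break are hypothetical and do not come from the patent.

```python
def choose_mount_source(requester_family, holders):
    """Pick a source cluster for a volume, preferring the requester's family.

    holders maps cluster name -> family name for every cluster that holds
    a consistent copy of the volume (hypothetical structure).
    """
    if not holders:
        return None
    # False sorts before True, so same-family holders win; the cluster
    # name is used only as a deterministic tie-break.
    return min(holders, key=lambda c: (holders[c] != requester_family, c))
```

For example, with holders `{"tokyo1": "asia", "ny1": "us-east", "ny2": "us-east"}` and a requester in family `"us-east"`, the sketch picks `ny1` rather than the distant `tokyo1`, without relying on a live latency measurement.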

Viewed from another aspect, the invention provides a system for cooperative replication of multiple clusters. The system includes a network and a plurality of sites in communication over the network. Each site comprises at least one host and a storage system comprising a plurality of clusters. Each cluster comprises at least one tape drive configured to access volumes stored on magnetic tape, at least one tape volume cache, and a cluster manager configured to execute computer-readable programs using a processor and a memory. The computer-readable programs include a creation module, configured to set up cluster groups and arrange the cluster groups into family members of a cluster family, and a cooperative replication module, configured to select family members to cooperatively replicate external data objects into the cluster family and achieve cumulative family consistency.

Brief Description of the Drawings

The invention will now be described, by way of example only, with reference to preferred embodiments as illustrated in the accompanying drawings, in which:

Figure 1 is a schematic block diagram illustrating a preferred embodiment of distributed sites in accordance with the present invention;

Figures 2A and 2B are schematic block diagrams illustrating a preferred embodiment of a storage system in accordance with the present invention;

Figure 3 is a schematic block diagram illustrating a preferred embodiment of a cluster of the present invention;

Figure 4 is a schematic block diagram illustrating a preferred embodiment of a cluster family apparatus of the present invention;

Figure 5 is a schematic flow chart illustrating a preferred embodiment of a cooperative replication method and cluster family selection of the present invention; and

Figures 6A and 6B are schematic flow charts illustrating a preferred embodiment of a cooperative replication method and cluster family selection of the present invention.

具体实施方式Detailed ways

本说明书中对特征、优点或相似语言的参考并不意味着,可以利用本发明来实现的所有特征和优点应当在或全部在本发明的任意单个实施方式中。而是,将涉及特征和优点的语言理解为表示:结合优选实施方式介绍的特定特征、优点或特性包括在本发明的至少一个实施方式中。因此,整个所述说明书中的特征、优点和相似语言的讨论可以,但不是必须地,指代相同的实施方式。Reference in this specification to features, advantages or similar language does not imply that all or all of the features and advantages that can be achieved with the invention should be in any single embodiment of the invention. Rather, language referring to the features and advantages is understood to mean that a particular feature, advantage, or characteristic described in connection with a preferred embodiment is included in at least one embodiment of the present invention. Thus, discussions of the features, advantages, and similar language throughout this specification may, but do not necessarily, refer to the same embodiment.

此外,本发明的上述特征、优点和特性可以在一个或多个实施方式中按任意合适方式来结合。所属领域的技术人员将认识到,本发明可以在没有特殊实施方式的一个或多个特定特征或优点的情况下来实现。Furthermore, the above-described features, advantages, and characteristics of the invention may be combined in any suitable manner in one or more embodiments. Those skilled in the art will recognize that the invention can be practiced without one or more of the specific features or advantages of a particular embodiment.

通过参考附图在下面的说明的实施方式中介绍本发明,其中相同的附图标记表示相同或相似的元件。虽然根据用于实现本发明的目标的最佳模式来介绍本发明,所属领域的技术人员应了解的是,在不脱离本发明的范围的情况下根据这些教导可以完成改变。The present invention is described in the following illustrative embodiments with reference to the accompanying drawings, in which like reference numerals designate like or similar elements. While the invention has been described in terms of the best mode for carrying out the objects of the invention, those skilled in the art will appreciate that changes may be made in light of these teachings without departing from the scope of the invention.

将本说明书中介绍的许多功能单元标记为模块,以更特殊地强调他们的实现方式的独立性。例如,可以将模块实现为硬件电路,包括惯用的超大规模集成电路(VLSI)或门阵列,现成的半导体,例如逻辑芯片、晶体管或其它不相关联组件。还可以在可编程硬件设备中实现模块,可编程硬件设备例如是现场可编程门阵列(FPGA)、可编程阵列逻辑、可编程逻辑设备等。还可以由各种类型的处理器执行的软件中实现模块。可执行模块的识别模块例如包括一个或多个物理或逻辑计算机指令块,所述指令块可以例如被组织成对象、过程或功能。然而,可执行的识别模块不需要在物理位置上处在一起,但是可以包括在不同位置处存储的完全不同的指令,当将上述指令逻辑上连接在一起时,包括模块并且实现用于模块的所表述的目的。Many of the functional units introduced in this specification are marked as modules to more specifically emphasize their implementation independence. For example, a module may be implemented as a hardware circuit comprising conventional very large scale integration (VLSI) or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. Modules may also be implemented in programmable hardware devices, such as field programmable gate arrays (FPGAs), programmable array logic, programmable logic devices, and the like. Modules may also be implemented in software that is executed by various types of processors. An identified module as an executable module, for example, comprises one or more blocks of physical or logical computer instructions which may, for example, be organized as an object, procedure or function. However, executable identified modules need not be physically located together, but may include disparate instructions stored at different locations which, when logically linked together, comprise the modules and implement the functions for the modules. stated purpose.

实际上,可执行节点的模块可以是单个指令、或多个指令,并且甚至可以在几个不同的代码段上、在不同的程序间、以及跨几个存储器设备上分布。相似地,此处在模块中可以识别和说明可操作的数据。可以将可操作的数据收集作为单个数据集合,或可以在包括不同存储设备的不同位置上分布。Indeed, a module of an executable node may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, actionable data can be identified and described in modules here. The operational data collection can be as a single collection of data, or can be distributed across different locations including different storage devices.

本说明书中队“优选实施方式”、“优选实施方式”或相似语言的参考意味着,结合本实施方式介绍的特殊特征、结构或特点包括在本发明的至少一个优选实施方式中。因此,本说明书中出现的短语“在优选实施方式中”、“在优选实施方式中”以及相似语音可以但无须全部指代相同的实施方式。Reference in this specification to "the preferred embodiment," "preferred embodiment" or similar language means that a particular feature, structure, or characteristic described in connection with this embodiment is included in at least one preferred embodiment of the invention. Thus, appearances of the phrases "in a preferred embodiment," "in a preferred embodiment" and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.

Furthermore, the described features, structures, or characteristics of the invention may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of programming, software modules, user selections, network transactions, database queries, database structures, hardware modules, hardware circuits, hardware chips, and the like, to provide a thorough understanding of embodiments of the invention. One skilled in the art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, materials, and the like. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.

FIG. 1 is a schematic block diagram illustrating a preferred embodiment of distributed sites 100 according to the present invention. The distributed sites 100 include a plurality of sites 105. Each site 105 may communicate with the other sites 105 over a network 110. The network 110 may be the Internet, a local area network (LAN), a wide area network (WAN), a dedicated network, a combination of networks, or the like.

Each site 105 may include one or more storage systems, as will be described hereafter. In addition, each site 105 may include bridges, routers, and the like that connect the storage systems to the network 110.

FIGS. 2A and 2B are schematic block diagrams illustrating a preferred embodiment of a storage system 200 according to the present invention. One or more storage systems 200 may be embodied in each site 105 of FIG. 1.

The storage system 200 may store data in different physical media, including, but not limited to, storage cartridges, disk drives, solid state disks (SSDs), magnetic disk direct access storage devices (DASDs), tape drives, libraries, and disk drive arrays, such as a redundant array of independent disks (RAID) or just a bunch of disks (JBOD). An example of a storage cartridge is a magnetic tape cartridge, which includes a rewritable magnetic tape wound on a hub of a reel, and a cartridge memory. One example of a magnetic tape cartridge is a cartridge based on Linear Tape Open (LTO) technology. Linear Tape Open, LTO, and the LTO logo are trademarks of HP, IBM Corp., and Quantum in the U.S. and other countries.

The storage system 200 may store data in different forms, such as logical or virtual data. Herein, data may be organized in any of various forms, called "volumes" or "objects," the terms being chosen without reference to any particular size or arrangement of the data.

As illustrated in FIGS. 2A and 2B, the storage system 200 provides storage for a plurality of host systems 210. For example, the storage system 200 includes a plurality of hosts 210, a plurality of clusters 220, and a network 215. Although, for simplicity, FIG. 2A shows two (2) hosts 210a, 210b, four (4) clusters 220a, 220b, 220c, 220d, and one (1) network 215, any number of hosts 210, clusters 220, and networks 215 may be employed. Accordingly, any number of clusters 220 may be included in the storage system 200.

As illustrated in FIG. 2A, the storage system 200 may employ four (4) clusters 220a, 220b, 220c, 220d connected by the network 215, with each cluster 220 including a virtualization node ("VN") 260 and a storage device 230 for emulating a tape drive or tape library to the hosts 210a, 210b. In a preferred embodiment, the clusters 220a, 220b, 220c, 220d are virtual tape server clusters.

Each cluster 220 includes a hierarchical storage node ("HSN") 250 for locally moving and/or transferring data between the storage device 230 and a library 240. In a preferred embodiment, the storage system 200 includes disk storage 230 and a tape library 240. In a preferred embodiment, the library 240 is an automated tape library ("ATL"). The HSN 250 may also be used to remotely transfer data between the local disk storage 230 and remote disk storage 230. For example, the disk storage 230 may include one or more disk drives arranged as RAID, JBOD, SSD, or any combination thereof.

Each cluster 220 includes a library manager 370 with magnetic tapes, as illustrated in FIG. 3 and described hereafter. The hosts 210 may initiate and run tasks or jobs, such as tape jobs, in which data is read from and written to the magnetic tapes in the cluster families 280 and/or family members 220. The hosts 210 may be mainframe computers, servers, or the like. The hosts 210 may have the ability to run or host multiple operating systems. For example, the hosts 210 may run or may host multiple operating systems such as Linux, Java, Windows, or the like. Linux is a registered trademark of Linus Torvalds in the United States and/or other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States and/or other countries. Each of the hosts 210 of the storage system 200 may operate as a single mainframe computer, as one or more servers, or as a number of virtual machines. The hosts 210 may provide three levels of virtualization: logical partitions (LPARs) via the Processor Resource/System Manager (PR/SM) facility; virtual machines via the z/VM operating system; and operating systems, notably z/OS with key-protected address spaces and goal-oriented workload scheduling. IBM, z/VM, and z/OS are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide.

The hosts 210 may communicate with the clusters 220 over the network 215 to access a plurality of tape drives, disk drives, or other storage devices through the cluster family members 220, as will be described hereafter. For example, a first host 210a may communicate over the network 215 to access a storage device and a magnetic tape through a first cluster 220a.

Each cluster 220 may include a hierarchical storage controller, such as the hierarchical storage node 315 illustrated in FIG. 3. The cluster 220 may provide a single point of management for data to be read and stored, aggregated storage pools from which storage can easily be allocated to different hosts 210, scaling of the storage system 200 by adding storage or storage control nodes, and a platform for implementing advanced functions such as fast-write cache, point-in-time copy, transparent data migration, and remote copy.

The clusters 220 may follow an "in-band" approach. The in-band approach may cause all input/output (I/O) requests and all management and configuration requests to be processed through a cluster family member 220.

Each of the clusters 220 may be connected to the others and may be connected to the hosts 210 through the network 215 in order to access data written on the magnetic tape and/or to write data to the magnetic tape. The plurality of clusters 220 may form a domain 205 of the storage system 200. The domain 205 may represent a multi-cluster or grid configuration. The domain 205 may include two or more clusters 220.

The network 215 of the storage system 200 may be a storage area network (SAN), a token ring network, a local area network (LAN), a wide area network (WAN), the Internet, a dedicated network, a combination of networks, or the like. The SAN may comprise a "fabric" through which the hosts 210 may communicate with the clusters 220 over the network 215. The fabric may include a Fibre Channel network, an Ethernet network, or the like. Not all elements need to share the same fabric for communication. The first host 210a may communicate with the first cluster 220a over one fabric. In addition, the first host 210a may communicate with a third cluster 220c over another fabric.

Each storage system 200 may include a cluster family 280. The cluster family 280 may include a plurality of cluster family members 220 that are arranged, configured, organized, and/or grouped into the cluster family 280. For example, as illustrated in FIG. 2B, the storage system 200 includes cluster family 280(1) and cluster family 280(2). Cluster family 280(1) includes a plurality of clusters 220(a), 220(b) grouped into family members of cluster family 280(1). Cluster family 280(2) includes a plurality of clusters 220(c), 220(d) grouped into family members of cluster family 280(2). Cluster family 280(1) and cluster family 280(2) communicate with each other via a network, such as the network 110, 215. Each cluster family 280 may be given or assigned a name. For example, cluster family 280(1) may be named City A, and cluster family 280(2) may be named City B.

Although, for simplicity, FIG. 2B shows a storage system 200 having two cluster families 280, any number of storage systems 200, cluster families 280, and cluster family members 220 may be employed.

An example of the storage system 200 is the IBM TS7700 Virtual Tape Server.

FIG. 3 is a schematic block diagram illustrating a preferred embodiment of a cluster 220 of the present invention. The cluster 220 may, for example, represent a cluster family member 220 of a cluster family 280 of FIGS. 2A and 2B. The description of the cluster 220 refers to elements of FIGS. 1-2, like numbers referring to like elements. The cluster 220 may include a virtualization node 310, a hierarchical storage node 315, a volume cache 365, and a library manager 370.

For example, the storage device 230 may include one or more disk drives arranged as a redundant array of independent disks (RAID), just a bunch of disks (JBOD), solid state disks (SSDs), or the like. The storage device 230 may include the volume cache 365. The volume cache 365 may serve as a virtual volume cache and/or a tape volume cache (TVC).

For example, the storage device 230 includes a virtual volume cache 365. The virtual volume cache 365 may serve as a TVC, wherein the TVC includes a rapidly accessible storage device such as a hard disk drive. In a preferred embodiment, the cluster 220 operates to cache data to the TVC 365.

The TVC 365 may cache data that is read from a logical volume and/or cache data that is to be written to the logical volume. A host 210 may make repeated writes to a logical volume. The TVC 365 may store the written data on a hard disk drive 230 without writing the data to the magnetic tape of the logical volume. At a later time, the TVC 365 may write the cached data to the magnetic tape within the tape library 240. Accordingly, operations such as read operations and write operations for a virtual tape drive mounting a logical volume may be routed through the TVC 365.

The hosts 210 may initiate and run tasks and/or jobs on the cluster 220. For example, an access by the first host 210a may result in an actuator of the library manager 370 being controlled by the physical tape manager 335 to transfer a tape cartridge from a storage area to a tape drive in order to access data written on the magnetic tape and/or to write data to the magnetic tape and/or the TVC 365.

The virtualization node 310 may be an independent processor-based server with multiple connections to the network 215. The virtualization node 310 may include a battery backup unit (BBU) and/or may have access to an uninterruptible power supply (UPS). The virtualization node 310 may contain a watchdog timer. The watchdog timer may ensure that a failing virtualization node 310 that is unable to recover, or that takes a long time to recover, can be restarted.
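The watchdog behavior described above can be illustrated with a minimal sketch. The specification does not say how the watchdog timer of the virtualization node 310 is implemented; the class name, the `kick`/`poll` methods, the timeout value, and the restart callback below are all hypothetical and serve only to show the general technique.

```python
import time


class WatchdogTimer:
    """Triggers a restart if a node has not reported progress within timeout_s seconds."""

    def __init__(self, timeout_s, restart, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._restart = restart    # hypothetical callback that restarts the node
        self._clock = clock        # injectable clock so the logic can be tested
        self._last_kick = clock()

    def kick(self):
        """Called by a healthy node to signal that it is still making progress."""
        self._last_kick = self._clock()

    def poll(self):
        """Restart the node if the timeout has elapsed; return True if a restart ran."""
        if self._clock() - self._last_kick > self._timeout_s:
            self._restart()
            self._last_kick = self._clock()
            return True
        return False
```

In this sketch a supervising loop would call `poll()` periodically, while the node itself calls `kick()` whenever it completes a unit of work.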

The virtualization node 310 may include one or more tape daemons 312. A tape daemon 312 may emulate a tape drive of the cluster 220 to the hosts 210 as a virtual tape drive. The tape daemon 312 may operate on a file on the TVC 365 and/or may operate on a file in a remote TVC 365 of another cluster 220 through remote file access.

The hierarchical storage node 315 may include a cluster manager 320, a remote file access 325, a data mover 330, a physical tape manager 335, a cache manager 340, a recall manager 345, a database 350, a management interface 355, and a media manager 360. The cluster manager 320 may coordinate operations between the plurality of clusters 220 in a multi-cluster or grid topology.

The cluster manager 320 may use tokens to determine which cluster 220 has a current copy of the data. The tokens may be stored in the database 350. The cluster manager 320 may also coordinate the copying of data between the clusters 220. The cluster manager 320 may include one or more processors configured to execute computer readable programs, as is well known to those of skill in the art.
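The token mechanism is described only at a high level, so the following is a minimal sketch under stated assumptions rather than the cluster manager's actual logic. It assumes, hypothetically, that each token records a volume, the cluster holding a copy, and a monotonically increasing data level, and that the clusters whose token carries the highest data level hold the current copy. The `Token` type and the `data_level` field are illustrative names that do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    volume: str       # logical volume identifier
    cluster: str      # cluster that holds this copy
    data_level: int   # hypothetical version counter, bumped on every update


def clusters_with_current_copy(tokens, volume):
    """Return the clusters whose token for `volume` carries the highest data level."""
    relevant = [t for t in tokens if t.volume == volume]
    if not relevant:
        return []
    newest = max(t.data_level for t in relevant)
    return sorted(t.cluster for t in relevant if t.data_level == newest)
```

Under these assumptions, a cluster whose token lags behind the highest data level would be a candidate target for a copy coordinated by the cluster manager.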

The remote file access 325 may be a server, one or more processors, or the like. The remote file access 325 may provide a link to the TVC 365 for access by any remote cluster 220. The cluster manager 320 may include a computer readable program.

The data mover 330 may control the actual data transfer operations for copies performed between clusters 220, and may also transfer data between the physical tape media and the TVC 365. The data mover 330 may include a computer readable program.

The physical tape manager 335 may control the physical tapes in the cluster 220. The physical tape manager 335 may manage the physical tapes in multiple pools, reclamation, the borrowing of volumes from and the returning of volumes to a common scratch pool, and the transfer of tapes between pools. The physical tape manager 335 may include a computer readable program.

The cache manager 340 may control the copying of data from the TVC 365 to the physical tapes and the subsequent removal of the redundant copy of the data from the TVC 365. The cache manager 340 may provide control signals to balance the data flow between the different components and the TVC 365. The cache manager 340 may include a computer readable program.

The recall manager 345 may queue and control recalls of data from the physical media into the TVC 365, either for a virtual tape drive or for copies requested by the cluster manager 320. The recall manager 345 may include a computer readable program.

The database 350 may be a structured collection of records stored on a hard disk drive. The records may include the locations of data on the magnetic tape. The hosts 210 may use the database addresses to write data to the magnetic tape of the cluster 220 and/or to access data from the magnetic tape in order to provide the data to a user.

The management interface 355 may provide information about the cluster 220 to the user. The management interface 355 may also allow the user to control and configure the cluster 220. The management interface 355 may include a computer cathode ray tube (CRT), a liquid crystal display (LCD) screen, a keyboard, or the like, or may exist as a web-based interface.

The media manager 360 may manage the physical handling of the magnetic tapes of the cluster 220. The media manager 360 may also manage error recovery for the magnetic tapes of the cluster 220. The media manager 360 may diagnose errors and may determine whether an error was caused by a physical tape drive or by the physical tape media. Further, the media manager 360 may take appropriate action for error recovery.

The library manager 370 may include a plurality of physical tape drives, a robotic accessor, and a plurality of physical tape media. The robotic accessor of the library manager 370 may transfer a magnetic tape to a tape drive assigned to the TVC 365. A virtual tape drive may be a logical construct that appears to the host 210 as a physical tape drive. As is well known to those of skill in the art, data may be read from or written to the magnetic tape of a tape drive through a read/write channel.

Each tape drive in the plurality of clusters 220 may employ one or more magnetic tapes to store data. The magnetic tape may serve as a storage medium for the data in the storage system 200. The cluster 220 may employ any number of tape drives and magnetic tapes. For example, the storage system 200 may employ two (2) tape drives and two hundred fifty-six (256) virtual drives.

The TVC 365 may contain data from the tape volumes being operated on and may store additional volume data for rapid access. Operations such as read operations and write operations for a virtual tape drive mounting a volume may be routed through the TVC 365. Thus, selecting a cluster 220 may select the TVC 365 of that cluster. All the magnetic tapes of a tape drive may be organized as one or more logical volumes. The volumes in the TVC 365 may be managed using a first in first out (FIFO) and/or a least recently used (LRU) algorithm.
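A least recently used policy of the kind mentioned above can be sketched as follows. This is an illustration only, not the TVC 365 implementation: it assumes, for simplicity, that capacity is counted in number of volumes rather than bytes, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class LruVolumeCache:
    """Keeps at most `capacity` volumes, evicting the least recently used first."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._volumes = OrderedDict()  # volume id -> data, least recently used first

    def get(self, volume_id):
        """Return cached data for a volume and mark it most recently used, or None."""
        if volume_id not in self._volumes:
            return None
        self._volumes.move_to_end(volume_id)
        return self._volumes[volume_id]

    def put(self, volume_id, data):
        """Cache a volume and return the ids of any volumes evicted to make room."""
        self._volumes[volume_id] = data
        self._volumes.move_to_end(volume_id)
        evicted = []
        while len(self._volumes) > self._capacity:
            old_id, _ = self._volumes.popitem(last=False)
            evicted.append(old_id)
        return evicted
```

A FIFO policy would differ only in that `get` would not call `move_to_end`, so eviction order would follow insertion order instead of access order.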

The TVC 365 may be a rapidly accessible storage device. For example, the TVC 365 may be a hard disk drive with a storage capacity of five thousand four hundred gigabytes (5400 GB) or the like. In the storage system 200, the tape drive may cache to the TVC 365 data that is read from the logical volume and/or may cache data that is to be written to the logical volume. For example, a host 210 may make repeated writes to a virtual tape drive. The TVC 365 may store the written data on the hard disk drive without writing the data to the virtual magnetic tape. At a later time, the cache manager 340 may write the cached data to the magnetic tape of the cluster 220.
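The deferred write described above resembles a write-back cache, which can be sketched as follows. The sketch makes several assumptions that are not in the specification: host writes are modeled as byte strings appended per volume, the tape is modeled by a caller-supplied `tape_writer` callable, and the names `WriteBackCache`, `write`, and `destage` are hypothetical.

```python
class WriteBackCache:
    """Absorbs host writes in a cache and copies them to tape only when destaged."""

    def __init__(self, tape_writer):
        self._tape_writer = tape_writer  # hypothetical callable(volume_id, data)
        self._pending = {}               # volume id -> bytes not yet on tape

    def write(self, volume_id, data):
        """Record a host write in the cache without touching the tape."""
        self._pending[volume_id] = self._pending.get(volume_id, b"") + data

    def destage(self):
        """Write all pending data to tape and return the ids of the volumes written."""
        written = []
        for volume_id in list(self._pending):
            self._tape_writer(volume_id, self._pending.pop(volume_id))
            written.append(volume_id)
        return written
```

In this sketch repeated writes to the same volume cost no tape activity until `destage` is called, which mirrors the reason given in the text for routing writes through the cache.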

A virtualization node 310 that accesses a volume may be referred to as a mount point. Choosing a remote cluster TVC 365 that was used for a recent mount point of a logical volume may improve access to the volume. The high-availability, fast-write storage of the TVC 365 allows the hosts 210 to write data to the TVC 365 without having to wait for the data to be written to a physical disk.

In a preferred embodiment, each site 105 includes a storage system 200. Each storage system 200 includes two or more cluster family members 220 that are grouped together to create a cluster family 280. For example, cluster family 280(1) includes the group of cluster family members 220(a) and 220(b), and cluster family 280(2) includes the group of cluster family members 220(c) and 220(d). Cluster family 280(1) may be used for production purposes, for example, and cluster family 280(2) may be used for disaster recovery (DR) or archival purposes. Accordingly, cluster families 280 may perform different roles with respect to other cluster families 280. In addition, the cluster family members 220 of a cluster family 280 may perform different roles with respect to each other within the cluster family 280. Likewise, the cluster family members 220 of a cluster family 280 may perform different roles with respect to non-family members.

In a preferred embodiment, cluster families 280 may be configured at global distances, metro distances, or combinations thereof. Similarly, cluster family members 220 may be configured at global distances, metro distances, or combinations thereof. In addition, the cluster family members 220 within a cluster family 280 may have different distance ratings from each other. Similarly, cluster families 280 may have different distance ratings from each other. Although distance ratings may be used as a factor in defining the roles of, and the relationships between, the cluster families 280 and the cluster family members 220, distance is only one factor in bringing relationship awareness between the cluster family members 220 and the cluster families 280. Accordingly, arranging or grouping clusters 220 into cluster family members of a cluster family 280 is not limited to distance.

In addition, because each storage system 200 includes a cluster family 280 created by grouping two or more clusters 220 into family members, each storage system 200, or a combination of storage systems 200, may represent a multi-cluster configuration or grid.

Furthermore, the clusters 220 of the storage system 200 may form a distributed store configuration. For example, a second cluster 220(b) may create a secondary instance of a volume. The secondary instance may be synchronized with the primary copy on a first cluster 220(a), wherein the secondary copy is updated any time the primary copy is updated. The secondary instance may be stored in another cluster family 280 located at a remote site 105 in order to ensure availability of the data in case the primary instance becomes unavailable. Future mount point accesses may choose the secondary copy as the primary copy. Transparent data migration may be used when adding, removing, and/or rebalancing data to magnetic tape.

Although the preferred embodiment of the invention is discussed with reference to FIGS. 1-2, this is for illustration purposes only. One skilled in the art will appreciate that the invention is not limited to any specific grid configuration and may be implemented in any multi-cluster or grid configuration. For example, one or more clusters 220 from site 105(a) may be grouped with one or more clusters 220 from a different site, such as site 105(b), to create a first cluster family 280. Likewise, one or more clusters 220 from site 105(c) and site 105(a) may be grouped into family members to create a second cluster family 280. Thus, any combination of clusters 220 may be grouped into family members to create a cluster family 280.

FIG. 4 is a schematic block diagram illustrating a preferred embodiment of a cluster family apparatus 400 of the present invention. The apparatus 400 may be embodied in a host 210 and/or a cluster 220. In a preferred embodiment, the apparatus 400 is embodied in the cluster manager 320. The description of the apparatus 400 refers to elements of FIGS. 1-3, like numbers referring to like elements. The apparatus 400 may include a relationship module 405, a creation module 410, a cooperative replication module 415, a mount processing module 420, a communication module 425, and a policy module 430, or any combination thereof.

The relationship module 405 comprises a computer readable program executing on a processor, such as a processor of the cluster manager 320. In addition, the relationship module 405 includes factors defining the roles of, and the relationships between, the cluster families 280 and the family members 220. Examples are factors relating to which family members belong to which families, the distance ratings between neighboring families and/or family members, and which family members are used for production purposes and which are used for disaster recovery (DR) and/or archiving purposes.

The cluster family members 220 communicate over a network such as the network 110 and/or the network 215. Each cluster family member 220 may include a library manager 370 having at least one tape drive configured to access volumes stored on magnetic tape, and at least one TVC 365.

The creation module 410 comprises a computer readable program executing on a processor, such as a processor of the cluster manager 320. The creation module 410 selects clusters 220 and arranges them into family members of a cluster family 280 by grouping the clusters 220 together so that they operate under a common set of guidelines, rules, and/or purposes.

The creation module 410 groups clusters 220 into a cluster family 280 so that the family members 220 follow a common set of rules or guidelines. This allows groups of clusters, such as families 280(1), 280(2), to work together to accomplish a particular task more efficiently, or allows different groups of clusters 220 and/or families 280 to have different purposes within a grid.

The creation module 410 may be used to allow customizable behavior of the family members 220 within a family 280 through configuration properties. For example, referring to FIG. 2B, the group of cluster family members 220(a), 220(b) may be allowed to act as a production family 280(1) that follows a set of rules beneficial to production workloads. Another group of cluster family members 220(c), 220(d) in the domain 205 may be allowed to act as an archival or disaster recovery family 280(2), with rules that make the family members 220(c), 220(d) operate more effectively when replicating data from the production family 280(1).

In addition, the creation module 410 manages the relationships between the family members 220 of a family 280 as well as the relationships between different cluster families 280. For example, the creation module 410 may manage the cluster family members 220 based on their relationships and roles. In a preferred embodiment, the relationship module 405 may provide this information to the creation module 410. Based on the relationships and/or roles of the family members and of neighboring families, the cluster family members 220 negotiate among themselves to determine which family member 220 is in the best position to obtain outside data from the multiple clusters outside of the family 280. The creation module 410 may also use this information to favor members 220 of the family 280 as TVC clusters, or to allow access restrictions or other special-case behavior at the level of a family 280, as opposed to only a single cluster or the whole grid.
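The negotiation described above is not specified in detail here. One minimal way to sketch it, under loud assumptions, is to give every pair of (family member, outside cluster) a numeric distance rating where a lower value means closer, and to pick the pair with the lowest rating. The function name, the numeric ratings, and the tie-breaking by identifier are all hypothetical and are not taken from the specification.

```python
def best_positioned_member(family_members, source_clusters, distance):
    """Pick the (member, source) pair with the lowest distance rating.

    `distance` maps (member, source) pairs to a hypothetical numeric rating,
    lower meaning closer. Pairs without a rating are treated as unreachable.
    Ties are broken by member id, then source id, so the result is deterministic.
    """
    candidates = [
        (distance[(member, source)], member, source)
        for member in family_members
        for source in source_clusters
        if (member, source) in distance
    ]
    if not candidates:
        return None
    _, member, source = min(candidates)
    return member, source
```

For example, if family members 220(c) and 220(d) can both reach clusters 220(a) and 220(b), the member with the lowest-rated link would fetch the outside data and could then share it with the rest of its family.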

The creation module 410 may use the management interface 355 to display a page on which a user (for example, a customer) can create a cluster family with a character name, such as an eight-character name. The user may then use the creation module 410 to add one or more clusters to the family. The creation module 410 may store this information within the clusters' persistent vital product data, so that all clusters in a multi-cluster or grid configuration are aware of their own role and of the family in which they reside. The creation module 410 may determine that a cluster being selected for a family has already been selected for another family. To avoid having any one cluster exist in two families at the same time, the creation module 410 may notify the user that the selected cluster already exists as a member of another family. In addition, the creation module 410 may apply a set of rules that prevents a cluster from being selected into two families at the same time.
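The rule that a cluster may belong to only one family at a time can be sketched as a small registry. This is an illustration under assumptions, not the creation module 410 itself: the class name `FamilyRegistry` and its methods are hypothetical, and the eight-character limit is taken from the example in the text and treated here as a hard limit purely for illustration.

```python
class FamilyRegistry:
    """Tracks cluster families and rejects a cluster that is already in another family."""

    MAX_NAME_LENGTH = 8  # assumption based on the eight-character example

    def __init__(self):
        self._members = {}    # family name -> set of cluster ids
        self._family_of = {}  # cluster id -> family name

    def create_family(self, name):
        if not name or len(name) > self.MAX_NAME_LENGTH:
            raise ValueError("family name must be 1 to 8 characters")
        self._members.setdefault(name, set())

    def add_cluster(self, family, cluster):
        if family not in self._members:
            raise KeyError(family)
        current = self._family_of.get(cluster)
        if current is not None and current != family:
            # Mirrors the notification to the user described in the text.
            raise ValueError(f"cluster {cluster!r} already belongs to family {current!r}")
        self._members[family].add(cluster)
        self._family_of[cluster] = family

    def remove_cluster(self, cluster):
        family = self._family_of.pop(cluster, None)
        if family is not None:
            self._members[family].discard(cluster)

    def family_of(self, cluster):
        return self._family_of.get(cluster)

    def members(self, family):
        return sorted(self._members[family])
```

Moving a cluster between families in this sketch takes two steps, `remove_cluster` followed by `add_cluster`, so that a cluster is never recorded in two families at once.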

The policy module 430 includes a computer-readable program executing on a processor, such as a processor of the cluster manager 320. In a preferred embodiment, the policy module 430 may include specific policies regarding which cluster family members 220 should be used for production and which should be used for DR/archive purposes. These policies may include data sets that govern data replication. A user may enter policies for managing multiple cluster families 280 and family members 220 via the management interface 355.

Referring to FIGS. 2A and 2B, the creation module 410 may be used to create a cluster family 280(1) named "City A" and a cluster family 280(2) named "City B". Cluster family 280(1) may include a group of cluster family members 220(a), 220(b), and cluster family 280(2) may include a group of cluster family members 220(c), 220(d). In addition, the creation module 410 may be used to add family members 220 to a cluster family 280 or remove them from it, and to regroup cluster family members into different cluster families 280.

Because the creation module 410 sets up clusters and arranges them into family groups based on their relationships and/or roles with respect to one another during family creation, all clusters in a grid or multi-cluster configuration are aware of each other's roles and of the families in which they reside. Accordingly, the creation module 410 may alert or notify the user via the management interface 355 that a cluster 220(d) being added to family 280(1) is currently, for example, a family member of another family 280(2). The user may then deselect 220(d) from family 280(2) and add or reselect 220(d) for family 280(1). The creation module 410 thus allows all clusters 220 in the domain 205 (e.g., a grid) to be aware of their own role, of their relationship to the family in which they reside, of their relationships to other family members, and of their relationships to non-family members residing in other families.

In a preferred embodiment, the creation module 410 may assign a name to a cluster family. For example, during configuration, a user may assign a name to a cluster family using the management interface 355.

The cooperative replication module 415 includes a computer-readable program executing on a processor, such as a processor of the cluster manager 320. In addition, the cooperative replication module 415 enhances existing copy management so that a group of clusters 220 belonging to a cluster family 280 can work together more efficiently to achieve consistency both for the family 280 as a whole and among the individual clusters 220 within the family 280 (i.e., the family members 220).

The cooperative replication module 415 allows two or more cluster family members 220 within a family 280 (e.g., a DR or archive family) to share the inbound replication workload. A family 280 of DR/archive cluster family members 220 that uses the cooperative replication module 415 can therefore benefit from improved TVC selection when a source cluster is chosen for replication.

The cooperative replication module 415 allows a cluster family member to share the copy workload with the other cluster family members belonging to the same family. For example, in a preferred embodiment, the domain 205 includes Y clusters 220, where Y is the number of clusters 220 in the domain 205. The clusters are grouped into cluster families 280, each having N (two or more) cluster family members 220. The domain 205 thus consists of Y clusters 220, some of which are grouped as the N cluster family members of a cluster family 280.

For example, referring to FIG. 2B, there are four clusters 220(a), 220(b), 220(c), and 220(d) in the domain 205, so Y = 4. Two clusters 220(a) and 220(b) are grouped as the N = 2 cluster family members of the first cluster family 280(1), and two clusters 220(c) and 220(d) are grouped as the N = 2 cluster family members of the second cluster family 280(2). In this grid configuration, the domain 205 consists of Y = 4 clusters, of which subsets of N = 2 clusters are grouped as the family members of a cluster family 280. N = 2 is therefore the number of cluster family members in a family.

The cooperative replication module 415 cooperatively replicates a family group of clusters by serializing the replication of any one volume when that volume is first brought into the family. For example, the cooperative replication module 415 directs each cluster member in family 280(2) to replicate 1/N of the outside volumes, where N is the number of clusters in the family that need a copy. Once all outside volumes have been replicated into family 280(2) and family 280(2) is cumulatively consistent, the clusters within family 280(2) that are still inconsistent share the outside data among themselves.

For example, without the present invention, each cluster 220 works independently of the others, because at the microcode level the clusters 220 are unaware of each other's relationships and roles. Suppose, for example, that cluster 220(a) holds 20 volumes that need to be replicated to clusters 220(c) and 220(d). Because clusters 220(c) and 220(d) work independently of each other, each of them may pull all 20 volumes of original data across the network 215.

Referring now to FIGS. 2A and 2B, in a preferred embodiment there are, for example, four clusters, of which the creation module 410 groups two clusters 220(a), 220(b) into family 280(1) and two clusters 220(c), 220(d) into family 280(2). All family members 220 are aware of each other, of the family 280 to which they belong, and of all volumes that need to be replicated to neighboring clusters in the family.

For example, cluster family members 220(c) and 220(d) are aware of each other and know that 20 volumes from non-family member 220(a), in a different cluster family 280(1), need to be replicated into their family 280(2). Using the cooperative replication module 415, family member 220(c) pulls 10 unique volumes and family member 220(d) pulls the other 10 unique volumes. That is, each cluster family member 220(c), 220(d) pulls 1/N of the volumes, where N is the number of cluster family members in the family. Because two cluster family members 220(c), 220(d) belong to cluster family 280(2) in this example, each pulls 1/2 of the volumes (i.e., 10 unique volumes each) for a total of 20 volumes. The cluster family members 220(c), 220(d) then share their 10 unique volumes with each other.
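The 1/N split in this example can be sketched as a simple round-robin assignment. The function name `assign_pulls` and the volume labels are hypothetical; the text does not say how the module decides which member pulls which volume, so round-robin is only one possible choice.

```python
def assign_pulls(volumes, members):
    """Give each family member roughly 1/N of the outside volumes (round-robin)."""
    n = len(members)
    plan = {m: [] for m in members}
    for i, vol in enumerate(volumes):
        plan[members[i % n]].append(vol)
    return plan


# 20 outside volumes held by 220(a), to be replicated into family 280(2).
volumes = [f"VOL{i:02d}" for i in range(20)]
plan = assign_pulls(volumes, ["220(c)", "220(d)"])
```

Each member ends up with 10 unique volumes to pull across the remote link; the other 10 are later obtained from its peer inside the family.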

By replicating cooperatively via the cooperative replication module 415, the cluster family 280(2), or DR location, can become cumulatively consistent up to N times faster, because any one volume is pulled across the remote link 110/235 only once rather than N times. The cluster family members 220 can then become consistent with each other for availability more quickly, because of the short relative distance between them. The overall time to become both DR consistent and highly available (HA) consistent can therefore be greatly improved compared with each cluster 220 independently sourcing from the same remote production cluster 220.
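The saving on the remote link can be illustrated with the numbers from the example (20 volumes, N = 2). The helper below is a hypothetical sketch that only counts volume transfers across the long-distance link; it does not model bandwidth or timing.

```python
def remote_pulls(num_volumes, family_size, cooperative):
    """Count volume transfers across the remote link for one target family."""
    if cooperative:
        # Each volume crosses the remote link once; peers copy it locally.
        return num_volumes
    # Each family member independently pulls every volume.
    return num_volumes * family_size


independent = remote_pulls(20, 2, cooperative=False)
cooperative = remote_pulls(20, 2, cooperative=True)
```

With these inputs the independent case moves 40 volumes across the remote link and the cooperative case moves 20, which is the factor of N described above.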

Copy throughput can thus be optimized and the overall time to reach volume consistency within the cluster family 280 improved. For example, in a bandwidth-limited system or a grid with multiple archive sites, the cooperative replication module 415 allows every family member 220 in a family 280 to take part in the replication of all inbound copies without duplicating any effort. Once the group of clusters 220 (the family members) within a family 280 reaches a cumulatively consistent state, the consistent copies held by the individual clusters 220 are shared among the peer clusters of that family.

In addition, the cooperative replication module 415 handles persistent replication-source awareness by deferring replication. For example, a cluster member 220 that holds a consistent source may be told to keep that source volume in cache so that it is readily available to other family members 220 for peer replication. The cluster with the consistent source inherits the role of the original mount-source cluster, that is, the cluster holding the original copy created or modified by the host. Once one cluster in a family has replicated one of its 1/N volumes, the cooperative replication module 415 first notifies the original mount-source cluster that all other clusters in its family (itself included) are accounted for, so that the production cluster can relinquish its role on behalf of the clusters in the target family. This frees the production cluster to stage the volume out to back-end tape (assuming no other family or production cluster needs a copy), which makes more cache available. Second, the DR family cluster that initiated the replication of the volume inherits the role and remembers which clusters within its family still need a copy. Through this inheritance, the volume can be favored in cache until all of its peer family clusters have completed their copies.

In a preferred embodiment, the cooperative replication module 415 may use cascading copy-required flags. For example, as cluster families become consistent, the cooperative replication module 415 moves ownership of the copy-required flags from one cluster family to another. By cascading the copy-required flags, the cooperative replication module 415 allows the benefit of the flags to move from one family to another, thereby relieving the original TVC of its involvement. By inheriting the copy-required flags from the TVC once a family member has obtained a copy, for example, the family allows the TVC cluster to migrate the volume and make room in its cache for other new workload.

One example is a domain that includes a production or default family connected to a DR/archive family. The TVC cluster may be a member of the production or default family and may initially manage the copy-required flags. Once a member of the DR/archive family has obtained a copy from the TVC cluster, the DR/archive family may notify the TVC cluster to clear all copy-required flags relating to members of the DR/archive family. In conjunction with this, the DR/archive family inherits responsibility for managing those copy-required flags for its family members.

For example, in another embodiment of the storage system 200 (not shown), the domain 205 may include a first cluster family 280(1) comprising cluster family members 220(a), 220(b), 220(c); a second cluster family 280(2) comprising cluster family members 220(d), 220(e), 220(f); and a third cluster family 280(3) comprising cluster family members 220(g), 220(h), 220(i). Each family 280 includes three cluster family members 220, and each family member is represented by one bit. Because there are three families of three members each (3 bits per family), the bit set contains 9 bits in total. For example, cluster family 280(1) holds an original data object that needs to be replicated to cluster families 280(2) and 280(3).

It is possible for cluster 220(a) to hold the volume in cache until all nine clusters 220 have pulled a copy across the network 110 or 215. For example, cluster 220(a) may keep a 9-bit set and clear the corresponding bit in its mask as each cluster 220 pulls a copy. Because cluster 220(a) holds the copy in its cache on behalf of all nine clusters 220, it cannot free that space for additional workload.

By allowing each cluster family 280 to inherit responsibility for managing the copy-required flags of its own family members, cluster 220(a) can clear the 6 bits for cluster families 280(2) and 280(3) and hold the copy in its cache only for its own two family members 220(b), 220(c) in family 280(1). Once its own family members 220(b) and 220(c) have a copy, cluster 220(a) can clear its mask and make room for more workload.

In this example, cluster 220(d) of family 280(2) pulls a copy across the network 215 and notifies cluster 220(a) of family 280(1) that it no longer needs to hold the copy in cache for family 280(2), because 220(d) will keep the copy in its own cache until its family members 220(e), 220(f) have received a copy. This relieves cluster 220(a) of holding the copy in cache for all members of cluster family 280(2). Similarly, family member 220(g) of family 280(3) tells cluster 220(a) that it will keep the copy in its cache for its family members 220(h), 220(i). Cluster 220(a) is thereby relieved of holding the copy in its cache for all family members of family 280(3).
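The cascading of copy-required flags in the nine-cluster example can be sketched with a 9-bit mask. The names `CopyRequired`, `hand_off`, and `can_release` are hypothetical, and the bit ordering is an assumption; the sketch also assumes the source cluster's own bit starts cleared because it already holds the original.

```python
FAMILIES = {
    "280(1)": ["220(a)", "220(b)", "220(c)"],
    "280(2)": ["220(d)", "220(e)", "220(f)"],
    "280(3)": ["220(g)", "220(h)", "220(i)"],
}
CLUSTERS = [c for members in FAMILIES.values() for c in members]
BIT = {c: 1 << i for i, c in enumerate(CLUSTERS)}  # one bit per cluster


def mask_of(clusters):
    mask = 0
    for c in clusters:
        mask |= BIT[c]
    return mask


class CopyRequired:
    """Copy-required flags held by one cluster for one volume."""

    def __init__(self, owner, mask):
        self.owner = owner
        self.mask = mask

    def clear(self, cluster):
        self.mask &= ~BIT[cluster]

    def hand_off(self, family, puller):
        # 'puller' (a member of 'family') has obtained a copy. It inherits the
        # flags of its remaining family members; the owner clears them all.
        fam = mask_of(FAMILIES[family])
        inherited = self.mask & fam & ~BIT[puller]
        self.mask &= ~fam
        return CopyRequired(puller, inherited)

    def can_release(self):
        # With no flags left, the volume no longer has to stay in cache.
        return self.mask == 0


source = CopyRequired("220(a)", mask_of(c for c in CLUSTERS if c != "220(a)"))
dr2 = source.hand_off("280(2)", "220(d)")
dr3 = source.hand_off("280(3)", "220(g)")
```

After the two hand-offs, `source` tracks only 220(b) and 220(c), while `dr2` and `dr3` track the remaining members of their own families.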

In addition, the cooperative replication module 415 can increase performance in low-bandwidth environments, and improve the overall time for the clusters within a family to become consistent, by using more of the links in the domain to perform copies rather than relying primarily on copies from the TVC. For example, families 280 using the cooperative replication module 415 cooperate to achieve consistency across the family. Once family-wide consistency is reached, the family members 220 work together to share data among themselves and bring each individual member up to the family's level of consistency.

The mount processing module 420 includes a computer-readable program executing on a processor, such as a processor of the cluster manager 320. When a mount of a logical volume occurs at a cluster, the mount processing module 420 favors and selects cluster family members within that cluster's own family over clusters outside the family. For example, a mount to a production cluster may favor another production cluster in the same family 280(1) over a remote cluster used primarily for DR or electronic vaulting. Because production data typically needs to stay local and be replicated quickly for high availability, the mount processing module 420 may be used to favor availability over disaster recovery, and thus to select family members within the production family over the DR family.

The mount processing module 420 can improve control and performance by favoring cluster family members when a remote mount is required. Families and/or family members may be configured (e.g., using the creation module 410) to prefer certain clusters over others when selecting a remote TVC. This is useful for distinguishing a set of production clusters from non-production clusters. Preferring family members within the same production family keeps TVC selection within the production clusters, rather than potentially selecting a distant remote cluster intended for DR or archive purposes.
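The family preference during remote TVC selection can be sketched as a stable sort that puts same-family candidates first. The function `select_tvc` and the `family_of` mapping are hypothetical, and real selection would weigh further factors (such as consistency and link cost) that this sketch ignores.

```python
def select_tvc(mount_cluster, candidates, family_of):
    """Pick a remote TVC, preferring candidates in the mounting cluster's family."""
    home = family_of[mount_cluster]
    # sorted() is stable, so the original candidate order is kept within
    # the same-family group and within the outside group.
    ranked = sorted(candidates, key=lambda c: family_of[c] != home)
    return ranked[0] if ranked else None


family_of = {
    "220(a)": "280(1)", "220(b)": "280(1)",  # production family
    "220(c)": "280(2)", "220(d)": "280(2)",  # DR/archive family
}
choice = select_tvc("220(a)", ["220(c)", "220(b)", "220(d)"], family_of)
```

Here a mount at production cluster 220(a) selects its family peer 220(b) even though a DR cluster appears earlier in the candidate list.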

In addition, TVC selection processing can be improved because the cluster family members 220 within a cluster family 280 are the target of the mount and are favored within the same cluster family 280.

In a preferred embodiment, the storage system 200 may include multiple clusters, with one subset of two or more clusters 220 grouped into a first cluster family 280 and another subset of two or more clusters 220 grouped into a second cluster family 280. The grouping into families may be based on the family members' roles, relationships, and/or distances to one another and/or to other, non-family-member clusters. Each cluster family member of a cluster family 280 is aware of its relationships to the others. This relationship awareness among family members allows the group to work together efficiently to replicate data into the group cumulatively and then replicate it among themselves.

A site 105 may include a cluster family 280 or a combination of cluster families 280. For example, site 105(a) may include a first cluster family 280 and a second cluster family 280. The first cluster family 280 may include production clusters 220(a), 220(b), and the second cluster family 280 may include production clusters 220(c), 220(d). Clusters 220 may also be selected from a combination of sites 105 to create a cluster family 280. For example, a cluster family 280 may be created by selecting clusters 220 at different sites 105 (e.g., 105(a) and 105(b)), where clusters 220(a), 220(b) at site 105(a) are used for production purposes and clusters 220(c), 220(d) at site 105(b) are used for DR and/or archive purposes.

In one preferred embodiment, clusters 220(c), 220(d) are used for archiving data. In another preferred embodiment, clusters 220(c), 220(d) are used for DR. In yet another embodiment, one cluster, such as cluster 220(c), is used for DR, and another cluster, such as cluster 220(d), is used for archiving.

The schematic flow chart diagrams that follow are generally set forth as logical flow chart diagrams. As such, the depicted order and labeled steps are indicative of a preferred embodiment of the presented method. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated method. Additionally, the format and symbols employed are provided to explain the logical steps of the method and are understood not to limit the scope of the method. Although various arrow types and line types may be employed in the flow chart diagrams, they are understood not to limit the scope of the corresponding method. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the method. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted method. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.

FIG. 5 is a schematic flow chart diagram illustrating a preferred embodiment of the cluster family selection and cooperative replication method of the present invention. The method 500 substantially includes the steps to carry out the functions presented above with respect to the operation of the apparatus and systems described in FIGS. 1 to 4. In a preferred embodiment, the method is implemented with a computer program product comprising a computer-readable medium having a computer-readable program. The computer-readable program may be integrated into a computing system, such as the cluster manager 320 and/or host 210, wherein the program in combination with the computing system is capable of performing the method 500.

The method 500 begins and, in step 510, a group of clusters is arranged as the family members of a cluster family. For example, clusters are grouped based on their relationships to each other and to other clusters in the domain. Cluster families may be created based on various factors and/or functions, including role (e.g., production source, DR, archive), scope, and distance (e.g., distance ratios between families). In addition, a user may assign a character name to a cluster family. For example, as shown in FIG. 2B, one cluster family may be created with the character name "City A" and another with the character name "City B".

In a preferred embodiment, the creation module 410 is used to create cluster families. During configuration, a user may use the management interface 355 to create a cluster family name, add one or more clusters to a family, assign roles and/or distance ratios between neighboring families, and educate the clusters using configuration properties. The creation module 410 may use these persistent settings, for example, to bring relationship awareness to one or more clusters or family members, along with the relative properties between families, such as distance.

In a preferred embodiment, the relationship module 405 maintains these persistent settings for the cluster families and family members.

In addition, autonomic functions may be used to detect roles and the relationships between clusters. For example, the autonomic functions may be executed by the creation module 410.

In step 515, the family members negotiate among themselves to determine which family member of the family is in the best position to obtain outside data objects. For example, as shown in FIG. 2B, cluster family 280(1) includes two or more family members 220(a), 220(b), which may be configured at metro distances and used for production purposes. Cluster family 280(2) includes two or more family members 220(c), 220(d), which may be configured at global distances with respect to family 280(1) and used for DR purposes. Cluster family 280(1) may communicate to cluster family 280(2), via the network 110, 215, that data objects are ready to be copied. Because the cluster members of each family, as well as the families themselves, are aware of each other's roles and relationships, family members 220(c), 220(d) can negotiate between themselves to determine which family member of family 280(2) is in the best position to obtain a copy of the outside data objects.

In a preferred embodiment, for example, the cluster family members 220 belonging to a cluster family 280 work through a common copy work queue in FIFO order. Before working on a copy, each cluster family member 220 first makes sure that no other member of the cluster family 280 is already copying, or has already copied, the item. If none has, one or more cluster family members 220 may perform the copy. If the copy is already in progress at, or has already been completed by, another family member, the cluster family member may move the copy to a deferred queue. Some time later, after all active production content has been copied into the cluster family 280, the family members begin working on the deferred queue, whose content they can share with each other. If the peer family member that originally obtained the copy is not available, a copy can still be obtained from an outside cluster or from another family member.
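The FIFO work queue with a deferred queue can be sketched as a small simulation. All names (`next_copy`, `run_family`, `claimed`) are hypothetical, and the shared `claimed` dictionary stands in for whatever mechanism the family members use to learn that a peer is copying or has copied a volume; members take turns here only to keep the simulation deterministic.

```python
from collections import deque


def next_copy(member, queue, deferred, claimed):
    """Pop FIFO entries until one is found that no family peer has claimed."""
    while queue:
        vol = queue.popleft()
        if vol in claimed:
            deferred.append(vol)  # a peer is copying it or already has it
            continue
        claimed[vol] = member     # this member pulls it from outside the family
        return vol
    return None


def run_family(members, volumes):
    claimed = {}
    queues = {m: deque(volumes) for m in members}  # same FIFO order for all
    deferred = {m: deque() for m in members}
    pulled = {m: [] for m in members}
    progress = True
    while progress:
        progress = False
        for m in members:
            vol = next_copy(m, queues[m], deferred[m], claimed)
            if vol is not None:
                pulled[m].append(vol)
                progress = True
    return pulled, deferred


pulled, deferred = run_family(["220(c)", "220(d)"], ["V0", "V1", "V2", "V3"])
```

In this run each member pulls half of the volumes from outside the family, and its deferred queue lists exactly the volumes it can later copy from its peer.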

In step 520, one or more cluster family members obtain and replicate the information or source volume. For example, one or more cluster family members 220 belonging to a cluster family 280 are selected to pull the data or source volume across the remote network 110, 215, copy/replicate it, and bring it into the cluster family 280. For example, family member 220(c) of family 280(2) pulls the outside data objects across the network 110, 215 into family 280(2). Family member 220(c) now has a consistent source and may be asked to keep the source volume in cache (e.g., TVC 365) so that it is readily available for peer replication.

In step 525, the source volume is cooperatively replicated among the family members of the family. For example, a family group of clusters cooperates by serializing the replication of any one volume when that volume is first brought into the family. Each cluster in the family takes on the role of replicating 1/N of the volumes, where N is the number of clusters in the family that need a copy.

By replicating cooperatively, a cluster family or DR location can become cumulatively consistent up to N times faster, because any one volume is pulled across the long-distance link only once rather than multiple times. The clusters can then become consistent with each other for availability more quickly, because of the short relative distance between them. The overall time to become both DR consistent and HA (high availability) consistent can be greatly improved compared with each cluster independently sourcing from the same remote production cluster.

In step 530, the cluster family achieves cumulative consistency. That is, all volumes outside the cluster family that need to be replicated into the cluster family have been copied. The cluster family as a whole is consistent with respect to all outside data objects. The cluster family members can now share among themselves so that each individual family member within the cluster family has its own copy.

In step 535, after all volumes have been replicated into the family and the family is cumulatively consistent, the clusters within the family that are still inconsistent share the volumes (i.e., data objects) with each other.

The method 500 of the present invention thus performs replication cooperatively, so that a cluster family or DR location can become cumulatively consistent up to N times faster, because any one volume is pulled across the long-distance link only once rather than multiple times. The clusters can then become consistent with each other for availability more quickly, because of the short relative distance between them. The overall time to become both DR consistent and HA (high availability) consistent can be greatly improved compared with each cluster independently sourcing from the same remote production cluster.

In addition, the method 500 uses families to replicate more efficiently among X clusters (where X is the number of clusters) when the customer needs only N copies and wants those N copies to be distant from one another. This allows the customer to spread copies across distances/families without specifying which clusters receive a copy. For example, the user may not care which clusters hold a copy as long as N copies exist (where N is less than X), and the customer requires that the N copies all reside in separate families. All clusters in the domain can then cooperate to ensure that at least one member from each family replicates the volume, after which the replication requirements of the remaining clusters can be considered met. In this way, N copies can end up in N families without too many of the N copies residing in any one region.

The steps of method 500 may be used in any combination during mount processing. For example, with the cluster families configured at step 510, and using steps 515 through 535, method 500 may favor clusters in its own family over clusters outside its family. For example, a mount to a production cluster may favor another production cluster (in the same family) over a remote cluster that is primarily used for disaster recovery (electronic vaulting). Because a user may tend to want production data to stay local and replicate quickly for high availability (favoring availability over disaster recovery), sourcing from a production cluster is more effective in terms of short-term goals while still not affecting long-term goals.
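The preference for same-family sources during mount processing can be sketched as a simple ranking. The `Candidate` type, its fields, and the name-based tie-break are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    """A cluster that might serve as the source for a mount (hypothetical type)."""
    name: str
    family: str
    has_consistent_copy: bool


def choose_mount_source(own_family: str, candidates: list[Candidate]) -> Optional[str]:
    """Prefer a consistent cluster in the requester's own family.

    Falls back to a consistent cluster outside the family, and returns None
    when no candidate holds a consistent copy.
    """
    usable = [c for c in candidates if c.has_consistent_copy]
    if not usable:
        return None
    # False sorts before True, so same-family candidates rank first.
    best = min(usable, key=lambda c: (c.family != own_family, c.name))
    return best.name


candidates = [
    Candidate("dr1", "disaster_recovery", True),
    Candidate("prod2", "production", True),
]
source = choose_mount_source("production", candidates)
```

A real selection would likely weigh further factors such as link distance and load; this sketch shows only the family preference described above.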

FIGS. 6A and 6B are a schematic flowchart illustrating a preferred embodiment of the cluster family selection and cooperative replication method of the present invention. Method 600 substantially includes the steps that carry out the functions presented above with respect to the operation of the apparatus and systems described with reference to FIGS. 1 to 4. In a preferred embodiment, the method is implemented with a computer program product comprising a computer-readable medium having a computer-readable program. The computer-readable program may be integrated into a computer system, such as integrated manager 320 and/or host 210, where the program in combination with the computing system is capable of performing method 600.

Method 600 begins, and at step 605 the copy process starts. For example, an external data object in city A needs to be replicated in city B (e.g., FIG. 2B).

At step 610, control determines whether the cluster receiving the copy request is a cluster family member. If not, at step 615 the volume is copied and no cooperative replication is performed. For example, the cooperative replication module 415 may manage the copy request without any delay or priority change.

In addition, the cooperative replication module 415 may select at least one family member in the family to pull the data across the distant link or network. The selection may be performed once it is determined that the cluster is a family member and that no other family member has pulled the data across the network.

If the cluster is a cluster family member, at step 620 control determines whether one of the other family members has already completed copying the volume. If so, at step 625, because one of the other family members has already copied the volume, the copy of the volume is given a lower priority and is placed back in the queue.

If none of the family members has completed copying the volume, at step 630 control determines whether another family member is actively copying the volume. If so, at step 635 the priority for copying this volume is lowered and there is a delay before the request returns to the queue. The delay before sending the copy request back to the queue ensures, for example, that the other family member actively copying the volume has not encountered any problem copying it.

If, at step 630, no other family member is actively copying the volume, then at step 640 control determines whether another family member is also ready to copy the volume but is not actively copying at this time. If so, at step 645 control determines whether that other family member, which is not actively copying at this time, should inherit the copy-required flag. If so, at step 650 this cluster lowers its copy priority and delays back to the queue.

If the answer at step 645 is no, method 600 moves to step 655, and this family member wins the tie-breaker between the two cluster members and inherits the copy flag. Thus, at step 645 control determines which family member is designated to inherit the copy flag. The non-designated family member lowers its copy priority and delays back to the queue (e.g., step 650).

Returning to step 640, if no other family member is ready to copy the volume, then at step 655 control determines that only one family member in the cluster family is ready to copy the volume and designates that family member as the cluster that inherits the copy flag and completes the replication.

It should be noted that at step 640 control may determine that another family member exists that is ready to copy and is not actively copying at this time, yet determine, as shown at step 645, that the other cluster will not inherit the copy flag. In that case, the cluster of step 640 inherits the copy flag, as shown at step 655.
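The branch structure of steps 610 through 655 can be condensed into one decision function. This is a sketch under stated assumptions: the function name, the boolean inputs, and the action labels are hypothetical, and the tie-break outcome of step 645 is passed in as a flag because the text does not say how it is computed.

```python
def next_action(
    is_family_member: bool,
    peer_completed: bool = False,
    peer_active: bool = False,
    peer_ready: bool = False,
    peer_inherits: bool = False,
) -> tuple[int, str]:
    """Return (step number, action label) for a cluster handling a copy request.

    The inputs stand for the checks at steps 610, 620, 630, 640 and 645.
    """
    if not is_family_member:
        return 615, "copy_now"                         # no cooperative replication
    if peer_completed:
        return 625, "requeue_low_priority"             # a peer already holds the volume
    if peer_active:
        return 635, "requeue_low_priority_with_delay"  # a peer is copying right now
    if peer_ready and peer_inherits:
        return 650, "requeue_low_priority_with_delay"  # the peer won the tie-break
    return 655, "inherit_copy_flag"                    # this cluster does the copy


# A family member with an idle but ready peer that loses the tie-break.
step, action = next_action(True, peer_ready=True, peer_inherits=False)
```

The ordering of the checks mirrors the flowchart description: completion by a peer is tested before activity, and activity before readiness.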

At step 660, the designated cluster that inherited the copy flag at step 655 completes the copy.

At step 670, control clears the copy-required flag at the source cluster and cooperates to bring the family to cumulative consistency by setting copy-required flags for the family members of the cluster family.

At step 675, the other family members of the cluster family complete their copies and reset their copy-required flags within the cluster that was designated at step 655 to inherit the copy flag.
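The flag handling of steps 660 through 675 can be sketched as a small state object. The class, its fields, and its method names are hypothetical; the sketch assumes one copy-required flag at the external source and one per remaining family member, kept at the designated cluster.

```python
from dataclasses import dataclass, field


@dataclass
class FamilyCopyState:
    """Copy-required flags for one volume in one cluster family (illustrative)."""
    members: list[str]
    designated: str
    source_copy_required: bool = True
    peer_copy_required: dict[str, bool] = field(default_factory=dict)
    consistent: set[str] = field(default_factory=set)

    def designated_completes(self) -> None:
        # Step 660: the designated cluster finishes the long-distance copy.
        self.consistent.add(self.designated)
        # Step 670: clear the flag at the source, set flags for the other members.
        self.source_copy_required = False
        self.peer_copy_required = {
            m: True for m in self.members if m != self.designated
        }

    def peer_completes(self, member: str) -> None:
        # Step 675: a family member copies from within the family and resets its flag.
        if not self.peer_copy_required.get(member, False):
            raise ValueError(f"{member} has no pending copy")
        self.consistent.add(member)
        self.peer_copy_required[member] = False

    def family_consistent(self) -> bool:
        return self.consistent == set(self.members)


state = FamilyCopyState(members=["a", "b", "c"], designated="a")
state.designated_completes()
flags_after_670 = dict(state.peer_copy_required)
state.peer_completes("b")
state.peer_completes("c")
```

Once the source flag is cleared, the external cluster no longer needs to track the family; the remaining work is visible only as flags inside the family.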

FIGS. 1 to 3 may be illustrative of multi-cluster configurations. In a multi-cluster configuration (or grid configuration), from a microcode perspective each cluster may be unaware of its relationships and roles with respect to the other clusters, and therefore works equally and independently with all other clusters. For example, when two or more clusters are configured globally remote from one or two production clusters, they may each replicate independently by "pulling" data across the remote network. Because the clusters have no relationship awareness, they cannot operate in the most efficient manner based on their roles and/or their distance from other clusters.

Furthermore, in a multi-cluster configuration, this lack of relationship awareness greatly affects both the means of selecting a cluster from which to source a volume during mount processing and the ability of clusters to source volume replication. For example, a production cluster may choose a globally remote source cluster over a metro remote cluster for mount and/or copy processing. Because of the network distance between the clusters, the globally remote cluster is much less efficient.

Implementations of the invention address these problems by introducing relationship awareness among family members and families in a multi-cluster or grid configuration. Moreover, implementing the invention can improve the performance, efficiency, and optimization of data copying and/or replication. For example, replicating cooperatively into a family achieves cumulative family consistency N times faster while using only 1/N of the cumulative network throughput, and it improves efficiency and performance by reducing the overall time to become DR consistent and HA consistent compared with having each cluster independently source from the same remote production cluster.

With reference to FIGS. 1 to 6, implementations of the invention may involve software, firmware, microcode, hardware, and/or any combination thereof. An implementation may take the form of code or logic implemented in a medium, such as the memory, storage, and/or circuitry of hierarchical storage node 315, where the medium may comprise hardware logic (e.g., an integrated circuit chip, a programmable gate array [PGA], an application-specific integrated circuit [ASIC], or other circuit, logic, or device) or a computer-readable storage medium, such as a magnetic storage medium (e.g., an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system; semiconductor or solid-state memory; magnetic tape; removable computer tape; random access memory [RAM]; read-only memory [ROM]; hard disk and optical disk; compact disk read-only memory [CD-ROM]; compact disk read/write [CD-R/W]; and digital video disk [DVD]).

Those skilled in the art will readily appreciate that changes may be made to the approaches discussed above, including changes to the order of the steps. Furthermore, those skilled in the art will appreciate that specific component arrangements different from those shown here may be used.

Although preferred embodiments of the present invention have been described in detail herein, it should be understood that those skilled in the art may make various modifications and changes to these embodiments without departing from the scope of the invention as set forth in the following claims.

Claims (25)

1. A method for cooperative replication of a plurality of clusters, the method comprising the steps of:
arranging at least a subset of the plurality of clusters into family members of a cluster family;
negotiating among the cluster family members to determine which cluster family member is in the best position to obtain at least one external data object from at least one cluster outside the family;
selecting one family member of the cluster family to obtain the external data object;
achieving cumulative consistency with the external data object within the cluster family; and
sharing the external data object among the cluster family members so that each cluster within the cluster family is consistent with the external data object.

2. The method of claim 1, further comprising the step of creating relationships between cluster families based on at least one of a cluster relationship factor and a role factor.

3. The method of claim 1, further comprising the step of:
a family member having a consistent source maintaining a volume in cache so that the volume is readily available to other family members for peer replication.

4. The method of claim 3, further comprising the step of:
the family member maintaining the volume in cache for the other family members, relieving the external cluster from maintaining a copy in the external cluster's cache.

5. The method of claim 1, further comprising the step of:
each of N family members in the family replicating 1/N of the volumes of the external data object.

6. The method of any one of claims 3 to 5, further comprising the step of cooperatively serializing all replications into the cluster family.

7. The method of claim 1, further comprising the step of:
a first of N family members that replicate 1/N of the volumes of the external data object informing the external cluster that the first family member will maintain the volume for the cluster family members, relieving the external cluster from maintaining the volume in the external cluster's cache.

8. The method of any one of claims 1 to 5, further comprising the step of providing a plurality of cluster families, wherein at least one family member from each family replicates a volume.

9. The method of claim 1, further comprising the step of providing a domain comprising a plurality of clusters, wherein the clusters cooperate to ensure that at least one family member from each family replicates a volume and the remaining clusters comply with the replication requirements.

10. The method of claim 1, further comprising the steps of:
receiving a copy request to copy a volume to a first cluster;
determining whether the first cluster is a family member of a cluster family;
in response to the first cluster being a first family member, determining which family member of the cluster family is designated to inherit the copy request;
in response to the determination, executing the copy request and cooperatively replicating the volume into the cluster family;
achieving cumulative consistency within the cluster family; and
sharing the volume within the cluster family so that all copies of the volume within the cluster family are consistent.

11. The method of claim 10, wherein, in response to the first cluster being a first family member, determining which family member of the cluster family is designated to inherit the copy request comprises the steps of:
determining whether another family member has completed copying the volume; and
in response to another family member not having copied the volume, designating the first family member to inherit the copy request.

12. The method of claim 10, wherein, in response to the first cluster being a first family member, determining which family member of the cluster family is designated to inherit the copy request comprises the steps of:
determining whether a second family member is actively copying the volume; and
in response to the second family member actively copying the volume, designating the second family member to inherit the copy request.

13. The method of claim 12, wherein, in response to the first cluster being a first family member, determining which family member of the cluster family is designated to inherit the copy request further comprises the steps of:
determining whether the second family member is ready to copy but is not actively copying at this time; and
in response to the second family member being ready to copy and not actively copying at this time, lowering the copy priority of the second family member and delaying the copy request.

14. The method of claim 10, further comprising the step of:
in response to determining that a second family member inherits the copy request, lowering the copy priority of the first family member and delaying the copy request.

15. The method of claim 10, further comprising the steps of:
designating the first family member to inherit the copy request as the source cluster of the cluster family;
completing the copy at the first family member;
clearing the copy request flag at the first family member;
setting, at the first family member, copy request flags for the other family members;
the other family members inheriting the copy request flags;
the other family members completing the copy; and
each family member resetting its copy request flag.

16. A system for cooperative replication of a plurality of clusters, the system comprising:
a network;
a plurality of sites in communication over the network, each site comprising at least one host and a storage system, the storage system comprising a plurality of clusters, each cluster comprising at least one tape drive configured to access volumes stored on tape, at least one tape volume cache, and a cluster manager, the cluster manager comprising:
a creation module for establishing cluster families and grouping clusters into family members of the cluster families; and
a cooperative replication module for selecting a family member to cooperatively replicate an external data object into the cluster family and achieve cumulative consistency.

17. The system of claim 16, wherein the cooperative replication module is further configured to share the replication among the family members.

18. The system of claim 17, wherein the cooperative replication module is further configured to instruct each of N family members in the family to replicate 1/N of the volumes of the external data object.

19. The system of any one of claims 16 to 18, wherein the creation module is further configured to create cluster families based on at least one of an inter-cluster relationship factor and a role factor.

20. An apparatus for cooperative replication of a plurality of clusters, the apparatus comprising:
a creation module for creating a cluster family from a plurality of clusters, wherein the clusters communicate over a network and each cluster comprises a cache; and
a cooperative replication module for cooperatively replicating at least one external data object into the cluster family and achieving cumulative consistency.

21. The apparatus of claim 20, wherein the creation module is further configured to establish cluster families and group clusters into family members of the cluster families based on at least one of a cluster relationship factor and a role factor.

22. The apparatus of any one of claims 20 to 21, wherein the cooperative replication module is further configured to handle persistent replication source awareness by deferring replication.

23. The apparatus of any one of claims 20 to 21, wherein the cooperative replication module is further configured to instruct each of N family members in the cluster family to replicate 1/N of the volumes of the external data object.

24. The apparatus of any one of claims 20 to 21, wherein the cooperative replication module is further configured to: select one family member of the cluster family to obtain the external data object and to serve as the source cluster for all family members within the cluster family; and make the source cluster responsible for achieving cumulative consistency within the cluster family before the external data object is shared.

25. The apparatus of any one of claims 20 to 21, further comprising:
a relationship module for maintaining factors that define the roles, rules, and relationships between cluster families and family members.
CN201080055666.7A 2009-12-11 2010-11-16 Method and system for cluster selection and cooperative replication Active CN102652423B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/635,702 US8812799B2 (en) 2009-12-11 2009-12-11 Cluster families for cluster selection and cooperative replication
US12/635,702 2009-12-11
PCT/EP2010/067595 WO2011069783A1 (en) 2009-12-11 2010-11-16 Cluster families for cluster selection and cooperative replication

Publications (2)

Publication Number Publication Date
CN102652423A CN102652423A (en) 2012-08-29
CN102652423B true CN102652423B (en) 2015-04-01

Family

ID=43567534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201080055666.7A Active CN102652423B (en) 2009-12-11 2010-11-16 Method and system for cluster selection and cooperative replication

Country Status (6)

Country Link
US (5) US8812799B2 (en)
JP (1) JP5695660B2 (en)
CN (1) CN102652423B (en)
DE (1) DE112010003837T5 (en)
GB (1) GB2488248B (en)
WO (1) WO2011069783A1 (en)

Families Citing this family (67)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8812799B2 (en) * 2009-12-11 2014-08-19 International Business Machines Corporation Cluster families for cluster selection and cooperative replication
US20110153715A1 (en) * 2009-12-17 2011-06-23 Microsoft Corporation Lightweight service migration
US9389895B2 (en) * 2009-12-17 2016-07-12 Microsoft Technology Licensing, Llc Virtual storage target offload techniques
US9323775B2 (en) 2010-06-19 2016-04-26 Mapr Technologies, Inc. Map-reduce ready distributed file system
US11726955B2 (en) 2010-06-19 2023-08-15 Hewlett Packard Enterprise Development Lp Methods and apparatus for efficient container location database snapshot operation
US9311328B2 (en) * 2011-04-22 2016-04-12 Veritas Us Ip Holdings Llc Reference volume for initial synchronization of a replicated volume group
US8856082B2 (en) * 2012-05-23 2014-10-07 International Business Machines Corporation Policy based population of genealogical archive data
WO2013190649A1 (en) * 2012-06-20 2013-12-27 富士通株式会社 Information processing method and device related to virtual-disk migration
US9619256B1 (en) * 2012-06-27 2017-04-11 EMC IP Holding Company LLC Multi site and multi tenancy
US8904231B2 (en) * 2012-08-08 2014-12-02 Netapp, Inc. Synchronous local and cross-site failover in clustered storage systems
US20140122817A1 (en) * 2012-10-31 2014-05-01 Duke Browning System and method for an optimized distributed storage system
US8903539B2 (en) * 2012-11-21 2014-12-02 International Business Machines Corporation Efficient distribution and selection of storage media in a storage medium library
US20140229695A1 (en) * 2013-02-13 2014-08-14 Dell Products L.P. Systems and methods for backup in scale-out storage clusters
US9438674B2 (en) 2013-06-07 2016-09-06 International Business Machines Corporation Appliance interconnection architecture
US10423643B2 (en) 2013-08-29 2019-09-24 Oracle International Corporation System and method for supporting resettable acknowledgements for synchronizing data in a distributed data grid
US20230254127A1 (en) * 2013-11-06 2023-08-10 Pure Storage, Inc. Sharing Encryption Information Amongst Storage Devices In A Storage System
US11128448B1 (en) * 2013-11-06 2021-09-21 Pure Storage, Inc. Quorum-aware secret sharing
US9304871B2 (en) 2013-12-02 2016-04-05 International Business Machines Corporation Flash copy for disaster recovery (DR) testing
US9286366B2 (en) * 2013-12-02 2016-03-15 International Business Machines Corporation Time-delayed replication for data archives
US9262290B2 (en) 2013-12-02 2016-02-16 International Business Machines Corporation Flash copy for disaster recovery (DR) testing
US20150271014A1 (en) * 2014-03-21 2015-09-24 Onyx Ccs Automatic configuration of new components by infrastructure management software
US9606873B2 (en) 2014-05-13 2017-03-28 International Business Machines Corporation Apparatus, system and method for temporary copy policy
US9542277B2 (en) 2014-09-30 2017-01-10 International Business Machines Corporation High availability protection for asynchronous disaster recovery
US20160140197A1 (en) * 2014-11-14 2016-05-19 Tim Gast Cross-system synchronization of hierarchical applications
US20160259573A1 (en) * 2015-03-03 2016-09-08 International Business Machines Corporation Virtual tape storage using inter-partition logical volume copies
US10896207B2 (en) * 2015-08-20 2021-01-19 International Business Machines Corporation Optimization of object-based storage
WO2017131791A1 (en) * 2016-01-30 2017-08-03 Entit Software Llc Log event cluster analytics management
CN107220263B (en) * 2016-03-22 2021-09-03 阿里巴巴集团控股有限公司 Optimization method, evaluation method, processing method and device for data migration
CN107291724A (en) * 2016-03-30 2017-10-24 阿里巴巴集团控股有限公司 Company-data clone method, priority determine method and device
US10248523B1 (en) * 2016-08-05 2019-04-02 Veritas Technologies Llc Systems and methods for provisioning distributed datasets
CN108089949A (en) * 2017-12-29 2018-05-29 广州创慧信息科技有限公司 A kind of method and system of automatic duplicating of data
KR102292389B1 (en) * 2018-01-17 2021-08-25 한국전자통신연구원 Apparatus for distributed processing through remote direct memory access and method for the same
US11010233B1 (en) 2018-01-18 2021-05-18 Pure Storage, Inc Hardware-based system monitoring
US11172021B2 (en) 2018-02-06 2021-11-09 Hewlett-Packard Development Company, L.P. File objects download and file objects data exchange
US11023174B2 (en) 2019-09-12 2021-06-01 International Business Machines Corporation Combining of move commands to improve the performance of an automated data storage library
US11106509B2 (en) 2019-11-18 2021-08-31 Bank Of America Corporation Cluster tuner
US11429441B2 (en) 2019-11-18 2022-08-30 Bank Of America Corporation Workflow simulator
US12204657B2 (en) 2019-11-22 2025-01-21 Pure Storage, Inc. Similar block detection-based detection of a ransomware attack
US12067118B2 (en) 2019-11-22 2024-08-20 Pure Storage, Inc. Detection of writing to a non-header portion of a file as an indicator of a possible ransomware attack against a storage system
US12248566B2 (en) 2019-11-22 2025-03-11 Pure Storage, Inc. Snapshot deletion pattern-based determination of ransomware attack against data maintained by a storage system
US12411962B2 (en) 2019-11-22 2025-09-09 Pure Storage, Inc. Managed run-time environment-based detection of a ransomware attack
US11687418B2 (en) 2019-11-22 2023-06-27 Pure Storage, Inc. Automatic generation of recovery plans specific to individual storage elements
US12050683B2 (en) * 2019-11-22 2024-07-30 Pure Storage, Inc. Selective control of a data synchronization setting of a storage system based on a possible ransomware attack against the storage system
US11645162B2 (en) 2019-11-22 2023-05-09 Pure Storage, Inc. Recovery point determination for data restoration in a storage system
US11720714B2 (en) 2019-11-22 2023-08-08 Pure Storage, Inc. Inter-I/O relationship based detection of a security threat to a storage system
US12153670B2 (en) 2019-11-22 2024-11-26 Pure Storage, Inc. Host-driven threat detection-based protection of storage elements within a storage system
US11755751B2 (en) 2019-11-22 2023-09-12 Pure Storage, Inc. Modify access restrictions in response to a possible attack against data stored by a storage system
US11657155B2 (en) 2019-11-22 2023-05-23 Pure Storage, Inc Snapshot delta metric based determination of a possible ransomware attack against data maintained by a storage system
US11720692B2 (en) 2019-11-22 2023-08-08 Pure Storage, Inc. Hardware token based management of recovery datasets for a storage system
US11520907B1 (en) 2019-11-22 2022-12-06 Pure Storage, Inc. Storage system snapshot retention based on encrypted data
US12079502B2 (en) 2019-11-22 2024-09-03 Pure Storage, Inc. Storage element attribute-based determination of a data protection policy for use within a storage system
US12079356B2 (en) 2019-11-22 2024-09-03 Pure Storage, Inc. Measurement interval anomaly detection-based generation of snapshots
US11615185B2 (en) 2019-11-22 2023-03-28 Pure Storage, Inc. Multi-layer security threat detection for a storage system
US12050689B2 (en) 2019-11-22 2024-07-30 Pure Storage, Inc. Host anomaly-based generation of snapshots
US11675898B2 (en) 2019-11-22 2023-06-13 Pure Storage, Inc. Recovery dataset management for security threat monitoring
US11500788B2 (en) 2019-11-22 2022-11-15 Pure Storage, Inc. Logical address based authorization of operations with respect to a storage system
US11341236B2 (en) 2019-11-22 2022-05-24 Pure Storage, Inc. Traffic-based detection of a security threat to a storage system
US11625481B2 (en) 2019-11-22 2023-04-11 Pure Storage, Inc. Selective throttling of operations potentially related to a security threat to a storage system
US12079333B2 (en) 2019-11-22 2024-09-03 Pure Storage, Inc. Independent security threat detection and remediation by storage systems in a synchronous replication arrangement
US11941116B2 (en) 2019-11-22 2024-03-26 Pure Storage, Inc. Ransomware-based data protection parameter modification
US11651075B2 (en) 2019-11-22 2023-05-16 Pure Storage, Inc. Extensible attack monitoring by a storage system
CN111885170B (en) * 2020-07-23 2022-03-11 平安科技(深圳)有限公司 Processing method and system of Internet of things control system, cloud server and medium
CN113641503B (en) * 2021-09-01 2024-05-14 上海联蔚盘云科技有限公司 Multi-cloud multi-cluster Kubernetes management system, method and equipment
US12321786B2 (en) 2021-09-28 2025-06-03 Hewlett Packard Enterprise Development Lp Regulation of throttling of polling based on processor utilizations
US11630603B1 (en) * 2021-09-28 2023-04-18 Hewlett Packard Enterprise Development Lp Hardware device polling using delay order
US12154592B2 (en) * 2021-11-02 2024-11-26 Quantum Corporation Automated system and method for diagnosing tape drive and media issues within large-scale tape library system
US12423195B2 (en) * 2023-10-13 2025-09-23 International Business Machines Corporation Concurrent recovery of exported physical tape data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030221074A1 (en) * 2002-05-24 2003-11-27 Ai Satoyama Computer system and a method of replication
US6718361B1 (en) * 2000-04-07 2004-04-06 Network Appliance Inc. Method and apparatus for reliable and scalable distribution of data files in distributed networks
CN1780420A (en) * 2004-11-18 2006-05-31 华为技术有限公司 A Method for Determining Priority in Cluster User Cluster Group
CN101005372A (en) * 2006-01-19 2007-07-25 思华科技(上海)有限公司 Cluster cache service system and its realizing method
US7461130B1 (en) * 2004-11-24 2008-12-02 Sun Microsystems, Inc. Method and apparatus for self-organizing node groups on a network
CN101355476A (en) * 2008-05-23 2009-01-28 林云帆 System and method for storing, distributing and applying data files based on server cluster

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5862312A (en) * 1995-10-24 1999-01-19 Seachange Technology, Inc. Loosely coupled mass storage computer cluster
US6438705B1 (en) 1999-01-29 2002-08-20 International Business Machines Corporation Method and apparatus for building and managing multi-clustered computer systems
JP2000322292A (en) 1999-05-10 2000-11-24 Nec Corp Cluster type data server system and data storage method
US6952741B1 (en) * 1999-06-30 2005-10-04 Computer Sciences Corporation System and method for synchronizing copies of data in a computer system
US6950833B2 (en) * 2001-06-05 2005-09-27 Silicon Graphics, Inc. Clustered filesystem
US7243103B2 (en) 2002-02-14 2007-07-10 The Escher Group, Ltd. Peer to peer enterprise storage system with lexical recovery sub-system
US7523204B2 (en) * 2004-06-01 2009-04-21 International Business Machines Corporation Coordinated quiesce of a distributed file system
US20060080362A1 (en) 2004-10-12 2006-04-13 Lefthand Networks, Inc. Data Synchronization Over a Computer Network
US20060149922A1 (en) 2004-12-28 2006-07-06 Ceva D.S.P. Ltd. Multiple computational clusters in processors and methods thereof
US9176741B2 (en) 2005-08-29 2015-11-03 Invention Science Fund I, Llc Method and apparatus for segmented sequential storage
JP4281925B2 (en) 2006-06-19 2009-06-17 株式会社スクウェア・エニックス Network system
US7757111B2 (en) 2007-04-05 2010-07-13 International Business Machines Corporation Method and system for insuring data integrity in anticipation of a disaster
US7774094B2 (en) 2007-06-28 2010-08-10 International Business Machines Corporation Selecting a source cluster by measuring system factors, calculating a mount-to-dismount lifespan, and selecting the source cluster in response to the lifespan and a user policy
US8073922B2 (en) 2007-07-27 2011-12-06 Twinstrata, Inc System and method for remote asynchronous data replication
EP2052688B1 (en) * 2007-10-25 2012-06-06 pfm medical ag Snare mechanism for surgical retrieval
JP5018403B2 (en) 2007-10-31 2012-09-05 日本電気株式会社 Backup system, server device, backup method used for them, and program thereof
US8180747B2 (en) 2007-11-12 2012-05-15 F5 Networks, Inc. Load sharing cluster file systems
US7779074B2 (en) * 2007-11-19 2010-08-17 Red Hat, Inc. Dynamic data partitioning of data across a cluster in a distributed-tree structure
CN104123239B (en) * 2008-01-31 2017-07-21 甲骨文国际公司 System and method for transactional cache
JP5508798B2 (en) 2009-09-29 2014-06-04 株式会社日立製作所 Management method and system for managing replication in consideration of clusters
US8812799B2 (en) 2009-12-11 2014-08-19 International Business Machines Corporation Cluster families for cluster selection and cooperative replication

Also Published As

Publication number Publication date
DE112010003837T5 (en) 2012-11-08
US9684472B2 (en) 2017-06-20
GB2488248B (en) 2015-07-01
CN102652423A (en) 2012-08-29
US9250825B2 (en) 2016-02-02
GB2488248A (en) 2012-08-22
GB201203109D0 (en) 2012-04-04
WO2011069783A1 (en) 2011-06-16
US10073641B2 (en) 2018-09-11
US20170235508A1 (en) 2017-08-17
JP2013513839A (en) 2013-04-22
US20140344540A1 (en) 2014-11-20
US8521975B2 (en) 2013-08-27
US20160103616A1 (en) 2016-04-14
US8812799B2 (en) 2014-08-19
JP5695660B2 (en) 2015-04-08
US20110145497A1 (en) 2011-06-16
US20120290805A1 (en) 2012-11-15

Similar Documents

Publication Publication Date Title
CN102652423B (en) Method and system for cluster selection and cooperative replication
US10146453B2 (en) Data migration using multi-storage volume swap
JP6009097B2 (en) Separation of content and metadata in a distributed object storage ecosystem
US8010485B1 (en) Background movement of data between nodes in a storage cluster
CN115668172A (en) Managing host mapping of replication endpoints
US7774094B2 (en) Selecting a source cluster by measuring system factors, calculating a mount-to-dismount lifespan, and selecting the source cluster in response to the lifespan and a user policy
US20160162371A1 (en) Supporting multi-tenancy through service catalog
CN104025058B (en) Content choice for storage of hierarchically
US9606873B2 (en) Apparatus, system and method for temporary copy policy
US9760457B2 (en) System, method and computer program product for recovering stub files
CN116368458A (en) Data path virtualization
US11972266B2 (en) Hibernating and resuming nodes of a computing cluster
JP6160296B2 (en) Storage control device, storage system, and control program
US10747635B1 (en) Establishing quorums on an object-by-object basis within a management system
JP6227771B2 (en) System and method for managing logical volumes
US20120203964A1 (en) Selecting a virtual tape server in a storage system to provide data copy while minimizing system job load
WO2024131184A1 (en) Disaster recovery method for database, and related device
US9942098B2 (en) Appliance node clusters
US12099719B2 (en) Cluster management in large-scale storage systems
CN114442942A (en) Data migration method, system, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant