CN102110154B - File redundancy storage method in cluster file system - Google Patents
- Publication number
- CN102110154B (also listed as CN201110042143A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention provides a file redundancy storage method for a cluster file system in which data can switch dynamically between storage states while data reliability and availability are preserved. When system storage resources are scarce, data can be converted from mirrored storage to redundancy check (parity) storage, which reduces the space occupied by redundancy and improves space utilization. Write updates are always performed in mirrored form: a client write request does not have to deal with the redundancy calculation or redundancy update of the redundancy check mode, because newly written data blocks are simply mirrored to the redundant storage management device, so write performance is substantially higher than with the redundancy check mode. In addition, when a storage node fails, recovery can combine mirroring and redundancy check, which reduces the amount of calculation and improves the efficiency of data recovery.
Description
Technical field
The invention relates to mechanisms for safeguarding data in shared storage systems, and in particular to a redundant storage method for files in a cluster system.
Background
In the data-centric information age, protecting data properly and effectively is one of the core problems of storage systems. Users can tolerate an unexpected computer crash, a restart of every application, or even hardware damage, but they require that information never be lost. The most important task of a storage system is to guarantee that stored information is not lost whatever failure occurs, and to provide high-quality data service with as little interruption as possible. Corruption or loss of data not only affects an enterprise's business continuity but can seriously threaten an organization's survival.
To protect data stored on disks, those skilled in the art proposed Redundant Array of Independent Disks (RAID) technology, which combines multiple disks into a disk array and stores on each disk redundant information about the other disks, so that when one disk in the array fails, its data can be recovered from the redundant information stored on the remaining disks. RAID is divided into levels according to the implementation principle, denoted RAID0 through RAID7. The working modes of the different levels differ considerably; the most representative are RAID1, which uses mirroring, and RAID5, which uses redundancy check.
Techniques similar to RAID can also be used in a network storage system, with multiple storage nodes forming a network RAID system. With a RAID1-like mirroring mode, read and write performance is high, but space utilization is only 50%, so the cost-effectiveness of the whole system is low. With a RAID5-like redundancy check mode, space utilization increases, but write performance is low, especially for small writes.
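The trade-off described above can be made concrete with a little arithmetic. The following is a minimal sketch and is not part of the patent: the function names are hypothetical, and the small-write I/O counts assume the textbook RAID5 read-modify-write sequence (read old data and old parity, then write new data and new parity).

```python
from fractions import Fraction


def mirror_utilization(copies: int = 2) -> Fraction:
    """Usable fraction of raw space when every slice is stored `copies` times."""
    return Fraction(1, copies)


def parity_utilization(data_slices: int, parity_slices: int = 1) -> Fraction:
    """Usable fraction of raw space for a group of data slices plus parity."""
    return Fraction(data_slices, data_slices + parity_slices)


def small_write_ios(mode: str) -> int:
    """Device I/Os needed to update one slice (assumed textbook figures)."""
    if mode == "mirror":
        return 2  # write the slice and write its mirror
    if mode == "parity":
        return 4  # read old data, read old parity, write new data, write new parity
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print("mirror utilization:", mirror_utilization())
    print("parity utilization, 3 data slices:", parity_utilization(3))
    print("small-write I/Os:", small_write_ios("mirror"), "vs", small_write_ios("parity"))
```

With three data slices and one parity slice, the redundancy check layout uses three quarters of the raw space for data, compared with one half for mirroring, at the price of more I/O per small write.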
How to provide an efficient mechanism for high data reliability and high availability in a high-performance, large-capacity cluster file system based on network storage, that is, how to keep performance high while reducing the space lost to redundancy, has become an active research topic. PanFS, the parallel file system from the cluster storage vendor Panasas, implements a file-level redundant storage mechanism in which small files are stored in a RAID1-like mirroring mode and large files in a RAID5-like redundancy check mode. PanFS nevertheless still cannot solve the problems described above, which traditional mirroring and redundancy check techniques leave unsolved.
Summary of the invention
The object of the invention is to overcome the above defects of the prior art and to provide, for a high-performance, large-capacity cluster file system based on network storage, an efficient mechanism for high data reliability and high availability that reduces the space lost to redundancy while maintaining good performance.
The object of the invention is achieved by the following technical solution:
The invention provides a file redundancy storage method in a cluster file system, wherein files in the file system are stored on network storage nodes as data slices and have the following storage states: state 1, in which data slices are stored in mirrored form; and state 2, in which data slices are stored in redundancy check form. The method comprises the following steps:
Read each data slice of a file in state 1;
Perform the redundancy calculation to generate a redundancy check slice;
Write the redundancy check slice to the file;
Release the storage space of the file's mirror data slices;
Change state 1 to state 2.
In the file redundancy storage method according to a preferred embodiment of the invention, the redundancy check used in storage state 2 is a parity check.
In the file redundancy storage method according to a preferred embodiment of the invention, the steps of the method are executed by an application server acting as a client of the file system.
In the file redundancy storage method according to a preferred embodiment of the invention, the storage states of a file further include state 3, in which original-version data slices are stored in redundancy check form and the new-version data slices generated by write updates are stored in mirrored form;
The method further comprises the following steps:
Keep unchanged the original-version data slice, in state 2, that is being updated by the write, and generate a new-version data slice;
Write the new-version data slice to the corresponding storage node and, at the same time, mirror it to another storage node;
Change state 2 to state 3.
The file redundancy storage method according to a preferred embodiment of the invention further comprises the following write update steps:
For a data slice stored in mirrored form, modify both the data slice and its mirror, which are stored on two storage nodes;
For a data slice stored in redundancy check form, keep the original-version data slice unchanged and generate a new-version data slice; write the new-version data slice to the corresponding storage node and, at the same time, mirror it to another storage node.
The file redundancy storage method according to a preferred embodiment of the invention further comprises the following steps:
Traverse the mirror data slices of a file in state 3 in order; on encountering a hole, fill the new-version data slice and the mirror data slice with the contents of the original-version data slice;
Replace the original-version data slice with the new-version data slice;
Release the storage space of the new-version data slice and the redundancy check data slice;
Change state 3 to state 1.
The file redundancy storage method according to a preferred embodiment of the invention further comprises the following recovery steps for when a data slice is in error:
In state 1, restore the erroneous data slice from its mirror on the corresponding node;
In state 2, restore the erroneous data slice by redundancy calculation;
In state 3, if the erroneous slice is an original-version data slice, restore it by redundancy calculation; if it is a new-version data slice or its mirror, restore it from the copy on the corresponding node.
Compared with the prior art, the file redundancy storage method provided by the embodiments of the invention allows data to switch dynamically between storage states. Client writes are applied as mirrored write updates: a client write request does not have to deal with redundancy calculation or redundancy update, because the newly written data block is mirrored to the redundant storage management device. This avoids the poor write performance that small-write updates cause in the redundancy check mode, so write performance is substantially higher than with existing redundancy check schemes. At the same time, data that is not write-active can be converted to redundancy check storage when system storage resources are scarce, which reduces the space occupied by redundancy and improves space utilization while preserving data reliability and availability. In addition, when a storage node fails, mirroring and redundancy check can be combined to reduce the amount of calculation and improve the efficiency of data recovery, safeguarding data reliability and availability.
Brief description of the drawings
Embodiments of the invention are described further below with reference to the drawings, in which:
Figure 1 is a schematic diagram of the initial state, in which no data has been written to any storage node, according to an embodiment of the invention;
Figure 2 is a schematic diagram of state 1, in which data slices are stored in mirrored form, according to an embodiment of the invention;
Figure 3 is a schematic diagram of state 2, in which data slices are stored in redundancy check form, according to an embodiment of the invention;
Figure 4 is a schematic diagram of state 3, in which original-version data slices are stored in redundancy check form and the new-version data slices generated by write updates are stored redundantly in mirrored form, according to an embodiment of the invention;
Figure 5 is a schematic diagram of a write update to a data slice in state 1 according to an embodiment of the invention;
Figure 6 is a schematic diagram of state 3 after a new-version data slice has been updated by a further write, according to an embodiment of the invention;
Figure 7 is a schematic diagram of a write update to an original-version data slice in state 3 according to an embodiment of the invention;
Figure 8 is a schematic diagram of an error in the redundancy check slice in state 2 according to an embodiment of the invention;
Figure 9 is a schematic diagram of an error in a data slice in state 2 according to an embodiment of the invention;
Figure 10 is a schematic diagram of an error in the redundancy check slice in state 3 according to an embodiment of the invention;
Figure 11 is a schematic diagram of an error in an original-version data slice in state 3 according to an embodiment of the invention;
Figure 12 is a schematic diagram of an error in the mirror of a new-version data slice in state 3 according to an embodiment of the invention;
Figure 13 is a schematic diagram of errors in both an original-version data slice and its new-version data slice in state 3 according to an embodiment of the invention.
Detailed description
To make the object, technical solution and advantages of the invention clearer, the invention is described in further detail below through specific embodiments with reference to the drawings. The specific embodiments described here serve only to explain the invention and do not limit it.
In the embodiments of the invention, the file system comprises an application server, a metadata server and multiple storage nodes, where the application server acts as the client of the file system and performs file read and write operations. Figure 1 shows the initial state according to one embodiment, in which no data has been written to any storage node (SN). When a file is first saved, it is stored as data slices on the corresponding network storage nodes, and the data slices of each storage node are mirrored to other nodes. The resulting mirrored redundant storage improves data reliability. The storage state of the file at this point is called state 1 and is shown in Figure 2, in which the data slices are stored in mirrored form. This embodiment uses 3 storage nodes and 3 mirror nodes only as an example; in other embodiments the number of storage nodes is not limited and may be any number.
In the embodiments of the invention, the storage states of a file in the file system also include state 2, shown in Figure 3, in which the data slices are stored in redundancy check form. This embodiment uses a RAID5-like parity check as the redundancy check; other embodiments may use other redundancy check schemes known to those skilled in the art, such as Hamming code checking.
In the embodiments of the invention, file storage state 1 can be converted to state 2 by the following steps:
Read each data slice of the file in state 1;
Perform the redundancy calculation to generate a redundancy check slice;
Write the redundancy check slice to the file;
Release the storage space of the file's mirror data slices;
Change state 1 to state 2.
These steps are also called the downgrade operation. When a large number of files are in state 1 and stored in mirrored form, space redundancy reaches 100%. If the system's remaining storage resources become scarce, files can be downgraded from state 1 to state 2 and stored with a redundancy check, which releases storage resources and reduces space usage without lowering system reliability.
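The five downgrade steps above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions rather than the patent's implementation: the file is modelled as a plain dictionary with hypothetical keys (`state`, `data`, `mirrors`, `parity`), all slices are assumed to have equal length, and the redundancy calculation is assumed to be byte-wise XOR parity in the style of RAID5.

```python
from functools import reduce
from operator import xor


def xor_parity(slices):
    """Byte-wise XOR of equal-length slices (RAID5-style parity)."""
    return bytes(reduce(xor, column) for column in zip(*slices))


def downgrade(f):
    """Convert a file record from state 1 (mirrored) to state 2 (parity)."""
    if f["state"] != 1:
        raise ValueError("downgrade applies only to files in state 1")
    data = f["data"]             # step 1: read each data slice
    parity = xor_parity(data)    # step 2: redundancy calculation
    f["parity"] = parity         # step 3: write the redundancy check slice
    f["mirrors"] = []            # step 4: release the mirror slices
    f["state"] = 2               # step 5: change state 1 to state 2
    return f


if __name__ == "__main__":
    slices = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
    record = {"state": 1, "data": list(slices), "mirrors": list(slices)}
    downgrade(record)
    print(record["state"], record["parity"].hex())
```

In a real cluster each step would be a network operation against storage nodes; the ordering matters, because the mirrors are released only after the parity slice has been written.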
In this embodiment the downgrade operation is executed by the application server acting as a client of the file system, which avoids the performance bottleneck of a single control node under centralized management. The downgrade operation can be invoked by a background downgrade daemon (DegradeDeamon) to compact space and release resources when resources are scarce. In some embodiments the downgrade operation may instead be executed by a storage management server, and in other embodiments by the server side of the file system.
In the embodiments of the invention, the storage states of a file in the file system also include state 3, shown in Figure 4, in which original-version data slices are stored in redundancy check form and the new-version data slices generated by write updates are stored in mirrored form. When a write update is applied to a file in state 2, the file's storage state changes from state 2 to state 3 through the following steps:
Keep unchanged the original-version data slice, in state 2, that is being updated by the write, and generate a new-version data slice;
Write the new-version data slice to the corresponding storage node and, at the same time, mirror it to another storage node;
Change state 2 to state 3.
These steps are collectively called the write operation on state 2. As shown in Figure 4, when a write update is applied to a file in state 2, taking data slice D10 as the example, the old slice D10 is kept unchanged and a new slice D11 is generated, while the mirror D11′ of the new slice is saved to another storage node. This mirror backup preserves data reliability. The storage state of the file at this point is called state 3.
In the embodiments of the invention, a write update to a file performs different operations depending on how the data slice is stored, as follows:
For a data slice stored in mirrored form, modify both the data slice and its mirror, which are stored on two storage nodes;
For a data slice stored in redundancy check form, keep the original-version data slice unchanged and generate a new-version data slice; write the new-version data slice to the corresponding storage node and, at the same time, mirror it to another storage node.
These steps are collectively called the write update operation. Figure 5 is a schematic diagram of a write update to a data slice in state 1. In state 1 the data slices are stored in mirrored form, so a write update must modify both the data slice and its mirror, which are held on two storage nodes; the modification may be synchronous or asynchronous. Taking data slice D1 as the example, D1 is updated to D12 and the mirror D1′ on the other storage node is updated to D12′. The storage state remains state 1, and these steps are also called the write update operation on state 1.
When a write update is applied to a file in state 3, the storage mode of the data slice must be distinguished: original-version data slices are stored in redundancy check form, while the new-version data slices generated by write updates are stored in mirrored form. Figure 6 shows state 3 after a new-version data slice has been updated by a write. The write update is applied to the file in state 3 shown in Figure 4; taking D11 as the example, the original-version data slice D10, stored in redundancy check form, is kept unchanged, D11 is updated to D12, and the mirror D11′ of D11 on another storage node is updated to D12′, giving a mirror backup. The storage state remains state 3.
Figure 7 shows a write update to an original-version data slice in state 3, where original-version data slices are stored in redundancy check form. The write update is applied to the file in state 3 shown in Figure 6; taking D30 as the example, the original-version data slice D30 is kept unchanged and a new-version data slice D31 is generated, while the mirror D31′ of the new slice is saved to another storage node, giving a mirror backup that preserves data reliability. The storage state remains state 3.
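The write paths for states 1, 2 and 3 can be sketched together as one dispatch function. This is a hedged illustration, not the patent's code: the dictionary keys (`data`, `mirrors`, `orig`, `parity`, `new`, `new_mirror`) are hypothetical, storage nodes are not modelled, and the mirror update is shown synchronously although the text also allows an asynchronous mode.

```python
def write_update(f, idx, payload):
    """Apply a write to slice `idx`; return the file's resulting state."""
    if f["state"] == 1:
        # Mirrored slice: update the slice and its mirror together.
        f["data"][idx] = payload
        f["mirrors"][idx] = payload
    elif f["state"] in (2, 3):
        # The parity-protected original slice and the parity slice stay
        # untouched; the new version is written and mirrored instead.
        f["new"][idx] = payload
        f["new_mirror"][idx] = payload
        f["state"] = 3
    else:
        raise ValueError(f"unknown state {f['state']}")
    return f["state"]


if __name__ == "__main__":
    record = {"state": 2, "orig": [b"a", b"b"], "parity": b"\x03",
              "new": {}, "new_mirror": {}}
    print(write_update(record, 1, b"y"), record["orig"], record["new"])
```

Because no branch touches `parity`, no write ever triggers a redundancy calculation, which is the property the text relies on for its write-performance claim.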
Read and write operations
In the embodiments of the invention, the application server performs read and write operations as the client.
A client read does not affect the consistency state of the file system or the creation and maintenance of snapshots, so it follows the normal flow shown in steps (1), (2) and (3):
(1) The client requests the file layout information (the mapping from the file's logical addresses to physical addresses) from the server;
(2) The server queries the relevant metadata and returns the corresponding layout;
(3) The client reads the data on the storage devices through the returned layout.
The application server, acting as the client, performs a write as shown in steps (4), (5) and (6):
(4) The client requests the layout information from the server;
(5) The server queries the relevant metadata, returns the corresponding read and write layouts, and reserves the corresponding resources. The client then performs device reads and writes according to the returned layouts: it reads the contents of the storage device through the read-only layout and writes the modified contents to the storage device corresponding to the writable layout;
(6) The client submits the corresponding metadata information for the layout, and the reserved resource information is added to the metadata organization.
For a read request, the client obtains the resource mapping addresses from the metadata server and then requests the data directly from the storage nodes, so performance is not affected by redundant storage. For a write request, the client does not have to deal with redundancy updates; it only uses remote mirroring to mirror the newly written data block to the redundant storage management device, so write performance is substantially higher than with the redundancy check mode. In some embodiments the client may further optimize write performance with techniques such as delayed writes, write aggregation and caching.
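The layout-based read and write flow in steps (1) to (6) can be sketched with a toy metadata server. Everything here is an assumption made for illustration: the class and function names are hypothetical, a file is reduced to a single block, storage is a dictionary keyed by `(node, block)`, and the mirroring of the newly written block is omitted to keep the example short.

```python
class MetadataServer:
    """Toy metadata server mapping a file name to a (node, block) address."""

    def __init__(self):
        self.committed = {}    # file -> address of the current contents
        self.reserved = {}     # file -> address reserved for a pending write
        self._next_block = 0

    def read_layout(self, name):
        # Steps (1)-(2): return the layout for reading.
        return self.committed[name]

    def write_layout(self, name, node):
        # Steps (4)-(5): reserve a fresh block and return both layouts.
        addr = (node, self._next_block)
        self._next_block += 1
        self.reserved[name] = addr
        return self.committed.get(name), addr

    def commit(self, name):
        # Step (6): the reserved block becomes part of the file's metadata.
        self.committed[name] = self.reserved.pop(name)


def client_read(mds, storage, name):
    """Step (3): read the device contents through the returned layout."""
    return storage[mds.read_layout(name)]


def client_write(mds, storage, name, node, modify):
    """Read via the read-only layout, write via the writable one, commit."""
    read_addr, write_addr = mds.write_layout(name, node)
    old = storage.get(read_addr, b"")     # empty if the file is new
    storage[write_addr] = modify(old)
    mds.commit(name)


if __name__ == "__main__":
    mds, storage = MetadataServer(), {}
    client_write(mds, storage, "f", "SN1", lambda old: old + b"ab")
    client_write(mds, storage, "f", "SN2", lambda old: old + b"cd")
    print(client_read(mds, storage, "f"))
```

The old block is never overwritten in place; the new contents land in the reserved block and become visible only at commit.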
In the embodiments of the invention, the storage state of a file can be converted from state 3 to state 1 by the following steps:
Traverse the mirror data slices of the file in state 3 in order; on encountering a hole, fill the new-version data slice and the mirror data slice with the contents of the original-version data slice;
Replace the original-version data slice with the new-version data slice;
Release the storage space of the new-version data slice and the redundancy check data slice;
Change state 3 to state 1.
These steps are also called the upgrade operation. In this embodiment the application server, acting as the client, proceeds as follows. It traverses the mirror data slices block′ of the file in state 3 in order. A hole means that no valid data has been written to that slice, so the original-version data slice block0 has not been modified and no new-version data slice blocki has been generated; in that case the contents of the original-version data slice are filled into the new-version data slice blocki and the mirror data slice block′;
The original-version data slice block0 is then replaced with the new-version data slice blocki, because storage state 3 is being converted to state 1, in which only the data slices and their mirrors are kept;
The information associated with the new-version data slice blocki and the redundancy check data slice P is deleted, cleared and reclaimed, and state 3 is changed to state 1;
Applying the upgrade operation in turn to the files in state 3 completes the conversion from state 3 to state 1.
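The upgrade operation above can be sketched on the same kind of in-memory record. As before this is an illustration under assumptions, not the patent's implementation: the keys `orig`, `new`, `new_mirror`, `parity`, `data` and `mirrors` are hypothetical, and a "hole" is modelled as a slice index that is absent from `new_mirror`.

```python
def upgrade(f):
    """Convert a file record from state 3 back to state 1 (mirrored)."""
    if f["state"] != 3:
        raise ValueError("upgrade applies only to files in state 3")
    for idx, original in enumerate(f["orig"]):
        if idx not in f["new_mirror"]:
            # Hole: the slice was never rewritten, so fill the new-version
            # slot and the mirror slot with the original contents.
            f["new"][idx] = original
            f["new_mirror"][idx] = original
    count = len(f["orig"])
    # Replace each original slice with its new version and keep the mirrors.
    f["data"] = [f["new"][i] for i in range(count)]
    f["mirrors"] = [f["new_mirror"][i] for i in range(count)]
    # Release the new-version slots and the parity slice; "orig" has been
    # superseded by "data", so that key is dropped as well.
    del f["orig"], f["new"], f["new_mirror"], f["parity"]
    f["state"] = 1
    return f


if __name__ == "__main__":
    record = {"state": 3, "orig": [b"a", b"b", b"c"], "parity": b"`",
              "new": {0: b"x"}, "new_mirror": {0: b"x"}}
    upgrade(record)
    print(record)
```

Slices that were never rewritten cost one extra copy during the upgrade, which is why the text suggests upgrading once all or most slices already have mirrored new versions.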
In this embodiment, the downgrade operation, the write operation on state 2, the write update operation and the upgrade operation are all executed by the application server acting as a client of the file system. In other embodiments, these operations may instead be executed by a storage management server or by a server of the file system.
Some embodiments of the invention provide a method for dynamically converting file storage states in a cluster file system, where files in the file system have the following storage states:
State 1: data slices are stored in mirrored form;
State 2: data slices are stored in redundancy check form;
State 3: original-version data slices are stored in redundancy check form, and the new-version data slices generated by write updates are stored in mirrored form;
The dynamic conversion method comprises the following transitions:
State 1 enters state 1 through a write update operation;
State 1 enters state 2 through a downgrade operation;
State 2 enters state 3 through a write operation on state 2;
State 3 enters state 3 through a write update operation;
State 3 enters state 1 through an upgrade operation.
In state 1, the downgrade operation changes data slices originally stored in mirrored form to redundancy check form, moving the file to state 2. This usually happens when system storage space is scarce; it reduces the space the data occupies and frees storage.
A write operation on state 2 moves the file to state 3. In state 3 the original data slices stored in redundancy check form are kept unchanged, and each write update directly generates a new-version data slice and saves a mirror of it. No write update therefore requires redundancy calculation or redundancy update, which improves write performance.
When a file is in state 3 and all or the great majority of its data slices have been write-updated, all or the great majority of the data slices Dij already have a mirror Dij′ on another storage node, so the data is in effect already stored in mirrored form. The upgrade operation can then be executed, after which the storage state returns to state 1.
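The five transitions listed above form a small state machine, which can be written down as a lookup table. This is a minimal sketch; the operation names are hypothetical labels for the operations the text describes.

```python
# (current state, operation) -> next state, as listed in the text.
TRANSITIONS = {
    (1, "write_update"): 1,
    (1, "downgrade"): 2,
    (2, "write"): 3,
    (3, "write_update"): 3,
    (3, "upgrade"): 1,
}


def next_state(state, operation):
    """Return the state reached by applying `operation` in `state`."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(
            f"operation {operation!r} is not defined for state {state}"
        ) from None


def run(state, operations):
    """Apply a sequence of operations and return the final state."""
    for operation in operations:
        state = next_state(state, operation)
    return state


if __name__ == "__main__":
    print(run(1, ["write_update", "downgrade", "write", "write_update", "upgrade"]))
```

The table makes the asymmetry explicit: there is no direct transition from state 2 back to state 1, so a downgraded file returns to mirrored storage only by way of state 3.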
当某存储节点(SN)由于节点故障(如电源故障、操作系统出错等)或网络故障(如网络连接中断等)等原因停止服务,造成数据丢失后,需要将故障设备修复或替换。完成修复或替换后,客户端需要采用恢复操作恢复丢失数据,保证数据的可靠性。在本发明的实施例中在数据片出错时采用以下步骤:When a storage node (SN) stops serving due to node failure (such as power failure, operating system error, etc.) or network failure (such as network connection interruption, etc.), resulting in data loss, the faulty device needs to be repaired or replaced. After the repair or replacement is completed, the client needs to use the recovery operation to restore the lost data to ensure the reliability of the data. In an embodiment of the present invention, the following steps are adopted when the data sheet is in error:
状态1时,用对应节点上的数据片镜像恢复出错数据片;In state 1, use the data slice image on the corresponding node to restore the error data slice;
状态2时,进行冗余计算恢复出错的数据片;In state 2, redundant calculations are performed to restore erroneous data slices;
状态3时,如果出错的是原版本数据片,则进行冗余计算恢复出错数据片;如果出错的是新版本数据片或者其镜像数据片,则用对应节点上的数据片镜像恢复出错数据片。In
上述步骤也可称为恢复操作。下面参考附图8,9,10,11,12,13对恢复操作进行具体说明。The above steps may also be referred to as recovery operations. The restoration operation will be specifically described below with reference to accompanying
在本发明实施例中,文件处于状态1时,对小文件而言,可能只有1个数据分片;对大文件而言,可能有若干数据片分布在不同节点,每个数据片都在另一个节点存有镜像备份。当文件处于状态1时执行恢复操作只需要将故障节点的出错数据片用对应节点上的镜像恢复即可。In the embodiment of the present invention, when the file is in state 1, for a small file, there may be only one data slice; for a large file, there may be several data slices distributed on different nodes, each A node stores mirrored copies. When the file is in state 1, the recovery operation only needs to restore the error data piece of the faulty node with the mirror image on the corresponding node.
在本发明实施例中,状态2下出错数据片分类及恢复操作如下:In the embodiment of the present invention, the error data slice classification and recovery operations under state 2 are as follows:
如图8所示,状态2下,冗余校验片P出错时的恢复操作,需要读取文件的他数据片D1、D2和D3,作冗余计算,生成新冗余校验片P。如图9所示,状态2下,数据片D3出错时的恢复操作,需要读取冗余组内其他数据片D1、D2以及冗余校验片P,作冗余计算,生成新数据片D3。As shown in Figure 8, in state 2, the recovery operation when the redundancy check piece P fails, it is necessary to read other data pieces D 1 , D 2 and D 3 of the file for redundancy calculation and generate a new redundancy check Piece P. As shown in Figure 9, in state 2, the recovery operation when the data piece D 3 fails requires reading other data pieces D 1 , D 2 and the redundancy check piece P in the redundancy group to perform redundancy calculations and generate a new Data piece D 3 .
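In the embodiment of the present invention, the classification of erroneous data in state 3 and the corresponding recovery operations are as follows:
As shown in Figure 10, in state 3, when the redundancy check slice P0 is in error, recovery reads the other original-version data slices D10, D20 and D30, which are stored under redundancy checking, performs the redundancy calculation, and restores the redundancy check slice P0. As shown in Figure 11, in state 3, when the original-version data slice D30 is in error, recovery reads the other original-version data slices D10 and D20 together with the redundancy check slice P, performs the redundancy calculation, and restores data slice D30. As shown in Figure 12, in state 3, when the mirrored data slice D11′ is in error, recovery uses data slice D11 on the healthy node to restore the mirrored data slice D11′ on the faulty node. As shown in Figure 13, in state 3, when the original-version data slice D10 and the new-version data slice D11 are both in error, recovery takes two steps: first, the file's other original-version data slices, stored under redundancy checking on the healthy nodes, are read and the redundancy calculation is performed to restore the original-version data slice D10; then the mirrored data slice D11′ is used to restore data slice D11 on the faulty node.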
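The two-step case of Figure 13 can be sketched as follows. This is a minimal illustration that again assumes an XOR-based redundancy calculation; the function names, argument names and sample bytes are hypothetical and are not taken from the patent.

```python
def xor_all(slices):
    """Bytewise XOR of equal-length slices (assumed redundancy calculation)."""
    out = bytearray(len(slices[0]))
    for s in slices:
        for i, b in enumerate(s):
            out[i] ^= b
    return bytes(out)


def recover_failed_node(surviving_originals, check_slice, new_version_mirror):
    """Rebuild both slices that lived on the failed node (Figure 13 case).

    Step 1: restore the original-version slice (D10 in the text) from the
            other original-version slices and the redundancy check slice.
    Step 2: restore the new-version slice (D11) by copying its mirror (D11').
    """
    original = xor_all(list(surviving_originals) + [check_slice])
    new_version = bytes(new_version_mirror)
    return original, new_version


# Illustrative data: original versions plus their check slice, and one update.
d10, d20, d30 = b"\xaa\x01", b"\x0f\x02", b"\x33\x04"
p0 = xor_all([d10, d20, d30])
d11 = b"\xbe\xef"           # new version written on the node that later fails
d11_mirror = bytes(d11)     # its mirror, kept on a different node

restored_d10, restored_d11 = recover_failed_node([d20, d30], p0, d11_mirror)
```

Only the original-version slice needs a redundancy calculation; the new-version slice is restored by a plain copy.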
It can be seen that, when a storage node fails, combining mirroring with redundancy checking also reduces the amount of redundancy calculation and the number of redundancy updates, improving the efficiency of data recovery.
In the embodiment of the present invention, the above recovery operation is performed by the application server that acts as a client of the file system. In some other embodiments, the operation may instead be performed by a storage management server or by a server within the file system.
In some embodiments of the present invention, the storage state of a file in the file system may be: state 1, data slices are stored in RAID1 mode; state 2, data slices are stored redundantly in RAID5 mode; state 3, original-version data slices are stored redundantly in RAID5 mode, and new-version data slices generated by write updates are stored in RAID1 mode.
Through the specific embodiments described above, the present invention provides an efficient method of guaranteeing high reliability and high availability for a cluster file system based on network storage, using a data redundancy mechanism that combines mirroring and redundancy checking to ensure the reliability and availability of file system data. The file redundancy storage method provided by the embodiments converts the storage state of data dynamically. For client write operations, data is written and updated in mirrored form, so a client write request does not have to deal with the redundancy calculation and redundancy update required by a redundancy check scheme; newly written data blocks are mirrored to the redundant storage management device, which avoids the poor write performance caused by small-write updates in redundancy check schemes such as RAID5 and greatly improves write performance. Meanwhile, data that is not write-active can be converted to redundancy check storage when system storage resources are tight, which reduces the space consumed by redundancy and improves space utilization while still guaranteeing data reliability and availability.
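The space trade-off between the two schemes can be made concrete with a little arithmetic. The sketch below assumes two-copy mirroring and one check slice per redundancy group, as in RAID1 and RAID5; the function names and the group size of three data slices are illustrative assumptions.

```python
from fractions import Fraction


def mirrored_footprint(data_slices, copies=2):
    """Stored slices per data slice when every slice is kept `copies` times."""
    return Fraction(data_slices * copies, data_slices)


def check_footprint(data_slices, check_slices=1):
    """Stored slices per data slice when a group shares `check_slices` check slices."""
    return Fraction(data_slices + check_slices, data_slices)


# A redundancy group of three data slices, as in the D1, D2, D3 examples.
group = 3
mirror_cost = mirrored_footprint(group)   # slices stored per data slice, mirrored
check_cost = check_footprint(group)       # slices stored per data slice, checked
saving = 1 - check_cost / mirror_cost     # fraction of space freed by converting
```

Under these assumptions the relative saving grows with the size of the redundancy group, which is why converting write-inactive data to redundancy check storage helps when space is tight.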
It can be seen that the file redundancy storage method according to the embodiments of the present invention achieves good space utilization, effectively reduces the overhead of redundancy calculation, and improves system performance. In addition, when a storage node fails, combining mirroring with redundancy checking reduces the amount of calculation, improves the efficiency of data recovery, and safeguards the reliability and availability of data.
Although the present invention has been described by way of preferred embodiments, it is not limited to the embodiments described herein, and it also covers various changes and variations made without departing from the scope of the invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110042143 CN102110154B (en) | 2011-02-21 | 2011-02-21 | File redundancy storage method in cluster file system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110042143 CN102110154B (en) | 2011-02-21 | 2011-02-21 | File redundancy storage method in cluster file system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102110154A CN102110154A (en) | 2011-06-29 |
CN102110154B true CN102110154B (en) | 2012-12-26 |
Family
ID=44174315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110042143 Expired - Fee Related CN102110154B (en) | 2011-02-21 | 2011-02-21 | File redundancy storage method in cluster file system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102110154B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102750195A (en) * | 2012-06-07 | 2012-10-24 | 浪潮电子信息产业股份有限公司 | Method for cluster file system data fault tolerance |
CN102857554B (en) * | 2012-07-26 | 2016-07-06 | 福建网龙计算机网络信息技术有限公司 | Data redundancy processing method is carried out based on distributed memory system |
CN103761268B (en) * | 2014-01-06 | 2017-12-01 | 无锡城市云计算中心有限公司 | A kind of method of Distributed File System Data storage layout |
CN104239417B (en) * | 2014-08-19 | 2017-06-09 | 天津南大通用数据技术股份有限公司 | Dynamic adjusting method and device after a kind of distributed data base data fragmentation |
CN104267913B (en) * | 2014-10-20 | 2017-06-16 | 北京北亚宸星科技有限公司 | It is a kind of can dynamic asynchronous adjust RAID storage method and storage system |
CN105550367A (en) * | 2015-06-30 | 2016-05-04 | 巫立斌 | Asynchronous remote copy method for memory |
CN106383665B (en) * | 2016-09-05 | 2018-05-11 | 华为技术有限公司 | Date storage method and coordination memory node in data-storage system |
CN106954098B (en) * | 2017-03-22 | 2019-04-09 | 深圳市九洲电器有限公司 | A kind of set top box upgrading data selecting method and system |
CN108959547B (en) * | 2018-07-02 | 2022-02-18 | 上海浪潮云计算服务有限公司 | PV snapshot distributed database cluster recovery method |
CN110764692B (en) * | 2019-08-28 | 2022-11-11 | 计算力(江苏)智能技术有限公司 | Method and system for redundancy and recovery of storage slice data in server cluster environment |
CN114924778B (en) * | 2022-04-01 | 2024-04-26 | 北京遥测技术研究所 | Program upgrading method based on ZYNQ redundancy backup |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101681282A (en) * | 2006-12-06 | 2010-03-24 | 弗森多系统公司(dba弗森-艾奥) | Apparatus, system and method for shared, front-end, distributed RAID |
-
2011
- 2011-02-21 CN CN 201110042143 patent/CN102110154B/en not_active Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
JOHN WILKES et al..《The HP AutoRAID Hierarchical Storage System》.《ACM Transactions on Computer System》.1996,第14卷(第1期),111、119-121. * |
Also Published As
Publication number | Publication date |
---|---|
CN102110154A (en) | 2011-06-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102110154B (en) | File redundancy storage method in cluster file system | |
JP6294518B2 (en) | Synchronous mirroring in non-volatile memory systems | |
US9811285B1 (en) | Dynamic restriping in nonvolatile memory systems | |
CN101291347B (en) | Network storage system | |
US20220137835A1 (en) | Systems and methods for parity-based failure protection for storage devices | |
US8055938B1 (en) | Performance in virtual tape libraries | |
US9990263B1 (en) | Efficient use of spare device(s) associated with a group of devices | |
CN102053802B (en) | Network RAID (redundant array of independent disk) system | |
CN102024044A (en) | Distributed file system | |
US11003558B2 (en) | Systems and methods for sequential resilvering | |
CN102662607A (en) | RAID6 level mixed disk array, and method for accelerating performance and improving reliability | |
JP2016534471A (en) | Recovery of independent data integrity and redundancy driven by targets in shared nothing distributed storage systems | |
CN111858189B (en) | Offline processing of storage disks | |
CN106227464A (en) | A kind of double-deck redundant storage system and data write, reading and restoration methods | |
US10503620B1 (en) | Parity log with delta bitmap | |
CN116204137B (en) | Distributed storage system, control method, device and equipment based on DPU | |
CN103488434A (en) | Method for improving disk array reliability | |
US20200363958A1 (en) | Efficient recovery of resilient spaces | |
WO2008041267A1 (en) | System management program, system management device, and system management method | |
US12079084B2 (en) | Distributed raid for parity-based flash storage devices | |
Shu | Storage Arrays |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20121226 |