WO2016180049A1 - Storage management method and distributed file system - Google Patents
- Publication number
- WO2016180049A1 (application PCT/CN2016/071235)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- disk
- group
- storage
- health
- disks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
Definitions
- S305: The FLR traverses the disk groups in each of the above three sets and maps each disk group according to its number of available volumes.
- FIG. 4 is a flowchart of the data storage step in the storage management method according to the third embodiment of the present invention. As shown in FIG. 4, the data storage step includes the following steps:
- After receiving the message, the FLR selects the healthy group set as the current set;
- The FLR obtains a random number r and computes r mod (modulo) the set weight of the healthy group set to obtain a key;
- S404: The FLR uses the key to search the current set for the target disk group by binary search; if the target disk group is found, step S405 is performed, otherwise step S407 is performed;
- S405: The FLR selects the replica locations according to conditions such as disk IO balance and maintaining reliability under node failure or multi-disk failure; if the selection succeeds, step S406 is performed, otherwise step S407 is performed;
- S406: The FLR confirms the replica locations, writes the data to the corresponding disks, and ends the process;
- S407: The FLR selects the sub-healthy group set as the current set;
- S408: The FLR obtains a random number r and computes r mod the set weight of the sub-healthy group set to obtain a key;
- S409: The FLR uses the key to search the current set for the target disk group by binary search; if the target disk group is found, step S410 is performed, otherwise step S411 is performed;
- S410: The FLR selects the replica locations according to conditions such as disk IO balance and maintaining reliability under node failure or multi-disk failure; if the selection succeeds, step S406 is performed, otherwise step S411 is performed;
- S411: The FLR fails to allocate disks, the file write fails, and the process exits with an error.
- All or part of the steps of the above embodiments may also be implemented using integrated circuits: the steps may be fabricated as individual integrated circuit modules, or multiple modules or steps may be fabricated into a single integrated circuit module.
- The devices, function modules, and functional units in the above embodiments may be implemented on general-purpose computing devices; they may be concentrated on a single computing device or distributed across a network of multiple computing devices.
- When the devices, function modules, and functional units in the above embodiments are implemented as software function modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium.
- The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
This application relates to, but is not limited to, the field of storage applications for distributed file systems.
Distributed file systems deployed at large scale use erasure codes as the underlying file storage policy, which significantly reduces the physical space that files occupy while maintaining the same reliability as multi-replica storage. In enterprise deployments, however, once the system has grown large and has been running for a long time, disk damage or failure is no longer a low-probability event but a routine condition encountered in daily maintenance.
For example, when an enterprise application uses erasure codes, the file storage policy is typically configured with a ratio of original blocks N to check blocks M of 6:2. Two fragment replicas can then be lost at the same time, and the overall reliability of the system is the same as in 3-replica mode. Given the current state of enterprise deployments and the unavoidable hardware and software problems of the storage cluster itself, damage to 3 disks in one storage system may cause some data to be lost. A storage cluster contains hundreds or even thousands of disks, and after a long period of operation each disk stores tens of millions of replicas. If more than 2 disks are damaged at the same time within a given period, the data blocks of some files cannot be repaired and the file content becomes unreadable.
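The comparison between a 6:2 erasure code and 3-replica storage can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes an erasure code in which any N of the N+M fragments are sufficient to rebuild the data, so that up to M fragments may be lost. Replication is modelled as one original block plus two extra copies.

```python
from fractions import Fraction


def storage_overhead(data_blocks: int, check_blocks: int) -> Fraction:
    """Physical space consumed per unit of user data for an N:M layout."""
    return Fraction(data_blocks + check_blocks, data_blocks)


def tolerated_losses(check_blocks: int) -> int:
    """Fragments that may be lost at once (assumes any N of N+M suffice)."""
    return check_blocks


# Erasure code with N:M = 6:2, as in the example above.
ec_overhead = storage_overhead(6, 2)       # 8/6 = 4/3 of the user data
ec_losses = tolerated_losses(2)            # 2 fragments

# 3-replica mode modelled as 1 original plus 2 extra copies.
replica_overhead = storage_overhead(1, 2)  # 3 times the user data
replica_losses = tolerated_losses(2)       # 2 copies

print(f"6:2 erasure code: {float(ec_overhead):.2f}x space, tolerates {ec_losses} losses")
print(f"3 replicas:       {float(replica_overhead):.2f}x space, tolerates {replica_losses} losses")
```

Both layouts tolerate two simultaneous losses, but under these assumptions the erasure-coded layout uses 4/3 of the user data size rather than 3 times that size.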
Therefore, how to provide a storage management method that alleviates the problem in distributed file systems whereby simultaneous damage to more than 2 disks makes file content unreadable is a technical problem that those skilled in the art urgently need to solve.
Summary of the Invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Provided herein are a storage management method and a distributed file system that alleviate the problem in distributed file systems whereby simultaneous damage to more than 2 disks makes file content unreadable.
A storage management method for a distributed file system, comprising: dividing all disks of the distributed file system into multiple disk groups; obtaining the serviceability of each disk group, and classifying each disk group into a healthy group set, a sub-healthy group set, or a bad group set according to its serviceability; and, when data needs to be stored, selecting a disk group from the healthy group set and selecting disks from the selected disk group to store the data.
Optionally, dividing all disks of the distributed file system into multiple disk groups includes: dividing all disks into multiple disk groups according to the slot information of each disk within the storage server to which it belongs.
Optionally, dividing all disks into multiple disk groups according to the slot information of each disk within the storage server to which it belongs includes: placing the disks that have one or more identical items of slot information across all storage servers into one disk group.
Optionally, obtaining the serviceability of each disk group includes: obtaining the number of available volumes of each disk group, and determining the serviceability of each disk group according to the storage policy and the number of available volumes of that disk group.
Optionally, the method further includes: when no disk group can be selected from the healthy group set, selecting a disk group from the sub-healthy group set; if no disk group can be selected from the sub-healthy group set either, the data storage fails.
Optionally, selecting a disk group from the healthy group set or the sub-healthy group set includes: selecting a disk group according to the proportion of available space of each disk group in the healthy group set or the sub-healthy group set.
A distributed file system, comprising multiple disks and a location register, the location register comprising a disk grouping module and a selection storage module, wherein the disk grouping module is configured to: divide all disks of the distributed file system into multiple disk groups; obtain the serviceability of each disk group; and classify each disk group into a healthy group set, a sub-healthy group set, or a bad group set according to its serviceability; and the selection storage module is configured to: when data needs to be stored, select a disk group from the healthy group set and select disks from the selected disk group to store the data.
Optionally, the disk grouping module is configured to: divide all disks into multiple disk groups according to the slot information of each disk within the storage server to which it belongs.
Optionally, the disk grouping module is configured to: place the disks that have one or more identical items of slot information across all storage servers into one disk group.
Optionally, the disk grouping module is configured to: obtain the number of available volumes of each disk group, and determine the serviceability of each disk group according to the storage policy and the number of available volumes of that disk group.
Optionally, the selection storage module is further configured to: when no disk group can be selected from the healthy group set, select a disk group from the sub-healthy group set; if no disk group can be selected from the sub-healthy group set either, the data storage fails.
Optionally, the selection storage module is configured to: select a disk group according to the proportion of available space of each disk group in the healthy group set or the sub-healthy group set.
A computer-readable storage medium storing computer-executable instructions for performing the method of any one of the above.
The storage management method provided by the embodiments of the present invention first manages all disks in groups and further distinguishes the serviceability state of each disk group. When data is stored, replica locations are allocated preferentially from healthy groups. When multiple disks (more than M) fail, file content is not lost as long as those disks do not belong to the same disk group, which improves the overall reliability of the system.
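The reliability argument above can be checked mechanically. The following sketch is an illustration, not the patented implementation: it assumes that all N+M fragments of a file are placed on distinct disks of a single disk group, so that a group loses data only when more than M of its own disks have failed. The name `groups_at_risk` and the disk identifiers are hypothetical.

```python
from collections import Counter


def groups_at_risk(failed_disks, group_of, m):
    """Return the disk groups in which more than m disks have failed.

    failed_disks: iterable of disk identifiers that are currently failed.
    group_of:     mapping from disk identifier to disk group identifier.
    m:            number of check blocks M in the N:M storage policy.
    """
    failures_per_group = Counter(group_of[disk] for disk in failed_disks)
    return sorted(g for g, count in failures_per_group.items() if count > m)


# Three servers (a, b, c), three slots each; the group is the slot number.
group_of = {f"{server}{slot}": slot for server in "abc" for slot in range(3)}

# Three failures spread over three groups: no group exceeds M = 2.
print(groups_at_risk(["a0", "b1", "c2"], group_of, m=2))

# Three failures inside group 0: that group can no longer repair its files.
print(groups_at_risk(["a0", "b0", "c0"], group_of, m=2))
```

With M = 2, three failed disks spread over different groups leave every group repairable, whereas three failures inside one group do not.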
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief Description of the Drawings
FIG. 1 is a flowchart of the storage management method according to the first embodiment of the present invention;
FIG. 2 is a schematic diagram of the distributed file system according to the second embodiment of the present invention;
FIG. 3 is a flowchart of the disk grouping step in the storage management method according to the third embodiment of the present invention;
FIG. 4 is a flowchart of the data storage step in the storage management method according to the third embodiment of the present invention.
Embodiments of the present invention are now explained with reference to the drawings.
First embodiment:
FIG. 1 is a flowchart of the storage management method according to the first embodiment of the present invention. As shown in FIG. 1, in this embodiment the storage management method includes the following steps:
S101: Divide all disks of the distributed file system into multiple disk groups;
S102: Obtain the serviceability of each disk group, and classify each disk group into a healthy group set, a sub-healthy group set, or a bad group set according to its serviceability;
S103: When data needs to be stored, select a disk group from the healthy group set, and select disks from the selected disk group to store the data.
In some embodiments, dividing all disks of the distributed file system into multiple disk groups includes: dividing all disks into multiple disk groups according to the slot information of each disk within the storage server to which it belongs.
In some embodiments, dividing all disks into multiple disk groups according to the slot information of each disk within the storage server to which it belongs includes: placing the disks that have one or more identical items of slot information across all storage servers into one disk group.
In some embodiments, obtaining the serviceability of each disk group includes: obtaining the number of available volumes of each disk group, and determining the serviceability of each disk group according to the storage policy and the number of available volumes of that disk group.
In some embodiments, the method further includes: when no disk group can be selected from the healthy group set, selecting a disk group from the sub-healthy group set; if no disk group can be selected from the sub-healthy group set either, the data storage fails.
In some embodiments, selecting a disk group from the healthy group set or the sub-healthy group set includes: selecting a disk group according to the proportion of available space of each disk group in the healthy group set or the sub-healthy group set.
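Selection in proportion to available space, with fallback from the healthy set to the sub-healthy set, can be sketched as follows. The document describes the key as a random number taken modulo the set weight and the lookup as a binary search; the cumulative-weight table used here to make that search possible is an assumption, as are the function names, the integer weights, and the use of an exception to signal storage failure.

```python
import bisect
import random
from itertools import accumulate


def pick_group(groups, rng=random):
    """Pick one group name with probability proportional to its weight.

    groups: list of (name, weight) pairs, where weight is a non-negative
            integer such as the group's available space in GiB.
    Returns None when no group has any weight left.
    """
    usable = [(name, weight) for name, weight in groups if weight > 0]
    if not usable:
        return None
    cumulative = list(accumulate(weight for _, weight in usable))
    key = rng.randrange(2**31) % cumulative[-1]      # r mod total set weight
    index = bisect.bisect_right(cumulative, key)     # binary search on the key
    return usable[index][0]


def choose_group(healthy, sub_healthy, rng=random):
    """Try the healthy set first, then the sub-healthy set, else fail."""
    for candidates in (healthy, sub_healthy):
        name = pick_group(candidates, rng)
        if name is not None:
            return name
    raise RuntimeError("no disk group available: data storage fails")


if __name__ == "__main__":
    healthy = [("g0", 500), ("g1", 1500)]
    sub_healthy = [("g2", 800)]
    print(choose_group(healthy, sub_healthy))
```

Because the key is uniform over the total weight, a group holding three quarters of the set's available space is chosen about three quarters of the time, which steers new data toward emptier groups.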
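Second embodiment:
FIG. 2 is a schematic diagram of the distributed file system according to the second embodiment of the present invention. As shown in FIG. 2, in this embodiment the distributed file system includes multiple disks 22, which belong to different storage servers, and one or more location registers 21 (FIG. 2 shows only one location register by way of example), wherein:
The location register 21 includes a disk grouping module 211 and a selection storage module 212.
The disk grouping module 211 is configured to: divide all disks 22 of the distributed file system into multiple disk groups; obtain the serviceability of each disk group; and classify each disk group into a healthy group set, a sub-healthy group set, or a bad group set according to its serviceability.
The selection storage module 212 is configured to: when data needs to be stored, select a disk group from the healthy group set and select disks from the selected disk group to store the data.
In some embodiments, the disk grouping module 211 is configured to: divide all disks into multiple disk groups according to the slot information of each disk within the storage server to which it belongs.
In some embodiments, the disk grouping module 211 is configured to: place the disks that have one or more identical items of slot information across all storage servers into one disk group.
In some embodiments, the disk grouping module 211 is configured to: obtain the number of available volumes of each disk group, and determine the serviceability of each disk group according to the storage policy and the number of available volumes of that disk group.
In some embodiments, the selection storage module 212 is further configured to: when no disk group can be selected from the healthy group set, select a disk group from the sub-healthy group set; if no disk group can be selected from the sub-healthy group set either, the data storage fails.
In some embodiments, the selection storage module 212 is configured to: select a disk group according to the proportion of available space of each disk group in the healthy group set or the sub-healthy group set.
The embodiments of the present invention are further explained below with reference to an application example.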
Third embodiment:
FIG. 3 is a flowchart of the disk grouping step in the storage management method according to the third embodiment of the present invention. As shown in FIG. 3, the disk grouping step includes the following steps:
S301: Group all disks at the physical level, dividing them into multiple disk groups;
In the distributed file system, the file storage policy uses erasure codes. Taking a redundancy ratio of N:M = 6:2 as an example, assume that the distributed file system consists of n storage servers S1, S2, …, Sn (n ≥ 8) and that each storage server has d disks; a common high-density storage server, for example, has 24 or more disks.
The disks of each storage server are numbered s0, s1, …, sd-1 according to their physical slots and are divided into multiple disk groups by slot. For example, the disks in the same slot of every storage server form one disk group, so that the number of disk groups g equals the number of slots d; alternatively, the disks in several slots form one disk group, so that g is smaller than d.
The number of disks in each disk group and their capacity may differ. For the sake of system read/write performance, the disk read/write interfaces need to be kept balanced.
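Slot-based grouping as described in S301 can be sketched in a few lines. This is a minimal illustration under stated assumptions: disks are identified by hypothetical (server, slot) pairs, and when several slots share a group the sketch combines adjacent slot numbers, which is one possible choice that the text does not prescribe.

```python
from collections import defaultdict


def group_by_slot(disks, slots_per_group=1):
    """Divide disks into groups keyed by their physical slot number.

    disks:           iterable of (server, slot) pairs, slot counted from 0.
    slots_per_group: 1 gives one group per slot (g == d); a larger value
                     merges adjacent slots into one group (g < d).
    """
    groups = defaultdict(list)
    for server, slot in disks:
        groups[slot // slots_per_group].append((server, slot))
    return dict(groups)


# n = 8 storage servers with d = 24 slots each, as in the example above.
disks = [(server, slot) for server in range(8) for slot in range(24)]

per_slot = group_by_slot(disks)
merged = group_by_slot(disks, slots_per_group=2)
print(len(per_slot), len(per_slot[0]))   # groups, disks per group
print(len(merged), len(merged[0]))
```

With one slot per group, each group holds one disk from every server, so the loss of a whole server removes at most one disk from any group.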
S302: The distributed file system is powered on, and the location register FLR initializes its data;
When the system is powered on, the FLR initializes the disk group data area, and every disk is added to the disk group data area according to the group it belongs to; when a disk or node fails, the information of the corresponding disk group is updated. The disk group information includes near-real-time statistics such as information on all disks, total group capacity, available group capacity, group read/write IO, and group serviceability state. These data are reported to the file location register FLR, which uses them to decide where to place the CHUNK replicas of a file.
S303: The FLR calculates the sum of the volume weights of each disk group;
The FLR traverses every disk group; for all the disks in a disk group, it counts the number of available volumes and calculates the sum of the volume weights, and records both in the disk group.
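The per-group statistics of step S303 could be computed along these lines. The text does not define how a volume weight is derived, so the `Volume` class, its integer `weight` field, and the decision to sum only the weights of available volumes are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    weight: int             # hypothetical weight, e.g. derived from free space
    available: bool = True  # False if the disk or its node is down


def summarize_group(disks):
    """disks: list of disks, each a list of Volume objects.

    Returns (available_volume_count, weight_sum) for one disk group,
    counting only volumes that are currently available (an assumption).
    """
    available = [v for disk in disks for v in disk if v.available]
    return len(available), sum(v.weight for v in available)
```

For a group with two disks, `summarize_group([[Volume(10), Volume(5, False)], [Volume(7)]])` counts two available volumes with a weight sum of 17.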
S304: The FLR calculates the serviceability of each disk group;
The FLR traverses all disk groups and sets the serviceability of each group according to certain rules, based on conditions such as the number of available volumes, node failures, network failures, and disk anomalies, and adds each group to the healthy group set, the sub-healthy group set, or the bad group set. The group serviceability state indicates whether the FLR can use the disk group to allocate CHUNK replica locations, and has three possible values: healthy group, sub-healthy group, and bad group.
A disk group is a healthy group when its number of available volumes is greater than or equal to N+M and it also satisfies conditions such as maintaining reliability under storage unit node failure and the unit's transmission media plane being in a normal state.
A disk group is sub-healthy when its available volumes have been reduced by factors such as a disk suddenly dropping out or a storage server losing power. When the ratio of a group's available volumes to its configured volumes is below a threshold and the number of available volumes is greater than or equal to N, the group's serviceability is considered reduced and its state is set to sub-healthy group. When the number of available volumes in a group is less than N, its state is set to bad group.
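The three-way classification of step S304 can be sketched as below. The text leaves the threshold value unspecified and does not say how to treat a group that meets neither the healthy nor the sub-healthy wording exactly, so the `ratio_threshold` default of 0.9, the `other_checks_ok` flag standing in for the node and network conditions, and the rule that any group with at least N volumes that is not healthy counts as sub-healthy are all assumptions.

```python
from enum import Enum


class GroupState(Enum):
    HEALTHY = "healthy"
    SUB_HEALTHY = "sub-healthy"
    BAD = "bad"


def classify_group(available, configured, n=6, m=2,
                   ratio_threshold=0.9, other_checks_ok=True):
    """Classify one disk group from its volume counts (illustrative only)."""
    if not 0 <= available <= configured:
        raise ValueError("need 0 <= available <= configured")
    if available < n:
        # Fewer than N volumes: a chunk cannot be reconstructed from this group.
        return GroupState.BAD
    ratio_low = available / configured < ratio_threshold
    if available >= n + m and other_checks_ok and not ratio_low:
        return GroupState.HEALTHY
    # Assumption: everything else with at least N volumes is sub-healthy.
    return GroupState.SUB_HEALTHY
```

Under these assumptions, a group with 8 of 8 volumes is healthy, one with 7 of 8 is sub-healthy, and one with 5 of 8 is bad.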
S305: The FLR traverses the disk groups in each of the three sets and maps each disk group according to its number of available volumes.
FIG. 4 is a flowchart of the data storage steps in the storage management method provided by the third embodiment of the present invention. As shown in FIG. 4, the data storage steps in the storage management method provided by the embodiment of the present invention include the following steps:
S401: When a client writes a file, it sends an allocate-data-block-replica-location message to the FLR;
S402: After receiving the message, the FLR selects the healthy group set as the current set;
S403: The FLR obtains a random number r and takes r modulo the weight of the healthy group set to obtain a key;
S404: Using the key, the FLR performs a binary search in the current set for the target disk group; if a target disk group is found, step S405 is performed, otherwise step S407 is performed;
S405: Within the target disk group, the FLR selects replica locations subject to conditions such as disk IO balancing and maintaining reliability under node failure and multi-disk failure;
If the number of replicas is greater than or equal to N+M (determined by the parameters of the storage policy, typically 6+2), step S406 is performed, otherwise step S407 is performed;
S406: The FLR determines the replica locations, writes the data to the corresponding disks, and ends the flow.
S407: The FLR selects the sub-healthy group set as the current set;
S408: The FLR obtains a random number r and takes r modulo the weight of the sub-healthy group set to obtain a key;
S409: Using the key, the FLR performs a binary search in the current set for the target disk group; if a target disk group is found, step S410 is performed, otherwise step S411 is performed;
S410: Within the target disk group, the FLR selects replica locations subject to conditions such as disk IO balancing and maintaining reliability under node failure and multi-disk failure;
If the number of replicas is greater than or equal to N+M (determined by the parameters of the storage policy, typically 6+2), step S406 is performed, otherwise step S411 is performed;
S411: The FLR fails to allocate disks, the file write fails, and the flow exits with failure.
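Steps S402 to S411 can be sketched as a weighted pick with a fallback. This is one possible reading, not the patent's code: the set weight is assumed to be the sum of the group weights, the binary search is assumed to run over cumulative weights, and the replica placement rule (least-loaded volumes on distinct servers) is a stand-in for the IO-balance and reliability conditions the text names. All function names and tuple layouts are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def pick_group(groups, rng):
    """groups: list of (group_id, weight). Returns a group_id or None.

    key = r mod (sum of weights), then a binary search over the
    cumulative weights finds the group whose interval contains key.
    """
    cumulative = list(accumulate(weight for _, weight in groups))
    if not cumulative or cumulative[-1] <= 0:
        return None
    key = rng.randrange(2**32) % cumulative[-1]
    return groups[bisect.bisect_right(cumulative, key)][0]


def choose_locations(volumes, needed):
    """volumes: list of (server, disk, load). Picks `needed` volumes on
    distinct servers, least-loaded first; returns None if too few exist."""
    chosen, used_servers = [], set()
    for server, disk, _load in sorted(volumes, key=lambda v: v[2]):
        if server not in used_servers:
            chosen.append((server, disk))
            used_servers.add(server)
            if len(chosen) == needed:
                return chosen
    return None


def allocate(healthy, sub_healthy, volumes_by_group, n=6, m=2, rng=None):
    """Try the healthy set first, then the sub-healthy set."""
    if rng is None:
        rng = random.Random()
    for current_set in (healthy, sub_healthy):    # S402, then S407
        group_id = pick_group(current_set, rng)   # S403-S404 / S408-S409
        if group_id is None:
            continue
        locations = choose_locations(             # S405 / S410
            volumes_by_group.get(group_id, []), n + m)
        if locations is not None:
            return group_id, locations            # S406
    return None                                   # S411: allocation failed
```

Because the key is reduced modulo the total weight, a group is chosen with probability proportional to its weight, and a group with zero weight is never chosen. A caller would treat a `None` result as the write failure of step S411.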
In summary, in the embodiments of the present invention, all disks are first managed in groups, and the serviceability state of each disk group is then distinguished. When data is stored, replica locations are allocated preferentially from healthy groups. When multiple disks (more than M) fail, file content is not lost as long as those disks do not belong to the same disk group, which improves the overall reliability of the system.
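The reliability argument can be checked with a small helper. It rests on an inference from the text: because all N+M replicas of a chunk are placed within a single disk group, a chunk can only lose more than M replicas if more than M disks fail inside that one group. The function name and input format are hypothetical.

```python
from collections import Counter


def groups_at_risk(failed_disk_groups, m=2):
    """failed_disk_groups: one group id per failed disk.

    Returns the sorted ids of groups with more than m failed disks,
    i.e. the only groups in which a chunk could become unrecoverable.
    """
    counts = Counter(failed_disk_groups)
    return sorted(g for g, c in counts.items() if c > m)
```

With M = 2, three failed disks spread over groups 0, 1, and 2 put no group at risk, whereas three failures inside group 0 do.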
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented as a computer program flow. The computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, device, apparatus, or component); when executed, it includes one of the steps of the method embodiments or a combination thereof.
Optionally, all or some of the steps of the above embodiments may also be implemented using integrated circuits. These steps may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module.
The apparatuses, functional modules, and functional units in the above embodiments may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network of multiple computing devices.
When the apparatuses, functional modules, and functional units in the above embodiments are implemented in the form of software functional modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In the embodiments of the present invention, all disks are managed in groups and the serviceability state of each disk group is distinguished. When data is stored, replica locations are allocated preferentially from healthy groups. When multiple disks (more than M) fail, file content is not lost as long as those disks do not belong to the same disk group, which improves the overall reliability of the system.
Claims (13)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510245500.6A CN106293492B (en) | 2015-05-14 | 2015-05-14 | A storage management method and distributed file system |
| CN201510245500.6 | 2015-05-14 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016180049A1 (en) | 2016-11-17 |
Family
ID=57248414
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2016/071235 Ceased WO2016180049A1 (en) | 2015-05-14 | 2016-01-18 | Storage management method and distributed file system |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN106293492B (en) |
| WO (1) | WO2016180049A1 (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107436826B (en) * | 2017-08-15 | 2018-12-18 | 金钱猫科技股份有限公司 | A cold data processing method and terminal |
| CN110535898B (en) * | 2018-05-25 | 2022-10-04 | 许继集团有限公司 | Method for storing and complementing copies and selecting nodes in big data storage and management system |
| CN109407981A (en) * | 2018-09-28 | 2019-03-01 | 深圳市茁壮网络股份有限公司 | A kind of data processing method and device |
| CN109445712A (en) * | 2018-11-09 | 2019-03-08 | 浪潮电子信息产业股份有限公司 | Instruction processing method, system, device and computer-readable storage medium |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140149356A1 (en) * | 2012-11-26 | 2014-05-29 | Amazon Technologies, Inc. | Automatic repair of corrupted blocks in a database |
| CN104346221A (en) * | 2013-08-02 | 2015-02-11 | 北京百度网讯科技有限公司 | Method and device for grading and dispatching management of server hardware equipment and server |
| CN104484134A (en) * | 2014-12-23 | 2015-04-01 | 北京华胜天成科技股份有限公司 | Method and device for allocating distributed type storage magnetic discs |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050010722A1 (en) * | 2003-07-11 | 2005-01-13 | Chih-Wei Chen | Multi-volume disk array management method and system |
| US8019728B2 (en) * | 2008-04-17 | 2011-09-13 | Nec Laboratories America, Inc. | Dynamically quantifying and improving the reliability of distributed data storage systems |
| CN102981927B (en) * | 2011-09-06 | 2015-11-25 | 阿里巴巴集团控股有限公司 | Distributed raid-array storage means and distributed cluster storage system |
- 2015-05-14: CN application CN201510245500.6A, patent CN106293492B, status Active
- 2016-01-18: WO application PCT/CN2016/071235, publication WO2016180049A1, status Ceased (not active)
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019023260A1 (en) * | 2017-07-24 | 2019-01-31 | Rubrik, Inc. | Throttling network bandwidth using per-node network interfaces |
| US10819656B2 (en) | 2017-07-24 | 2020-10-27 | Rubrik, Inc. | Throttling network bandwidth using per-node network interfaces |
| US11030062B2 (en) | 2017-08-10 | 2021-06-08 | Rubrik, Inc. | Chunk allocation |
| CN113791893A (en) * | 2021-08-16 | 2021-12-14 | 济南浪潮数据技术有限公司 | A method and device for realizing capacity balance based on disk grouping |
| CN114594916A (en) * | 2022-03-19 | 2022-06-07 | 山西三叶虫信息技术股份有限公司 | Enterprise file storage management method and device, electronic equipment and storage medium |
| CN114594916B (en) * | 2022-03-19 | 2023-08-18 | 山西三叶虫信息技术股份有限公司 | Enterprise file storage management method and device, electronic equipment and storage medium |
| CN115145490A (en) * | 2022-07-26 | 2022-10-04 | 济南浪潮数据技术有限公司 | Disk detection and alarm method of distributed storage system and related device |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106293492B (en) | 2021-08-20 |
| CN106293492A (en) | 2017-01-04 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2016180049A1 (en) | | Storage management method and distributed file system |
| US10983860B2 (en) | | Automatic prefill of a storage system with conditioning of raid stripes |
| US9823980B2 (en) | | Prioritizing data reconstruction in distributed storage systems |
| CN103152395B (en) | | A kind of storage means of distributed file system and device |
| CN103929454B (en) | | The method and system of load balancing storage in a kind of cloud computing platform |
| CN101674233B (en) | | Peterson graph-based storage network structure and data read-write method thereof |
| US10356150B1 (en) | | Automated repartitioning of streaming data |
| US20100161564A1 (en) | | Cluster data management system and method for data recovery using parallel processing in cluster data management system |
| US11016674B2 (en) | | Method, device, and computer program product for reading data |
| CN108540315B (en) | | Distributed storage system, method and device |
| CN106844108B (en) | | A kind of date storage method, server and storage system |
| US20100161565A1 (en) | | Cluster data management system and method for data restoration using shared redo log in cluster data management system |
| WO2019119311A1 (en) | | Data storage method, device, and system |
| US10708355B2 (en) | | Storage node, storage node administration device, storage node logical capacity setting method, program, recording medium, and distributed data storage system |
| CN106407083A (en) | | Fault detection method and device |
| CN107133228A (en) | | A kind of method and device of fast resampling |
| CN101827121A (en) | | Method, service end and system for creating files in RAID (Redundant Array of Independent Disk) |
| CN110413694A (en) | | Metadata management method and relevant apparatus |
| CN115756955A (en) | | Data backup and data recovery method and device and computer equipment |
| CN111488127B (en) | | Data parallel storage method, device and data reading method based on disk cluster |
| US11256428B2 (en) | | Scaling raid-based storage by redistributing splits |
| JP6233403B2 (en) | | Storage system, storage device, storage device control method and control program, management device, management device control method and control program |
| CN112256204A (en) | | Storage resource allocation method and device, storage node and storage medium |
| CN106708445B (en) | | Link selecting method and device |
| JP4428202B2 (en) | | Disk array subsystem, distributed arrangement method, control method, and program in disk array subsystem |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16791899; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 16791899; Country of ref document: EP; Kind code of ref document: A1 |