WO2016180049A1 - Storage management method and distributed file system - Google Patents
Storage management method and distributed file system
- Publication number
- WO2016180049A1 (application PCT/CN2016/071235)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- disk
- group
- storage
- health
- disks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
Definitions
- This application relates to, but is not limited to, the field of storage applications for distributed file systems.
- Large-scale distributed file systems use erasure coding as the underlying file storage strategy, which significantly reduces the physical space occupied by files while maintaining the same reliability as multi-copy replication.
- As systems grow larger and run longer, disk damage or failure is no longer a low-probability event but a normal situation routinely faced in daily maintenance.
- For example, when the file storage policy of N original blocks to M check blocks is configured as 6:2, two fragment copies can be damaged at the same time, and the overall reliability of the system is the same as in 3-copy mode.
- However, simultaneous damage to three disks in one storage system may cause partial data corruption.
- A storage cluster contains hundreds or even thousands of disks, and after a long run, tens of millions of copies are stored on each disk. If more than 2 disks are damaged at the same time within a certain period, the data content of some files cannot be repaired, leaving those files unreadable.
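As a rough illustration of the figures above, the following Python sketch compares the storage overhead of an N:M erasure code with 3-copy replication and estimates how likely a single stripe is to become unrecoverable when several disks fail together. The function names and the uniform-placement model (each stripe puts its N+M fragments on distinct disks chosen uniformly at random) are assumptions made for illustration and are not taken from the application.

```python
from math import comb


def storage_overhead(n_data: int, m_parity: int) -> float:
    """Physical space consumed per unit of logical data for an N:M erasure code."""
    return (n_data + m_parity) / n_data


def stripe_loss_probability(total_disks: int, failed_disks: int,
                            n_data: int, m_parity: int) -> float:
    """Probability that one stripe loses more than M fragments.

    Assumes (hypothetically) that the stripe's N+M fragments sit on distinct
    disks drawn uniformly at random, so the number of lost fragments follows
    a hypergeometric distribution.
    """
    width = n_data + m_parity
    unrecoverable = sum(
        comb(failed_disks, k) * comb(total_disks - failed_disks, width - k)
        for k in range(m_parity + 1, min(failed_disks, width) + 1)
    )
    return unrecoverable / comb(total_disks, width)


if __name__ == "__main__":
    print(f"6:2 erasure code overhead: {storage_overhead(6, 2):.2f}x")
    print("3-copy replication overhead: 3.00x")
    p = stripe_loss_probability(total_disks=1000, failed_disks=3,
                                n_data=6, m_parity=2)
    print(f"P(stripe unrecoverable | 3 of 1000 disks fail) = {p:.3e}")
```

Under this model the per-stripe probability is small but nonzero once more than M disks fail, and with tens of millions of fragments per disk some stripes are then likely to be affected, which is the situation the application sets out to mitigate.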
- This document provides a storage management method and a distributed file system to alleviate the problem that, in existing distributed file systems, simultaneous damage to more than 2 disks results in unreadable file content.
- A storage management method for a distributed file system, comprising: dividing all disks of the distributed file system into multiple disk groups; obtaining the serviceability of each disk group, and dividing the disk groups into a health group set, a sub-health group set, and a bad group set according to the serviceability of each disk group; and, when data needs to be stored, selecting a disk group from the health group set and selecting disks from the selected disk group to store the data.
- Dividing all the disks of the distributed file system into multiple disk groups includes: dividing all disks into multiple disk groups according to the slot information of each disk in the storage server to which the disk belongs.
- Dividing all the disks into multiple disk groups according to the slot information of each disk in the storage server to which the disk belongs includes: dividing the disks that have one or more identical slot information in all storage servers into one disk group.
- Obtaining the serviceability of each disk group comprises: obtaining the number of available volumes of each disk group, and determining the serviceability of each disk group according to the storage policy and the number of available volumes of each disk group.
- The method further includes: when a disk group cannot be selected from the health group set, selecting a disk group from the sub-health group set; if a disk group cannot be selected from the sub-health group set either, the data storage fails.
- Selecting a disk group from the health group set or the sub-health group set comprises: selecting a disk group according to the proportion of available space of each disk group in the health group set or the sub-health group set.
- A distributed file system, comprising: a plurality of disks and a location register, the location register comprising a disk grouping module and a selection storage module, wherein the disk grouping module is configured to: divide all disks of the distributed file system into multiple disk groups; obtain the serviceability of each disk group; and divide the disk groups into a health group set, a sub-health group set, and a bad group set according to the serviceability of each disk group; and the selection storage module is configured to: select a disk group from the health group set when data needs to be stored, and select disks from the selected disk group to store the data.
- The disk grouping module is configured to divide all disks into multiple disk groups according to the slot information of each disk in the storage server to which the disk belongs.
- The disk grouping module is configured to divide the disks having one or more identical slot information in all storage servers into one disk group.
- The disk grouping module is configured to: obtain the number of available volumes of each disk group, and determine the serviceability of each disk group according to the storage policy and the number of available volumes of each disk group.
- The selection storage module is further configured to: select a disk group from the sub-health group set when a disk group cannot be selected from the health group set; if a disk group cannot be selected from the sub-health group set either, the data storage fails.
- The selection storage module is configured to select a disk group according to the proportion of available space of each disk group in the health group set or the sub-health group set.
- A computer-readable storage medium storing computer-executable instructions for performing the method of any of the above.
- The storage management method provided by the embodiment of the present invention first groups and manages all disks, and further distinguishes the serviceability status of each disk group.
- Copy positions are preferentially allocated from the health group, so that when multiple disks (more than M) fail, the file contents are not lost, which can improve the overall reliability of the system.
- FIG. 1 is a flowchart of a storage management method according to a first embodiment of the present invention.
- FIG. 2 is a schematic diagram of a distributed file system according to a second embodiment of the present invention.
- FIG. 3 is a flowchart of a step of grouping disks in a storage management method according to a third embodiment of the present invention.
- FIG. 4 is a flowchart of a data storage step in a storage management method according to a third embodiment of the present invention.
- FIG. 1 is a flowchart of a storage management method according to a first embodiment of the present invention.
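- The storage management method provided by the embodiment of the present invention includes the following steps:
- S101 Divide all disks of the distributed file system into multiple disk groups;
- S102 Acquire the serviceability of each disk group, and divide the disk groups into a health group set, a sub-health group set, and a bad group set according to the serviceability of each disk group;
- S103 When data needs to be stored, select a disk group from the health group set, and select disks from the selected disk group to store the data.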
- Dividing all disks of the distributed file system into multiple disk groups in the foregoing embodiment includes: dividing all disks into multiple disk groups according to the slot information of each disk in the storage server to which the disk belongs.
- Dividing all disks into multiple disk groups according to the slot information of each disk in the storage server to which the disk belongs in the foregoing embodiment includes: dividing the disks having one or more identical slot information in all storage servers into one disk group.
- Obtaining the serviceability of each disk group in the above embodiment comprises: obtaining the number of available volumes of each disk group, and determining the serviceability of each disk group according to the storage policy and the number of available volumes of each disk group.
- The above embodiment further includes: selecting a disk group from the sub-health group set when a disk group cannot be selected from the health group set; if a disk group cannot be selected from the sub-health group set either, the data storage fails.
- Selecting a disk group from the health group set or the sub-health group set in the above embodiment comprises: selecting a disk group according to the proportion of available space of each disk group in the health group set or the sub-health group set.
- FIG. 2 is a schematic diagram of a distributed file system according to a second embodiment of the present invention.
- The distributed file system provided by the embodiment of the present invention includes multiple disks 22, which belong to different storage servers, and one or more location registers 21 (FIG. 2 shows only one location register by way of example), wherein
- the location register 21 includes: a disk grouping module 211 and a selection storage module 212,
- the disk grouping module 211 is configured to: divide all the disks 22 of the distributed file system into a plurality of disk groups; obtain the serviceability of each disk group; and classify the disk groups into a health group set, a sub-health group set, and a bad group set according to the serviceability of each disk group;
- the selection storage module 212 is configured to: select a disk group from the health group set when data needs to be stored, and select disks from the selected disk group to store the data.
- The disk grouping module 211 is configured to divide all disks into a plurality of disk groups based on the slot information of each disk in the storage server to which the disk belongs.
- The disk grouping module 211 is configured to divide the disks having one or more identical slot information in all storage servers into one disk group.
- The disk grouping module 211 is configured to: obtain the number of available volumes of each disk group, and determine the serviceability of each disk group based on the storage policy and the number of available volumes of each disk group.
- The selection storage module 212 is further configured to: select a disk group from the sub-health group set when a disk group cannot be selected from the health group set; if a disk group cannot be selected from the sub-health group set either, the data storage fails.
- The selection storage module 212 is configured to select a disk group based on the proportion of available space of each disk group in the health group set or the sub-health group set.
- FIG. 3 is a flowchart of a disk grouping step in a storage management method according to a third embodiment of the present invention. As shown in FIG. 3, the disk grouping step in the storage management method provided by the embodiment of the present invention includes the following steps:
- S301 Group all disks at the physical level, dividing all disks into multiple disk groups;
- The disks of each storage server are divided into multiple disk groups according to the physical slot numbers s0, s1, ..., sd-1. For example, the disks in the same slot of all storage servers form one disk group, in which case the number of disk groups g equals the number of slots d; alternatively, the disks of several slots form one disk group, in which case the number of disk groups g is smaller than the number of slots d.
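The slot-based grouping described above can be sketched as follows. This is a minimal illustration, assuming each disk is identified by a hypothetical (server, slot) pair; the `Disk` class, `group_by_slot`, and the `slots_per_group` parameter are names invented here, and combining adjacent slots into one group is only one possible way to obtain g < d.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    server: str  # hypothetical storage server identifier
    slot: int    # physical slot number on that server (0 .. d-1)


def group_by_slot(disks, slots_per_group=1):
    """Group disks across all servers by physical slot number.

    With slots_per_group == 1, every slot forms its own group (g == d).
    With slots_per_group > 1, adjacent slots share a group (g < d);
    this pairing rule is an assumption of the sketch.
    """
    groups = defaultdict(list)
    for disk in disks:
        groups[disk.slot // slots_per_group].append(disk)
    return dict(groups)


if __name__ == "__main__":
    # Three servers with four slots each: four groups of three disks.
    disks = [Disk(f"server{n}", s) for n in range(3) for s in range(4)]
    for group_id, members in sorted(group_by_slot(disks).items()):
        print(group_id, [(d.server, d.slot) for d in members])
```

In this sketch, with one slot per group, each group holds exactly one disk from every server, so the loss of a whole server removes at most one disk from any group.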
- The number of disks and the capacity of each disk group can differ. For system read and write performance, the disk read and write interfaces must be kept balanced.
- When the system is powered on, the FLR initializes the disk group data area, and all disks are added to the disk group data area according to their grouping; when a disk or a node fails, the corresponding disk group information is updated.
- The disk group information includes near-real-time statistics such as the information of all disks, total group capacity, available group capacity, group read/write IO, and group serviceability status. This data is reported to the file location register (FLR) and is used for decision making when CHUNK copy locations are allocated for a file.
- The FLR traverses each disk group, counts the number of available disks among all disks of the group, calculates the sum of the volume weights, and records both in the disk group information.
- The FLR traverses all disk groups, sets the serviceability of each disk group according to certain rules (such as the number of available disks, node failure, network failure, and disk abnormality), and adds each group to the health group set, the sub-health group set, or the bad group set.
- The group serviceability state indicates whether the disk group can be used by the FLR to allocate CHUNK copy positions, and has three states: health group, sub-health group, and bad group.
- A disk group is in the healthy state when its number of available disks is greater than or equal to N+M, reliability is maintained under storage node failure, the media surface state is normal, and so on.
- A disk group is in the sub-healthy state when factors such as a disk suddenly dropping out or a storage server losing power reduce the number of available volumes in the group. The group is then considered less serviceable, and its status is set to sub-health group.
- When the number of available volumes of a group is less than N, the disk group status is set to bad.
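The thresholds above can be expressed as a small classifier. This is a sketch under a simplifying assumption: only the count of available disks is considered, and any group with at least N but fewer than N+M available disks is treated as sub-healthy. The text names other factors (node failure, network failure, disk abnormality) without giving rules for them, so they are omitted here. `GroupState`, `classify_group`, and `partition_groups` are hypothetical names.

```python
from enum import Enum


class GroupState(Enum):
    HEALTHY = "health"
    SUB_HEALTHY = "sub-health"
    BAD = "bad"


def classify_group(available_disks: int, n_data: int, m_parity: int) -> GroupState:
    """Classify one disk group from its number of available disks."""
    if available_disks < n_data:
        # Fewer than N disks: a full stripe cannot be recovered from this group.
        return GroupState.BAD
    if available_disks >= n_data + m_parity:
        # At least N+M disks: a full stripe fits on distinct disks.
        return GroupState.HEALTHY
    # Assumed middle band: usable, but with reduced serviceability.
    return GroupState.SUB_HEALTHY


def partition_groups(available_by_group: dict, n_data: int, m_parity: int) -> dict:
    """Split group ids into the health, sub-health, and bad sets."""
    sets = {state: [] for state in GroupState}
    for group_id, available in available_by_group.items():
        sets[classify_group(available, n_data, m_parity)].append(group_id)
    return sets


if __name__ == "__main__":
    counts = {"g0": 10, "g1": 7, "g2": 5}
    for state, members in partition_groups(counts, n_data=6, m_parity=2).items():
        print(state.value, members)
```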
- S305 The FLR traverses the disk groups in the above three sets respectively, and maps the disk group according to the number of available volumes.
- FIG. 4 is a flowchart of a data storage step in a storage management method according to a third embodiment of the present invention. As shown in FIG. 4, the data storage step in the storage management method provided by the embodiment of the present invention includes the following steps:
- After receiving the message, the FLR selects the health group set as the current set;
- The FLR obtains a random number r and takes r modulo the set weight of the health group set to obtain a keyword key;
- S404 The FLR searches the current set for the target disk group by binary search according to the key; if the target disk group is found, step S405 is performed, otherwise step S407 is performed;
- The FLR selects the copy location according to conditions such as disk IO balance and maintaining reliability under node failure and multi-disk failure; if the selection succeeds, step S406 is performed, otherwise step S407 is performed;
- The FLR determines the copy location, writes the data to the corresponding disk, and ends the process;
- The FLR selects the sub-health group set as the current set;
- The FLR obtains a random number r and takes r modulo the set weight of the sub-health group set to obtain a keyword key;
- S409 The FLR searches the current set for the target disk group by binary search according to the key; if the target disk group is found, step S410 is performed, otherwise step S411 is performed;
- The FLR selects the copy location according to conditions such as disk IO balance and maintaining reliability under node failure and multi-disk failure; if the selection succeeds, step S406 is performed, otherwise step S411 is performed;
- S411 The FLR fails to allocate a disk, the file write fails, and the process exits with failure.
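The flow above (random number, modulo the set weight, binary search, fallback from the health set to the sub-health set) can be sketched in Python as follows. Several details are assumptions made for illustration: each group carries a positive integer weight such as its available capacity, the binary search runs over cumulative weights, and the choice of individual disks inside the target group is left out. `pick_group`, `allocate_group`, and `AllocationError` are hypothetical names.

```python
import bisect
import random
from itertools import accumulate


class AllocationError(Exception):
    """Raised when neither set can supply a disk group."""


def pick_group(groups, rng):
    """Pick one group id, approximately in proportion to its weight.

    `groups` is a list of (group_id, weight) pairs with integer weights.
    Returns None if the set has no group with positive weight.
    """
    usable = [(gid, w) for gid, w in groups if w > 0]
    if not usable:
        return None
    # Cumulative weights define one interval per group on [0, total).
    cumulative = list(accumulate(w for _, w in usable))
    r = rng.getrandbits(32)        # random number r
    key = r % cumulative[-1]       # key = r mod (set weight)
    # Binary search for the first interval whose upper bound exceeds key.
    index = bisect.bisect_right(cumulative, key)
    return usable[index][0]


def allocate_group(health_set, sub_health_set, rng=None):
    """Try the health set first, then the sub-health set, else fail."""
    rng = rng or random.Random()
    for current_set in (health_set, sub_health_set):
        group_id = pick_group(current_set, rng)
        if group_id is not None:
            return group_id
    raise AllocationError("no disk group available; file write fails")


if __name__ == "__main__":
    health = [("g0", 400), ("g1", 100)]
    sub_health = [("g2", 50)]
    print(allocate_group(health, sub_health))
```

Searching cumulative weights makes a group with more available space proportionally more likely to be chosen, which matches the proportional selection described earlier; bad groups are never passed in, so they cannot receive new copies.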
- All disks are first grouped and managed, and the serviceability status of each disk group is further distinguished.
- Copy positions are preferentially allocated from the health group, so that when multiple disks (more than M) fail, the file contents are not lost, which can improve the overall reliability of the system.
- All or part of the steps of the above embodiments may also be implemented using integrated circuits. These steps may be fabricated separately into individual integrated circuit modules, or multiple modules or steps may be fabricated into a single integrated circuit module.
- the devices/function modules/functional units in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
- When the devices/function modules/functional units in the above embodiments are implemented in the form of software function modules and sold or used as stand-alone products, they can be stored in a computer-readable storage medium.
- the above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
- All disks are grouped and managed, and the serviceability status of each disk group is distinguished.
- Copy positions are preferentially allocated from the health group, so that file contents are not lost when multiple disks (more than M) fail.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a storage management method and a distributed file system. The storage management method comprises: dividing all disks of the distributed file system into a plurality of disk groups (S101); acquiring the serviceability of each disk group, and dividing the disk groups into a health group set, a sub-health group set, and a bad group set according to the serviceability of each disk group (S102); and, when data needs to be stored, selecting a disk group from the health group set and selecting a disk from the selected disk group to store the data (S103).
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510245500.6A CN106293492B (zh) | 2015-05-14 | 2015-05-14 | Storage management method and distributed file system |
| CN201510245500.6 | 2015-05-14 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016180049A1 (fr) | 2016-11-17 |
Family
ID=57248414
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2016/071235 Ceased WO2016180049A1 (fr) | Storage management method and distributed file system | 2015-05-14 | 2016-01-18 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN106293492B (fr) |
| WO (1) | WO2016180049A1 (fr) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
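| CN107436826B (zh) * | 2017-08-15 | 2018-12-18 | 金钱猫科技股份有限公司 | Cold data processing method and terminal |
| CN110535898B (zh) * | 2018-05-25 | 2022-10-04 | 许继集团有限公司 | Method for replica placement, completion, and node selection in big data storage, and management system |
| CN109407981A (zh) * | 2018-09-28 | 2019-03-01 | 深圳市茁壮网络股份有限公司 | Data processing method and device |
| CN109445712A (zh) * | 2018-11-09 | 2019-03-08 | 浪潮电子信息产业股份有限公司 | Instruction processing method, system, device, and computer-readable storage medium |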
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140149356A1 (en) * | 2012-11-26 | 2014-05-29 | Amazon Technologies, Inc. | Automatic repair of corrupted blocks in a database |
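| CN104346221A (zh) * | 2013-08-02 | 2015-02-11 | 北京百度网讯科技有限公司 | Server hardware device grading and scheduling management method and device, and server |
| CN104484134A (zh) * | 2014-12-23 | 2015-04-01 | 北京华胜天成科技股份有限公司 | Disk allocation method and device for distributed storage |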
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050010722A1 (en) * | 2003-07-11 | 2005-01-13 | Chih-Wei Chen | Multi-volume disk array management method and system |
| US8019728B2 (en) * | 2008-04-17 | 2011-09-13 | Nec Laboratories America, Inc. | Dynamically quantifying and improving the reliability of distributed data storage systems |
| CN102981927B (zh) * | 2011-09-06 | 2015-11-25 | 阿里巴巴集团控股有限公司 | Distributed redundant array of independent disks storage method and distributed cluster storage system |
-
2015
- 2015-05-14 CN CN201510245500.6A patent/CN106293492B/zh active Active
-
2016
- 2016-01-18 WO PCT/CN2016/071235 patent/WO2016180049A1/fr not_active Ceased
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019023260A1 (fr) * | 2017-07-24 | 2019-01-31 | Rubrik, Inc. | Throttling network bandwidth using per-node network interfaces |
| US10819656B2 (en) | 2017-07-24 | 2020-10-27 | Rubrik, Inc. | Throttling network bandwidth using per-node network interfaces |
| US11030062B2 (en) | 2017-08-10 | 2021-06-08 | Rubrik, Inc. | Chunk allocation |
| CN113791893A (zh) * | 2021-08-16 | 2021-12-14 | 济南浪潮数据技术有限公司 | Method and device for achieving capacity balancing based on disk grouping |
| CN114594916A (zh) * | 2022-03-19 | 2022-06-07 | 山西三叶虫信息技术股份有限公司 | Enterprise file storage management method and device, electronic equipment, and storage medium |
| CN114594916B (zh) * | 2022-03-19 | 2023-08-18 | 山西三叶虫信息技术股份有限公司 | Enterprise file storage management method and device, electronic equipment, and storage medium |
| CN115145490A (zh) * | 2022-07-26 | 2022-10-04 | 济南浪潮数据技术有限公司 | Disk detection and alarm method for a distributed storage system and related device |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106293492B (zh) | 2021-08-20 |
| CN106293492A (zh) | 2017-01-04 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2016180049A1 (fr) | | Storage management method and distributed file system |
| US10983860B2 (en) | | Automatic prefill of a storage system with conditioning of raid stripes |
| US9823980B2 (en) | | Prioritizing data reconstruction in distributed storage systems |
| CN103152395B (zh) | | Storage method and device for a distributed file system |
| CN103929454B (zh) | | Method and system for load-balanced storage in a cloud computing platform |
| CN101674233B (zh) | | Petersen graph-based storage network system and data read/write method |
| US10356150B1 (en) | | Automated repartitioning of streaming data |
| US20100161564A1 (en) | | Cluster data management system and method for data recovery using parallel processing in cluster data management system |
| US11016674B2 (en) | | Method, device, and computer program product for reading data |
| CN108540315B (zh) | | Distributed storage system, method, and device |
| CN106844108B (zh) | | Data storage method, server, and storage system |
| US20100161565A1 (en) | | Cluster data management system and method for data restoration using shared redo log in cluster data management system |
| WO2019119311A1 (fr) | | Data storage method, device, and system |
| US10708355B2 (en) | | Storage node, storage node administration device, storage node logical capacity setting method, program, recording medium, and distributed data storage system |
| CN106407083A (zh) | | Fault detection method and device |
| CN107133228A (zh) | | Data redistribution method and device |
| CN101827121A (zh) | | Method, server, and system for creating files in RAID |
| CN110413694A (zh) | | Metadata management method and related device |
| CN115756955A (zh) | | Data backup and data recovery method, device, and computer equipment |
| CN111488127B (zh) | | Disk cluster-based parallel data storage method and device, and data reading method |
| US11256428B2 (en) | | Scaling raid-based storage by redistributing splits |
| JP6233403B2 (ja) | | Storage system, storage device, control method and control program for the storage device, management device, and control method and control program for the management device |
| CN112256204A (zh) | | Storage resource allocation method and device, storage node, and storage medium |
| CN106708445B (zh) | | Link selection method and device |
| JP4428202B2 (ja) | | Disk array subsystem, and distributed arrangement method, control method, and program for the disk array subsystem |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16791899; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 16791899; Country of ref document: EP; Kind code of ref document: A1 |