CN106100801A - A kind of non-homogeneous erasure code method of cloud storage system - Google Patents
A kind of non-homogeneous erasure code method of cloud storage system Download PDFInfo
- Publication number
- CN106100801A CN106100801A CN201610743775.7A CN201610743775A CN106100801A CN 106100801 A CN106100801 A CN 106100801A CN 201610743775 A CN201610743775 A CN 201610743775A CN 106100801 A CN106100801 A CN 106100801A
- Authority
- CN
- China
- Prior art keywords
- data block
- data
- cold data
- cold
- data blocks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 40
- 239000011159 matrix material Substances 0.000 claims description 21
- 238000012795 verification Methods 0.000 claims description 10
- 230000006870 function Effects 0.000 claims description 3
- 238000011084 recovery Methods 0.000 abstract description 28
- 230000008439 repair process Effects 0.000 abstract description 7
- 238000012937 correction Methods 0.000 abstract description 6
- 238000005728 strengthening Methods 0.000 abstract description 2
- 238000005516 engineering process Methods 0.000 description 16
- 238000012217 deletion Methods 0.000 description 5
- 230000037430 deletion Effects 0.000 description 5
- 238000013507 mapping Methods 0.000 description 5
- 238000004458 analytical method Methods 0.000 description 4
- 238000012545 processing Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 201000009240 nasopharyngitis Diseases 0.000 description 2
- 238000009828 non-uniform distribution Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 125000004122 cyclic group Chemical group 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 230000001172 regenerating effect Effects 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
- 230000009897 systematic effect Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/004—Arrangements for detecting or preventing errors in the information received by using forward error control
- H04L1/0056—Systems characterized by the type of code used
- H04L1/0061—Error detection codes
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
本发明公开了一种云存储系统的非均匀纠删编码方法,根据物理存储节点的失效率,把存储在它们上的数据块分入大小不同的逻辑分组,并为不同的分组提供不同级别的纠删保护,进而对高失效率存储节点上的数据给予更多的保护,同时让它们的修复开销更低。如此,本发明通过加强高失效率存储节点上数据的保护能提高系统的总体可靠性,通过降低高失效率存储节点上数据的修复开销能进一步降低系统总体修复开销。
The invention discloses a non-uniform erasure correction coding method for a cloud storage system. According to the failure rate of physical storage nodes, data blocks stored on them are divided into logical groups of different sizes, and different levels of data are provided for different groups. Erasure protection, thus giving more protection to data on storage nodes with high failure rates, while making their repair costs lower. In this way, the present invention can improve the overall reliability of the system by strengthening the protection of data on storage nodes with high failure rates, and can further reduce the overall system repair costs by reducing the data recovery costs on storage nodes with high failure rates.
Description
技术领域technical field
本发明涉及一种云存储系统的非均匀纠删编码方法。The invention relates to a non-uniform erasure correction coding method for a cloud storage system.
背景技术Background technique
近年来,以大型数据中心为代表的云存储系统发展迅猛(如Windows AzureStorage、Amazon S3、Google GFS]、EMC ATMOS等),累积的数据量从PB到EB甚至到ZB,使得云存储网络规模和节点数量呈爆炸式增长,但是数据的失效率也随之越来越高(例如,Google云存储中心的每执行一个MapReduce作业平均有5个左右节点失效,大约4000个节点的Hadoop系统平均6个小时就有一个磁盘失效,且其失效率随系统规模增长而迅速增长)。存储节点(指存储服务器或磁盘)在失效后,导致所存储数据不可访问,尤其是关键数据的丢失,将给用户带来巨大损失。所以,需要采用数据纠删或灾备技术来保证数据的可靠性,在一部分数据删除(即丢失)后,能够进行纠删操作,仍然能保证存储系统正常工作。多副本技术(通常是3副本)是众多云存储系统的默认存储标准,是常用的数据灾备技术,在原始数据丢失后,用剩余副本进行替换。Google GFS、Hadoop HDFS和CFS等云存储系统中均采用多副本技术,其基本策略是:在文件对象写入时,将其拆分为固定大小的数据块(如64M)进行分布式存储,且每个数据块在写入时复制为多个副本,采用链式方式依次推送到云存储网络中的不同存储节点,当某些存储节点失效时,可对冗余副本进行复制操作,恢复出数据块。总的来看,n副本技术的主要优点体现在:一是能很好地支持并发读操作,在数据读取率高时具有性能优势;二是数据重构/恢复开销不大,丢失一个数据块,只要复制一个数据块即可。但是,其缺点也较明显:数据副本的存储开销极其巨大,其存储开销是正常情况的n倍,尤其是当对系统容删除能力要求高时,该问题尤为突出。基于有限域运算的(n,k)纠删码(Erasure Codes)能将n个数据块编码为(n+k)个纠删码分块(多出k个冗余校验数据块),只要获取其中任意n个纠删码分块就能恢复所有n个原始数据块,能最多容删除k个数据块。代表性的Reed-Solomon编码,即RS(n,k)纠删码是一种系统性编码,即编码后原n个数据块保持不变,剩余k个冗余校验数据块为原始数据块计算得来。与多副本技术相比,容错能力相同条件下纠删码技术的存储效率更高、成本更低;例如,若按照100PB数据(WindowsAzure Storage 2012年左右规模),$1/GB/年,按容错能力1来计算(若副本技术和纠删码各需要2倍和1.5倍存储空间),则仅存储开销纠删码每年便可节省$52,428,800。In recent years, cloud storage systems represented by large-scale data centers have developed rapidly (such as Windows AzureStorage, Amazon S3, Google GFS ] , EMC ATMOS, etc.), and the accumulated data volume has ranged from PB to EB or even to ZB, making cloud storage network scale and The number of nodes is growing explosively, but the data failure rate is also getting higher and higher (for example, every time a MapReduce job is executed in the Google cloud storage center, there are about 5 node failures on average, and the Hadoop system with about 4000 nodes has an average of 6 failures. One disk fails every hour, and the failure rate increases rapidly with the size of the system). When a storage node (referring to a storage server or a disk) fails, the stored data becomes inaccessible, especially the loss of key data, which will bring huge losses to users. Therefore, it is necessary to use data correction or disaster recovery technology to ensure the reliability of data. After some data is deleted (that is, lost), the correction operation can be performed, and the normal operation of the storage system can still be guaranteed. Multi-copy technology (usually 3 copies) is the default storage standard of many cloud storage systems, and it is a commonly used data disaster recovery technology. After the original data is lost, the remaining copies are used to replace it. Cloud storage systems such as Google GFS, Hadoop HDFS, and CFS all use multi-copy technology. The basic strategy is: when a file object is written, it is split into fixed-size data blocks (such as 64M) for distributed storage, and Each data block is copied into multiple copies when written, and pushed to different storage nodes in the cloud storage network in a chained manner. When some storage nodes fail, the redundant copies can be copied to restore the data piece. In general, the main advantages of n-copy technology are as follows: first, it can well support concurrent read operations, and has performance advantages when the data reading rate is high; second, the data reconstruction/recovery overhead is not large, and one data loss block, just copy a block of data. However, its disadvantages are also obvious: the storage overhead of the data copy is extremely huge, and its storage overhead is n times that of the normal situation, especially when the system has high requirements for the capacity to delete. (n,k) erasure codes (Erasure Codes) based on finite field operations can encode n data blocks into (n+k) erasure code blocks (more than k redundant check data blocks), as long as Obtaining any n erasure code blocks among them can restore all n original data blocks, and can delete up to k data blocks. The representative Reed-Solomon code, that is, the RS(n,k) erasure code is a systematic code, that is, the original n data blocks remain unchanged after coding, and the remaining k redundant check data blocks are the original data blocks Calculated. Compared with multi-copy technology, erasure coding technology has higher storage efficiency and lower cost under the same fault tolerance; 1 (if the copy technology and the erasure code require 2 times and 1.5 times the storage space respectively), then only the storage overhead erasure code can save $52,428,800 per year.
因此,为了在保障数据高可靠性的同时尽可能地降低系统的存储开销和运营成本,各主流云存储服务供应商采用的典型技术是将逐渐冷却后的数据块编码为纠删码并删除多余副本,即在热数据块存储早期采用多副本冗余(如标准的3副本机制)以保证并发读操作性能,而当数据块逐渐冷却后(用户访问率低时)则采用纠删码技术将其转换为纠删码分块存储。例如,微软的Windows Azure Storage采用LRC纠删码处理冷数据块、Google的GFS II(colossus)系统采用RS(6,3)纠删码处理冷数据块,Facebook的f4云存储系统则采用Xorbas和Hitchhiker纠删码、DiskReduce系统和EMC的ATMOS系统则采用RS(10,6)或RS(9,3)等纠删码。然而,纠删码技术的缺点也很明显:(n,k)纠删码最大的缺陷是需要获取至少n个数据块才能恢复原始数据;当只需要恢复一个数据块时,仍然需要获取n个数据块,其数据恢复时的系统I/O和网络带宽成本太高,远高于多副本技术中只要一个数据块的开销。显然,在大规模云存储环境下将导致其数据恢复/重构效率低,网络开销大,往往难以适应云存储环境下的数据密集型处理需求。目前纠删码技术主要是研究如何在存储空间开销和数据修复流量开销之间进行权衡处理,以解决纠删码技术数据修复流量开销大的问题,近年来产生了多种新技术,如再生码、本地重构码LRC(Local Reconstruction Codes)。Therefore, in order to reduce system storage overhead and operating costs as much as possible while ensuring high data reliability, the typical technology adopted by mainstream cloud storage service providers is to encode gradually cooled data blocks into erasure codes and delete redundant data blocks. Copy, that is, in the early stage of hot data block storage, multi-copy redundancy (such as the standard 3-copy mechanism) is used to ensure the performance of concurrent read operations, and when the data block gradually cools down (when the user access rate is low), erasure coding technology is used to It translates to erasure coded block storage. For example, Microsoft's Windows Azure Storage uses LRC erasure codes to process cold data blocks, Google's GFS II (colossus) system uses RS (6,3) erasure codes to process cold data blocks, and Facebook's f4 cloud storage system uses Xorbas and Hitchhiker erasure codes, DiskReduce systems, and EMC's ATMOS systems use erasure codes such as RS(10,6) or RS(9,3). However, the disadvantages of erasure code technology are also obvious: (n,k) The biggest defect of erasure code is that at least n data blocks need to be obtained to restore the original data; when only one data block needs to be restored, n data blocks still need to be obtained For data blocks, the cost of system I/O and network bandwidth during data recovery is too high, much higher than the overhead of only one data block in multi-copy technology. Obviously, in a large-scale cloud storage environment, the data recovery/reconstruction efficiency will be low, and the network overhead will be large, which is often difficult to adapt to the data-intensive processing requirements in the cloud storage environment. At present, the erasure code technology is mainly to study how to balance the storage space overhead and the data repair traffic overhead, so as to solve the problem of large data repair traffic overhead of the erasure code technology. In recent years, a variety of new technologies have been produced, such as regenerative code , Local Reconstruction Codes LRC (Local Reconstruction Codes).
然而,目前云存储中心的纠删码技术均沿袭传统的通信领域的编码方法,没有针对数据失效模式的特点进行设计。近年来,已经有不少研究表明,各个存储节点的失效率是不一致的,受到多种物理参数的影响(如,存储节点所在机柜的水槽温度、磁盘工作温度、磁盘工作年龄等),呈现出一种非均匀分布的特点。例如,微软公司通过大量的实验统计,得出的研究结果表明,不同磁盘的年平均失效率随平均工作温度变化而有较大的波动,年平均失效率最高可达12%,最低时则为3%左右;而存储服务器的年平均失效率与所在机柜的水槽温度有较大关联,在2%~14%之间波动,可见通常最大失效率是最小失效率的6-7倍。由于各个存储设备的失效率不同,将导致其存储的数据的失效率不同,呈现出一种非均匀分布。However, the current erasure coding technology in the cloud storage center follows the traditional coding method in the communication field, and has not been designed for the characteristics of data failure modes. In recent years, many studies have shown that the failure rate of each storage node is inconsistent and is affected by various physical parameters (such as the temperature of the water tank in the cabinet where the storage node is located, the operating temperature of the disk, the age of the disk, etc.), showing a A characteristic of a non-uniform distribution. For example, Microsoft Corporation, through a large number of experimental statistics, has obtained research results that show that the annual average failure rate of different disks fluctuates greatly with the average operating temperature. The average annual failure rate of the storage server is closely related to the water tank temperature of the cabinet where it is located, fluctuating between 2% and 14%. It can be seen that the maximum failure rate is usually 6-7 times the minimum failure rate. Since the failure rate of each storage device is different, the failure rate of the data stored therein will be different, presenting a non-uniform distribution.
发明内容Contents of the invention
本发明所要解决的技术问题是,针对现有技术不足,提供一种云存储系统的非均匀纠删编码方法。The technical problem to be solved by the present invention is to provide a non-uniform erasure correction coding method for a cloud storage system in view of the deficiencies in the prior art.
为解决上述技术问题,本发明所采用的技术方案是:一种云存储系统的非均匀纠删编码方法,包括以下步骤:In order to solve the above-mentioned technical problems, the technical solution adopted in the present invention is: a method for non-uniform erasure correction coding of a cloud storage system, comprising the following steps:
1)找出要编码的冷数据块,并对所述冷数据块进行分组,将易失数据块放入小分组,不易失效的数据块放入大分组;1) Find out the cold data blocks to be encoded, and group the cold data blocks, put the volatile data blocks into small groups, and put the data blocks that are not easy to fail into large groups;
2)构造非均匀编码生成矩阵,并利用编码生成矩阵对冷数据块进行编码;2) Construct a non-uniform encoding generation matrix, and use the encoding generation matrix to encode the cold data block;
3)对编码后的数据块和冗余校验块进行部署。3) Deploy the encoded data blocks and redundancy check blocks.
步骤1)的具体实现过程包括:The specific implementation process of step 1) includes:
1)采用冷数据判断方法找出需要处理的冷数据块集合C,分批对冷数据块进行处理,每次处理一个子集SC={D0,D1,...,Dz},Dz表示冷数据块;z表述冷数据块的个数;1) Use the cold data judgment method to find out the cold data block set C that needs to be processed, and process the cold data blocks in batches, one subset at a time SC={D 0 , D 1 ,...,D z }, D z represents the cold data block; z represents the number of cold data blocks;
2)对子集SC包含的数据块分别进行随机映射,即采用随机哈希函数,将某个冷数据映射到某个存储节点上,表示该冷数据块在编码处理后要迁移到该存储节点,记录存储节点的ID值;2) Randomly map the data blocks contained in the subset SC, that is, use a random hash function to map a certain cold data to a certain storage node, indicating that the cold data block will be migrated to the storage node after encoding processing , record the ID value of the storage node;
3)利用目前已获得的存储节点的失效率对冷数据进行排序,将要存储在高失效率节点上(如年平均失效率高于10%)的易失效数据块排在最前面,存储在较低失效率(如年平均失效率低于10%)节点上的不易失效数据块排后面;3) Sort the cold data by using the failure rate of the storage nodes that have been obtained so far, and arrange the vulnerable data blocks that will be stored on the nodes with high failure rate (such as the annual average failure rate is higher than 10%) at the top, and store them in the lower Low failure rate (such as the annual average failure rate is less than 10%) nodes that are not prone to failure data blocks are ranked behind;
4)将最前面k0个数据块,即放入第一组,在将随后的(k0+1)个数据块放入第二组,再将随后的(k0+2)个数据块放入第三组,以此类推,直到把最后的(k0+l-1)个数据块放入第l个分组,且 4) Put the first k 0 data blocks, namely Put into the first group, put the subsequent (k 0 +1) data blocks into the second group, then put the subsequent (k 0 +2) data blocks into the third group, and so on, until the The last (k 0 +l-1) data blocks are put into the lth packet, and
构造非均匀编码生成矩阵,并利用编码生成矩阵对冷数据块进行编码的具体实现过程包括:构建编码生成矩阵G,通过G编码后让第(i+1)个分组有一个本地校验数据块pi,i=0,1,...,l-1,此外,所有分组还共享r个全局校验数据块Pj,j=0,1,...,r-1,pi由各个组内的冷数据块异或生成,Pj由所有的冷数据块乘以编码系数后再异或生成,G满足下式:The specific implementation process of constructing a non-uniform coding generation matrix and using the coding generation matrix to encode cold data blocks includes: constructing a coding generation matrix G, and allowing the (i+1)th group to have a local check data block after G coding p i , i=0,1,...,l-1, in addition, all groups also share r global parity data blocks P j , j=0,1,...,r-1, p i is determined by The cold data blocks in each group are XOR generated, and P j is generated by multiplying all the cold data blocks by the coding coefficient and then XOR is generated. G satisfies the following formula:
用线性方程组表示为:Expressed as a system of linear equations:
对编码后的数据块和冗余校验块进行部署的具体实现过程包括:利用G生成冗余的校验数据块,得到编码后的冷数据块集合 将它们部署到相应的存储节点,所有冷数据块根据已记录的存储节点ID值进行部署,校验数据块按随机映射方法进行部署。The specific implementation process of deploying the encoded data blocks and redundant check blocks includes: using G to generate redundant check data blocks, and obtain the encoded cold data block set Deploy them to the corresponding storage nodes, all cold data blocks are deployed according to the recorded storage node ID value, and the verification data blocks are deployed according to the random mapping method.
与现有技术相比,本发明所具有的有益效果为:本发明通过加强高失效率存储节点上数据的保护能提高系统的总体可靠性,通过降低高失效率存储节点上数据的修复开销能进一步降低系统总体修复开销,可以在保持相同容删除能力的前提下,有效地降低数据重构开销,包括磁盘I/O开销和网络流量开销,从而有效提升存储系统的性能。Compared with the prior art, the present invention has the beneficial effects that: the present invention can improve the overall reliability of the system by strengthening the protection of data on storage nodes with high failure rates, and can reduce the data repair overhead on storage nodes with high failure rates. Further reducing the overall system repair overhead can effectively reduce data reconstruction overhead, including disk I/O overhead and network traffic overhead, while maintaining the same capacity for deletion, thereby effectively improving the performance of the storage system.
附图说明Description of drawings
图1显示了UFP-LRC(2,2,2)编码方案的示意图。Figure 1 shows a schematic diagram of the UFP-LRC(2,2,2) encoding scheme.
图2显示了理论数值分析下的平均恢复开销对比。Figure 2 shows the comparison of average recovery overhead under theoretical numerical analysis.
图3显示了不同存储开销下的平均恢复开销对比。Figure 3 shows the comparison of average recovery overhead under different storage overheads.
图4显示了可用性对比。Figure 4 shows the usability comparison.
图5显示了实际系统实验中的平均恢复开销对比。Figure 5 shows the comparison of the average recovery overhead in the actual system experiment.
图6显示了数据恢复失败率对比。Figure 6 shows the comparison of data recovery failure rates.
具体实施方式detailed description
本发明设计的编码方法UFP-LRC(Unequal Failure Protection based LocalReconstruction Codes)分为以下几个步骤:第一步是找出对要编码的冷数据块,并对它们进行分组,将易失数据块放入较小分组,不易失效的数据块放入较大分组;第二步是构造非均匀编码生成矩阵,并利用编码生成矩阵对冷数据块进行编码;第三步是对编码后的数据块和冗余校验块进行部署。The encoding method UFP-LRC (Unequal Failure Protection based Local Reconstruction Codes) designed by the present invention is divided into the following steps: the first step is to find out the cold data blocks to be encoded, and group them, and put the volatile data blocks into into smaller groups, and data blocks that are not easy to fail are put into larger groups; the second step is to construct a non-uniform encoding generation matrix, and use the encoding generation matrix to encode cold data blocks; the third step is to encode the encoded data blocks and Redundancy check blocks are deployed.
对于编码方案UFP-LRC(k0,l,r)而言,具体编码实施过程如下:For the coding scheme UFP-LRC(k 0 ,l,r), the specific coding implementation process is as follows:
第一步:采用通用的冷数据判断方法找出需要处理的冷数据块集合C,分批对冷数据块进行处理,每次处理一个子集SC={D0,D1,...,Dz},Dz表示冷数据块。Step 1: Use the common cold data judgment method to find out the cold data block set C that needs to be processed, and process the cold data blocks in batches, one subset at a time SC={D 0 , D 1 ,...,D z }, where D z represents a cold data block.
第二步:对子集SC包含的数据块分别进行随机映射,即采用通用的随机哈希函数,将某个冷数据映射到某个存储节点上(通常可以保证不同的数据块映射到不同的存储节点,避免关联失效),表示该冷数据块在编码处理后要迁移到该存储节点,记录存储节点的ID值。Step 2: Randomly map the data blocks contained in the subset SC, that is, use a general random hash function to map a certain cold data to a certain storage node (usually it can be guaranteed that different data blocks are mapped to different storage node, to avoid association failure), indicating that the cold data block will be migrated to the storage node after encoding processing, and record the ID value of the storage node.
第三步:利用目前已获得的存储节点的失效率对冷数据进行排序,那些将要存储在高失效率节点上的易失效数据块排在最前面,存储在较低失效率节点上的不易失效数据块排后面。Step 3: Sort the cold data by using the failure rate of the storage nodes that have been obtained so far. Those data blocks that are prone to failure to be stored on nodes with a high failure rate are ranked at the top, and those stored on nodes with a lower failure rate are not easy to fail. The data blocks are in the back.
第四步:把数据块分为l个分组。即将最前面k0个数据块(即)放入第一组,在将随后的(k0+1)个数据块(即)放入第二组,再将随后的(k0+2)个数据块放入第三组,以此类推,直到把最后的(k0+l-1)个数据块放入第l个分组。且 Step 4: Divide the data block into l groups. That is, the first k 0 data blocks (ie ) into the first group, and the subsequent (k 0 +1) data blocks (ie ) into the second group, and then put the subsequent (k 0 +2) data blocks into the third group, and so on, until the last (k 0 +l-1) data blocks are put into the lth grouping. and
第五步:构建校验码生成矩阵G,即系数矩阵。通过G编码后可以让第(i+1)个分组有一个本地校验数据块pi(i=0,1,...,l-1),此外,所有分组还共享r个全局校验数据块Pj(j=0,1,...,r-1)。pi由各个组内的数据块异或生成,Pj而由所有的数据块乘以编码系数后再异或生成。G满足下面式(1),运算都在有限域上进行。Step 5: Construct the check code generation matrix G, that is, the coefficient matrix . After G encoding, the (i+1)th group can have a local check data block p i (i=0,1,...,l-1), in addition, all groups also share r global checks Data block P j (j=0,1,...,r-1). p i is generated by XOR of the data blocks in each group, and P j is generated by XOR of all data blocks multiplied by coding coefficients. G satisfies the following formula (1), and the operations are performed on finite fields.
用线性方程组表示为:Expressed as a system of linear equations:
其中,a,b,c,...,都为需要构造的有限域上的编码系数,所有的参数a对应的是第一组数据的编码系数,b对应的是第二组数据的编码系数,c为第三组的编码系数,以此类推。这些编码系数都是一旦设定好,下一轮数据块子集处理时,即可重复使用。Among them, a, b, c,..., are the coding coefficients on the finite field that need to be constructed, and all parameters a correspond to the coding coefficients of the first group of data, and b correspond to the coding coefficients of the second group of data , c is the coding coefficient of the third group, and so on. Once these coding coefficients are set, they can be reused in the next round of data block subset processing.
这些编码系数需要满足下面的式(3):These coding coefficients need to satisfy the following formula (3):
ax,ay,bs,bt,cv,cw...≠0a x ,a y ,b s ,b t ,c v ,c w ...≠0
ax≠ay,bs≠bt,cv≠cw,...a x ≠a y ,b s ≠b t ,c v ≠c w ,...
ax,ay≠bs,bt≠cv,cw≠...a x ,a y ≠b s ,b t ≠c v ,c w ≠...
ax+ay≠bs+bt≠cv+cw≠...a x +a y ≠b s +b t ≠c v +c w ≠...
ax≠bs+bt,bs≠ax+ay,cv≠ax+ay,... (3)a x ≠b s +b t ,b s ≠a x +a y ,c v ≠a x +a y ,... (3)
式(3)表示:所有编码系数a,b,c,...,都不为0;任意组内和组之间的两个编码系数都不相等;任意组内的两个系数之和都不等于另外一个组内的两个系数之和;任意一个系数都不等于其它组内的任意两个系数之和(注意:这里的和运算表示有限域上的异或运算)。Equation (3) expresses: all coding coefficients a, b, c, ... are not 0; the two coding coefficients in any group and between groups are not equal; the sum of the two coefficients in any group is equal to It is not equal to the sum of two coefficients in another group; any coefficient is not equal to the sum of any two coefficients in other groups (note: the sum operation here represents the XOR operation on the finite field).
通过选择满足式(3)的编码系数,UFP-LRC(k0,l,r)将具有良好的容删除能力:By selecting the coding coefficients satisfying formula (3), UFP-LRC(k 0 ,l,r) will have a good ability to tolerate erasure:
(1)所构造的UFP-LRC编码具有最大可恢复性。即只要线性方程组理论上能解码,即可进行数据恢复。(1) The constructed UFP-LRC code has maximum recoverability. That is, as long as the linear equations can be decoded theoretically, data recovery can be performed.
(2)具有非均匀纠删编码保护,即前(r-k0+1)个分组能容忍删除任意(r+2)个原始数据块;而后面的分组没有此特性。(2) It has non-uniform erasure coding protection, that is, the first (rk 0 +1) group can tolerate the deletion of any (r+2) original data blocks; but the latter group does not have this feature.
(3)能够容忍删除任意(r+1)个删除的块(包括原始数据块和校验数据块)。、(3) It can tolerate deletion of any (r+1) deleted blocks (including original data blocks and check data blocks). ,
(4)最多容忍删除(l+r)个块(包括原始数据块和校验数据块)。(4) Tolerate deletion of (l+r) blocks (including original data blocks and check data blocks) at most.
第六步:按照式(1),利用G生成冗余的校验数据块。最后得到编码后的数据块集合将它们部署到相应的存储节点,所有数据块D根据已记录的存储节点ID值进行部署,校验数据块按随机映射方法或其它常用映射方法进行部署。Step 6: According to formula (1), use G to generate redundant check data blocks. Finally, the encoded data block collection is obtained Deploy them to corresponding storage nodes, all data blocks D are deployed according to the recorded storage node ID values, and check data blocks are deployed according to random mapping method or other commonly used mapping methods.
第七步:最后,当某些存储节点失效时,设当有m个数据块,n个校验数据块丢失时,将失效的校验数据块pi或Pj对应的方程从式(2)中删除,删除对应的n个方程后,再从剩余的线性方程中找出任意m个方程构建一个新的方程组(只要该方程组理论上有确定唯一解,即对应的线性矩阵可逆),求解该方程组即可得出所有|SC|个原始数据块D0,D1,...,D|SC|-1,完成数据修复。Step 7: Finally, when some storage nodes fail, it is assumed that when there are m data blocks and n verification data blocks are lost, the equation corresponding to the failed verification data block p i or P j is calculated from the formula (2 ), delete the corresponding n equations, and then find any m equations from the remaining linear equations to construct a new equation system (as long as the equation system has a unique solution in theory, that is, the corresponding linear matrix is invertible) , solving this equation system can obtain all |SC| original data blocks D 0 , D 1 ,...,D |SC|-1 , and complete the data restoration.
下面以UFP-LRC(2,2,2)为例进行详细说明具体的编码和修复过程。The following takes UFP-LRC(2,2,2) as an example to describe the specific encoding and restoration process in detail.
第一步:设采用通用的冷数据块判断方法(如,根据最近访问时间,长时间没有访问的数据标记为冷数据),找出目前需要编码的5个冷数据块子集SC={D0,D1,...,D4}。Step 1: Assume that a common cold data block judgment method is adopted (for example, according to the latest access time, data that has not been accessed for a long time is marked as cold data), and find out the 5 cold data block subsets that need to be coded at present SC={D 0 ,D 1 ,...,D 4 }.
第二步:对子集SC包含的数据块分别进行随机映射,让不同的数据块映射到不同的存储节点或机柜,记录目标存储节点的ID值。Step 2: Randomly map the data blocks included in the subset SC, so that different data blocks are mapped to different storage nodes or cabinets, and record the ID value of the target storage node.
第三步:利用目标存储节点的失效率(失效率可由相关物理参数估计出)对冷数据块进行排序,设排序后为{D0,D1,...,D4};D0为最易失效数据块,D4为最不易失效数据块。Step 3: Use the failure rate of the target storage node (the failure rate can be estimated by the relevant physical parameters) to sort the cold data blocks, and set it to be {D 0 ,D 1 ,...,D 4 } after sorting; D 0 is The data block most prone to failure, D 4 is the data block least likely to fail.
第四步:把数据块分为2个分组。即将最前面2个数据块(即D0~D1)放入第一组,在将随后的3个数据块(即D2~D4)放入第二组。Step 4: Divide the data block into 2 groups. That is, put the first two data blocks (ie, D 0 to D 1 ) into the first group, and put the next three data blocks (ie, D 2 to D 4 ) into the second group.
第五步:构建校验码生成矩阵G。构建G关键是在有限域空间选择有效的满足式(3)的编码系数a,b。具体的选择方法是采用一种交错的方式:让第一组的编码系数的高位为0,低位为1,而第二组编码系数的高位为1,低位为0。因此,在有限域GF(25)空间中选择系数:a0=00011,a1=00010,b0=11000,b1=10000,b2=01000。以后每轮数据块编码都用这些系数。Step 5: Construct the check code generation matrix G. The key to constructing G is to select effective coding coefficients a and b satisfying formula (3) in the finite field space. The specific selection method is to use an interleaving method: let the high bit of the first group of coding coefficients be 0 and the low bit be 1, while the high bit of the second group of coding coefficients is 1 and the low bit is 0. Therefore, the coefficients are selected in the finite field GF(2 5 ) space: a 0 =00011, a 1 =00010, b 0 =11000, b 1 =10000, b 2 =01000. These coefficients are used in each subsequent round of data block encoding.
则校验数据块生成矩阵为:Then the check data block generation matrix is:
则生成校验数据块的方程组为:Then the equations for generating the check data block are:
图1显示了UFP-LRC(2,2,2)的编码后的分组示意图。Figure 1 shows a schematic diagram of the encoded grouping of UFP-LRC(2,2,2).
第六步:按照式(1),利用G生成冗余的校验数据块。具体编码时,是采用循环方式每次选取对每个数据块的一小块字节进行编码处理(如从每个数据块D中取16个bits),以保证编码速度,则编码运算在GF(216)上进行;最后得到编码后的数据块集合将它们部署到相应的存储节点,所有数据块D根据已记录的存储节点ID值进行部署,校验数据块按随机映射方法或其它常用映射方法进行部署。Step 6: According to formula (1), use G to generate redundant check data blocks. When encoding specifically, a small block of bytes of each data block is selected each time in a cyclic manner (for example, 16 bits are taken from each data block D) to ensure the encoding speed, and the encoding operation is performed in GF (2 16 ); finally get the encoded data block set Deploy them to corresponding storage nodes, all data blocks D are deployed according to the recorded storage node ID values, and check data blocks are deployed according to random mapping method or other commonly used mapping methods.
第七步:当某些存储节点失效时,设数据块D0,D1,D2都丢失,校验数据块p0丢失时,将失效的校验数据块p0对应的方程删除,再从剩余的线性方程中仍然可以找出3个方程构建一个新的方程组,D3,D4作为常数,解出下面式(6)的方程组即可恢复出D0,D1,D2。Step 7: When some storage nodes fail, assume that the data blocks D 0 , D 1 , and D 2 are all lost, and when the verification data block p 0 is lost, delete the equation corresponding to the failed verification data block p 0 , and then From the remaining linear equations, 3 equations can still be found to construct a new equation system, D 3 , D 4 are used as constants, and D 0 , D 1 , D 2 can be recovered by solving the equation system of the following formula (6) .
即 which is
其对应的系数矩阵G'如式(7)所示。由于有效地选择了编码系数,可以保证系数矩阵的向量是线性无关的,即|G'|=a0a1(a1-a0)≠0,故式(7)具有逆矩阵,式(6)有唯一确定解。Its corresponding coefficient matrix G' is shown in formula (7). Since the coding coefficients are effectively selected, the vector of the coefficient matrix can be guaranteed to be linearly independent, that is, |G'|=a 0 a 1 (a 1 -a 0 )≠0, so formula (7) has an inverse matrix, and formula ( 6) There is a unique definite solution.
即如式(8)所示,通过求逆矩阵可以恢复原数据D0,D1,D2。其它情况下的数据丢失采用类似的方法进行恢复。That is, as shown in formula (8), the original data D 0 , D 1 , and D 2 can be recovered by calculating the inverse matrix. Data loss in other situations can be recovered in a similar way.
相对于现有的RS(Reed-Solomon)、LRC编码技术,本发明改进了数据的分组方案,实现分组大小不均匀,并把易失效的数据块放在较小的组,由于同时采用本地校验数据块和全局校验数据块对数据进行保护,将带来两方面的优势:Compared with the existing RS (Reed-Solomon) and LRC coding techniques, the present invention improves the data grouping scheme, realizes that the grouping size is uneven, and puts data blocks that are prone to failure into smaller groups. The verification data block and the global verification data block are used to protect the data, which will bring two advantages:
(1)可以使得易失效的数据块在丢失时,消耗少量恢复开销即可恢复数据,从而总体上降低恢复失效数据的恢复开销。(1) When a data block that is prone to failure is lost, the data can be recovered with a small amount of recovery cost, thereby reducing the recovery cost of recovering invalid data as a whole.
(2)可以使得易失效数据的容删除能力更好,能抵抗更多的数据失效模式,从而总体上提高数据恢复成功率和存储系统的可靠性。(2) It can make the deletion capacity of vulnerable data better, and can resist more data failure modes, thereby improving the success rate of data recovery and the reliability of the storage system as a whole.
UFP-LRC比经典的RS编码,以及在顶级会议2012 USENIX ATC(现产品化在windowsserver中)上面发表的LRC方法,都要具有更好的恢复性能和可用性。以下用理论数值分析和实验数据来证明本发明的优点。UFP-LRC has better recovery performance and availability than the classic RS code, as well as the LRC method published at the top conference 2012 USENIX ATC (now productized in windows server). The advantages of the present invention are demonstrated below with theoretical numerical analysis and experimental data.
首先,在理论数值分析方面,主要是两个性能指标:(1)平均恢复开销:指恢复一个丢失的数据块的所需要下载/获取数据块的平均数量(指网络流量开销,计算延时相对网络延时几乎可以忽略);(2)可用性:当有一些存储节点失效时,单个数据块能成功恢复/获取的概率。First of all, in terms of theoretical numerical analysis, there are mainly two performance indicators: (1) Average recovery overhead: refers to the average number of data blocks that need to be downloaded/obtained to recover a lost data block (referring to network traffic overhead, and the calculation delay is relatively Network delay is almost negligible); (2) Availability: When some storage nodes fail, the probability that a single data block can be successfully recovered/acquired.
图2首先对相同存储开销下的UFP-LRC和LRC编码方法进行对比。X轴的失效率比值表示高失效率存储节点与低失效率存储节点的平均失效率之比。UFP-LRC(2,3,2)与LRC(9,3,2)具有相同的数据块、校验数据块和存储开销。UFP-LRC(1,3,2)与LRC(6,3,2)具有相同的数据块、校验数据块和存储开销。但是随着失效率比值的逐渐增大(如超过2时,微软的研究显示通常存储节点最大失效率是最小失效率的6-7倍),即各个存储节点之间的失效率差别越大时,UFP-LRC明显比LRC具有更小的恢复开销。Figure 2 first compares UFP-LRC and LRC encoding methods under the same storage overhead. The failure rate ratio on the X-axis represents the ratio of the average failure rate of storage nodes with a high failure rate to storage nodes with a low failure rate. UFP-LRC(2,3,2) and LRC(9,3,2) have the same data block, parity data block and storage overhead. UFP-LRC(1,3,2) and LRC(6,3,2) have the same data block, parity data block and storage overhead. However, as the failure rate ratio gradually increases (for example, when it exceeds 2, Microsoft's research shows that the maximum failure rate of a storage node is usually 6-7 times the minimum failure rate), that is, when the failure rate difference between each storage node is greater , UFP-LRC has significantly smaller recovery overhead than LRC.
图3显示了多种参数组合情况下的对比。在失效率比值ρ变大时,UFP-LRC在各种参数情况下的恢复开销都比LRC要低(即相同存储开销时,会有更低的恢复开销)。Figure 3 shows the comparison of various parameter combinations. When the failure rate ratio ρ becomes larger, the recovery cost of UFP-LRC is lower than that of LRC under various parameters (that is, the recovery cost is lower when the storage cost is the same).
图4显示了可用性对比情况。X轴表示目前失效的存储节点比例。UFP-LRC比RS的可用性要高得多,而且相同条件下和LRC的可用性基本一样,差距只有4.00E-08,基本可以忽略不计。Figure 4 shows the usability comparison. The X-axis represents the proportion of storage nodes that are currently failed. The availability of UFP-LRC is much higher than that of RS, and the availability of LRC is basically the same under the same conditions. The gap is only 4.00E-08, which is basically negligible.
然后,在实验方面,通过搭建30个存储节点组成的真实分布式存储系统,对相同存储开销情况下的UFP-LRC(2,3,2)、LRC(9,3,2)和RS(6,3)的性能进行了分析对比。主要关注两个性能参数:(1)平均恢复开销。(2)恢复失败率:即数据恢复/解码失败次数除以总的修复次数。Then, in terms of experiments, by building a real distributed storage system composed of 30 storage nodes, UFP-LRC(2,3,2), LRC(9,3,2) and RS(6 , 3) performance analysis and comparison. We mainly focus on two performance parameters: (1) Average recovery overhead. (2) Recovery failure rate: the number of data recovery/decoding failures divided by the total number of repairs.
图5显示了平均恢复开销,X轴表示失效的存储节点中属于高失效率的节点比例(即失效率高于0.2),明显UFP-LRC和LRC比经典的RS编码的恢复性能要好,RS的恢复开销基本固定。而UFP-LRC总体上比LRC的恢复开销低10-12%。因为更多的失效情况发生在易失效分组内,这些分组的数据的恢复开销更少,所以总体上恢复开销更少。Figure 5 shows the average recovery cost, and the X-axis indicates the proportion of nodes with high failure rate among the failed storage nodes (that is, the failure rate is higher than 0.2). It is obvious that UFP-LRC and LRC have better recovery performance than the classic RS code, and RS Restoration overhead is basically fixed. While UFP-LRC has an overall 10-12% lower recovery overhead than LRC. Since more failure cases occur in failure-prone packets, the recovery overhead for data in these packets is less, and therefore overall less recovery overhead.
图6显示了恢复失败率。由于RS(6,3)只能容忍3个数据块失效,它的失败率高于其它两个编码方法。UFP-LRC和LRC都能容忍3-5个数据块失效。随着系统中失效的存储节点数不断增加,UFP-LRC(2,3,2)的失败率逐渐要低于LRC(9,3,2),这是因为更多的失效会发生在易失效数据所在分组内,但是这些分组内的数据获得的保护更好,容删除能力更强,所以总体上可靠性更好。Figure 6 shows the recovery failure rate. Since RS(6,3) can only tolerate 3 data block failures, its failure rate is higher than that of the other two encoding methods. Both UFP-LRC and LRC can tolerate 3-5 data block failures. As the number of failed storage nodes in the system continues to increase, the failure rate of UFP-LRC (2,3,2) is gradually lower than that of LRC (9,3,2), because more failures will occur in the failure-prone The data is in the group, but the data in these groups is better protected, and the ability to delete is stronger, so the overall reliability is better.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610743775.7A CN106100801B (en) | 2016-08-29 | 2016-08-29 | A non-uniform erasure coding method for cloud storage system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610743775.7A CN106100801B (en) | 2016-08-29 | 2016-08-29 | A non-uniform erasure coding method for cloud storage system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN106100801A true CN106100801A (en) | 2016-11-09 |
| CN106100801B CN106100801B (en) | 2019-04-19 |
Family
ID=57223812
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610743775.7A Active CN106100801B (en) | 2016-08-29 | 2016-08-29 | A non-uniform erasure coding method for cloud storage system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106100801B (en) |
Cited By (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106383664A (en) * | 2016-08-31 | 2017-02-08 | 北京小米移动软件有限公司 | Data storage method and apparatus |
| CN106648471A (en) * | 2016-12-29 | 2017-05-10 | 深圳市中博科创信息技术有限公司 | Cloud platform data storage method and device |
| CN106788455A (en) * | 2016-11-29 | 2017-05-31 | 陕西尚品信息科技有限公司 | A kind of building method of the optimal partial repairable system code based on bag |
| CN106788891A (en) * | 2016-12-16 | 2017-05-31 | 陕西尚品信息科技有限公司 | A kind of optimal partial suitable for distributed storage repairs code constructing method |
| CN107656832A (en) * | 2017-09-18 | 2018-02-02 | 华中科技大学 | A kind of correcting and eleting codes method of low data reconstruction expense |
| CN107844272A (en) * | 2017-10-31 | 2018-03-27 | 成都信息工程大学 | A kind of cross-packet coding and decoding method for improving error correcting capability |
| CN109151054A (en) * | 2018-09-21 | 2019-01-04 | 长安大学 | A kind of building method of layer code and the restorative procedure of malfunctioning node |
| CN109947587A (en) * | 2019-02-20 | 2019-06-28 | 长安大学 | Construction method and fault repair method of group repair code for non-uniform fault protection |
| CN110839001A (en) * | 2018-08-15 | 2020-02-25 | 中国移动通信集团重庆有限公司 | Apparatus, method, apparatus and medium for processing batch files |
| CN110968454A (en) * | 2018-09-28 | 2020-04-07 | 杭州海康威视系统技术有限公司 | Method and apparatus for determining recovery data for lost data blocks |
| CN111585756A (en) * | 2020-04-30 | 2020-08-25 | 西安建筑科技大学 | A certificateless cloud auditing method suitable for multi-copy-multi-cloud scenarios |
| CN112000278A (en) * | 2020-07-23 | 2020-11-27 | 哈尔滨工业大学(深圳) | An adaptive local reconstruction code design method and cloud storage system for hot data storage |
| CN113296999A (en) * | 2021-05-20 | 2021-08-24 | 山东云海国创云计算装备产业创新中心有限公司 | RAID6 encoding method and encoding circuit |
| CN114116295A (en) * | 2021-10-25 | 2022-03-01 | 长三角信息智能创新研究院 | Inter-block 4 erasure coding method for real-time transmission of big data |
| CN114331395A (en) * | 2021-12-22 | 2022-04-12 | 南京航空航天大学 | Erasure code based block chain data grouping storage optimization structure and method |
| CN115793984A (en) * | 2023-01-03 | 2023-03-14 | 苏州浪潮智能科技有限公司 | Data storage method and device, computer equipment and storage medium |
| CN116703651A (en) * | 2023-08-08 | 2023-09-05 | 成都秦川物联网科技股份有限公司 | A smart gas data center operation management method, Internet of Things system and medium |
| CN121050662A (en) * | 2025-11-03 | 2025-12-02 | 苏州元脑智能科技有限公司 | Erasure coding method of distributed storage system and electronic equipment |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070133691A1 (en) * | 2005-11-29 | 2007-06-14 | Docomo Communications Laboratories Usa, Inc. | Method and apparatus for layered rateless coding |
| CN103944678A (en) * | 2014-04-21 | 2014-07-23 | 浙江大学 | Fountain-code encoding method with feedback and unequal error protection capacity |
-
2016
- 2016-08-29 CN CN201610743775.7A patent/CN106100801B/en active Active
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070133691A1 (en) * | 2005-11-29 | 2007-06-14 | Docomo Communications Laboratories Usa, Inc. | Method and apparatus for layered rateless coding |
| CN103944678A (en) * | 2014-04-21 | 2014-07-23 | 浙江大学 | Fountain-code encoding method with feedback and unequal error protection capacity |
Non-Patent Citations (2)
| Title |
|---|
| XIAODAN SONG,ET AL.,: ""UNEQUAL ERROR PROTECTION FOR SCALABLE VIDEO STORAGE IN THE CLOUD"", 《2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO(ICME)》 * |
| 程振东等,: ""云文件系统中纠删码技术的研究与实现"", 《计算机科学与探索》 * |
Cited By (28)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106383664A (en) * | 2016-08-31 | 2017-02-08 | 北京小米移动软件有限公司 | Data storage method and apparatus |
| CN106383664B (en) * | 2016-08-31 | 2019-12-03 | 北京小米移动软件有限公司 | Data storage method and device |
| CN106788455A (en) * | 2016-11-29 | 2017-05-31 | 陕西尚品信息科技有限公司 | A kind of building method of the optimal partial repairable system code based on bag |
| CN106788455B (en) * | 2016-11-29 | 2019-11-22 | 陕西尚品信息科技有限公司 | A kind of building method of the optimal partial repairable system code based on packet |
| CN106788891A (en) * | 2016-12-16 | 2017-05-31 | 陕西尚品信息科技有限公司 | A kind of optimal partial suitable for distributed storage repairs code constructing method |
| CN106648471A (en) * | 2016-12-29 | 2017-05-10 | 深圳市中博科创信息技术有限公司 | Cloud platform data storage method and device |
| CN107656832A (en) * | 2017-09-18 | 2018-02-02 | 华中科技大学 | A kind of correcting and eleting codes method of low data reconstruction expense |
| CN107844272A (en) * | 2017-10-31 | 2018-03-27 | 成都信息工程大学 | A kind of cross-packet coding and decoding method for improving error correcting capability |
| CN110839001A (en) * | 2018-08-15 | 2020-02-25 | 中国移动通信集团重庆有限公司 | Apparatus, method, apparatus and medium for processing batch files |
| CN109151054A (en) * | 2018-09-21 | 2019-01-04 | 长安大学 | A kind of building method of layer code and the restorative procedure of malfunctioning node |
| CN109151054B (en) * | 2018-09-21 | 2021-03-23 | 长安大学 | A method for constructing a hierarchical code and a method for repairing faulty nodes |
| CN110968454A (en) * | 2018-09-28 | 2020-04-07 | 杭州海康威视系统技术有限公司 | Method and apparatus for determining recovery data for lost data blocks |
| CN110968454B (en) * | 2018-09-28 | 2022-09-09 | 杭州海康威视系统技术有限公司 | Method and apparatus for determining recovery data for lost data blocks |
| CN109947587A (en) * | 2019-02-20 | 2019-06-28 | 长安大学 | Construction method and fault repair method of group repair code for non-uniform fault protection |
| CN111585756A (en) * | 2020-04-30 | 2020-08-25 | 西安建筑科技大学 | A certificateless cloud auditing method suitable for multi-copy-multi-cloud scenarios |
| CN112000278A (en) * | 2020-07-23 | 2020-11-27 | 哈尔滨工业大学(深圳) | An adaptive local reconstruction code design method and cloud storage system for hot data storage |
| CN112000278B (en) * | 2020-07-23 | 2023-07-25 | 哈尔滨工业大学(深圳) | Self-adaptive local reconstruction code design method for thermal data storage and cloud storage system |
| CN113296999A (en) * | 2021-05-20 | 2021-08-24 | 山东云海国创云计算装备产业创新中心有限公司 | RAID6 encoding method and encoding circuit |
| CN114116295A (en) * | 2021-10-25 | 2022-03-01 | 长三角信息智能创新研究院 | Inter-block 4 erasure coding method for real-time transmission of big data |
| CN114331395A (en) * | 2021-12-22 | 2022-04-12 | 南京航空航天大学 | Erasure code based block chain data grouping storage optimization structure and method |
| CN115793984B (en) * | 2023-01-03 | 2023-04-28 | 苏州浪潮智能科技有限公司 | Data storage method, device, computer equipment and storage medium |
| CN115793984A (en) * | 2023-01-03 | 2023-03-14 | 苏州浪潮智能科技有限公司 | Data storage method and device, computer equipment and storage medium |
| CN116703651A (en) * | 2023-08-08 | 2023-09-05 | 成都秦川物联网科技股份有限公司 | A smart gas data center operation management method, Internet of Things system and medium |
| CN116703651B (en) * | 2023-08-08 | 2023-11-14 | 成都秦川物联网科技股份有限公司 | Intelligent gas data center operation management method, internet of things system and medium |
| US12120179B2 (en) | 2023-08-08 | 2024-10-15 | Chengdu Qinchuan Iot Technology Co., Ltd. | Methods and internet of things (IoT) systems for operation and management of smart gas data centers |
| US12483624B2 (en) | 2023-08-08 | 2025-11-25 | Chengdu Qinchuan Iot Technology Co., Ltd. | Method and internet of things (IoT) system for managing gas data |
| CN121050662A (en) * | 2025-11-03 | 2025-12-02 | 苏州元脑智能科技有限公司 | Erasure coding method of distributed storage system and electronic equipment |
| CN121050662B (en) * | 2025-11-03 | 2026-01-30 | 苏州元脑智能科技有限公司 | An erasure coding method and electronic device for a distributed storage system |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106100801B (en) | 2019-04-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN106100801A (en) | A kind of non-homogeneous erasure code method of cloud storage system | |
| US20220368457A1 (en) | Distributed Storage System Data Management And Security | |
| US10270468B2 (en) | Method for file updating and version control for linear erasure coded and network coded storage | |
| CN103336785B (en) | A kind of distributed storage method based on network code and device thereof | |
| CN101339524A (en) | Disk Fault Tolerance Method for Large-Scale Disk Array Storage System | |
| CN110750382A (en) | Minimum storage regeneration code coding method and system for improving data repair performance | |
| CN110442535A (en) | Method and system for improving reliability of distributed solid-state disk key-value cache system | |
| US20170083603A1 (en) | Co-derived data storage patterns for distributed storage systems | |
| EP3635554B1 (en) | Layered error correction encoding for large scale distributed object storage system | |
| CN102843212B (en) | Coding and decoding processing method and device | |
| WO2012007715A2 (en) | Distributed data storage and recovery | |
| WO2020029423A1 (en) | Construction method and repair method for repairing binary array code check matrix | |
| Ivanichkina et al. | Mathematical methods and models of improving data storage reliability including those based on finite field theory | |
| CN104268031A (en) | Erasure code configuration method for solid state disk in disk array storage system | |
| CN107665152B (en) | Decoding method of erasure code | |
| WO2020029418A1 (en) | Method for constructing repair binary code generator matrix and repair method | |
| CN104572987B (en) | A kind of method and system that simple regeneration code storage efficiency is improved by compressing | |
| WO2018165943A1 (en) | Storage controller, data processing chip, and data processing method | |
| CN111984443A (en) | Encoding method, decoding method and corresponding devices in distributed system environment | |
| CN105786656B (en) | Redundant array of independent disks disaster tolerance storage method based on random matrix | |
| CN117370067A (en) | Data layout and coding method of large-scale object storage system | |
| CN104866386A (en) | Encoding and decoding method with optimal update cost | |
| Wei et al. | expanCodes: Tailored LDPC codes for big data storage | |
| CN112905387B (en) | A RAID6 encoding and data recovery method based on the encoding | |
| CN110781163B (en) | Construction of Heterogeneous Partial Repetition Codes Based on Complete Graph and Repair Method of Faulty Nodes |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |