WO2013163832A1

WO2013163832A1 - Cloud storage method and device

Info

Publication number: WO2013163832A1
Application number: PCT/CN2012/075841
Authority: WO
Inventors: 王东临
Original assignee: Tianjin Sursen Investment Co Ltd
Current assignee: Tianjin Sursen Investment Co Ltd
Priority date: 2012-05-02
Filing date: 2012-05-22
Publication date: 2013-11-07
Anticipated expiration: 2014-11-02
Also published as: CN103384256A

Description

Cloud storage method and device

Technical field

The present invention relates to the field of the Internet, and in particular to a cloud storage method and device.

Background art

Since the concept of cloud storage was first proposed, it has received support and attention from many vendors. The essence of cloud storage is to store massive amounts of data in the cloud, which clients then access over the Internet. How that massive data is actually stored in the cloud, however, is a fundamental problem of cloud storage.

At present, many cloud storage providers store massive data in the cloud by allocating a relatively independent space to each user and keeping each user's data in that user's own space. When the amount of data is large enough, the cloud holds a great deal of duplicate data, and this approach stores much of it repeatedly. Such storage is very inefficient.

Summary of the invention

Embodiments of the present invention provide a cloud storage method and device for the efficient storage of massive data.

A cloud storage method according to an embodiment of the present invention includes:

calculating a hash value of a file and converting the hash value into a character string used as the file name; calculating a storage path of the file from the hash value according to a predefined rule; and looking up, in an index table, the actual storage location corresponding to the storage path and storing the file at that location; wherein the index table pre-stores a mapping between all possible storage paths and actual locations on the storage disks.

A cloud storage device according to an embodiment of the present invention includes:

a first module, configured to calculate a hash value of a file according to a predefined hash algorithm;

a second module, configured to convert the hash value calculated by the first module into a character string used as the file name;

a third module, configured to calculate a storage path of the file from the hash value calculated by the first module, according to a predefined algorithm;

a fourth module, configured to store a mapping between all possible storage paths and actual locations on the storage disks;

a fifth module, configured to look up in the fourth module, according to the storage path calculated by the third module, the actual storage location corresponding to that storage path; and

a sixth module, configured to store the file on the storage disk found by the fifth module.

When massive data is stored with the cloud storage method and device provided by the embodiments of the present invention, files are stored under names derived from their hash values, which ensures that identical data is not stored more than once. Calculating the storage path of a file from its hash value also spreads the data evenly across the storage servers and keeps the system's storage balanced; even when the storage servers of the cloud storage system are expanded without limit, document storage can still be managed efficiently in this way.

Brief description of the drawings

FIG. 1 is a flowchart of a cloud storage method according to an embodiment of the present invention.

FIG. 2 is a schematic structural diagram of a cloud storage device according to an embodiment of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

FIG. 1 is a flowchart of a cloud storage method according to an embodiment of the present invention. As shown in FIG. 1, when a file is stored in the cloud, the method includes the following steps:

Step 101: Calculate the hash value of the file and convert the hash value into a character string used as the file name.

Here, different hash algorithms can be selected according to the configuration of the system, for example MD2, MD4, MD5, or SHA-1. In an embodiment of the present invention, a double hash can be used for the file name: two different hash algorithms are applied to the file, and the two resulting hash values are concatenated to form the hash value of the file.
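The double-hash variant can be sketched as follows. This is a minimal illustration, not the patented implementation: the pairing of MD5 with SHA-1 and the function name `double_hash` are assumptions chosen because both algorithms are named in the text and are available in Python's standard `hashlib`.

```python
import hashlib


def double_hash(data: bytes) -> bytes:
    """Concatenate two digests of the same content (assumed pairing: MD5 + SHA-1)."""
    return hashlib.md5(data).digest() + hashlib.sha1(data).digest()


digest = double_hash(b"example file contents")
# MD5 yields 16 bytes and SHA-1 yields 20 bytes, so the result is 36 bytes long.
print(len(digest))
```

Concatenating two independent digests means two different files would have to collide under both algorithms at once to receive the same name.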

In an embodiment of the present invention, the hash value of the file is converted into a base-36 string used as the file name; for example, the converted file name is H₁H₂H₃…H_N.
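A base-36 conversion of a digest might look like the sketch below. The digit alphabet (0-9 followed by A-Z) and the helper name `to_base36` are assumptions; the source states only that the name is a base-36 string.

```python
import hashlib
import string

# Assumed digit alphabet: 0-9 then A-Z, 36 symbols in total.
ALPHABET = string.digits + string.ascii_uppercase


def to_base36(digest: bytes) -> str:
    """Interpret the digest as a big-endian integer and write it in base 36."""
    number = int.from_bytes(digest, "big")
    chars = []
    while number:
        number, remainder = divmod(number, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars)) or "0"


file_name = to_base36(hashlib.md5(b"example file contents").digest())
print(file_name)
```

Without padding, names can differ in length when the digest has leading zero bits; a real system would probably left-pad to a fixed width so that every name has the same number of characters.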

Step 102: Calculate the storage path of the file from the file's hash value according to a predefined rule.

In an embodiment of the present invention, the predefined rule may be that the storage path of the file consists of two directory levels. For example, the first and second characters of the file name can be used directly as the first-level directory in which the file is stored, and the third and fourth characters as the second-level directory.

In this example, H₁H₂ is used as the first-level directory of the file and H₃H₄ as the second-level directory; that is, the file named H₁H₂H₃…H_N has the storage path H₁H₂\H₃H₄.
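The two-level rule amounts to slicing the first four characters of the file name. A minimal sketch, with `storage_path` as a hypothetical helper name:

```python
from typing import Tuple


def storage_path(file_name: str) -> Tuple[str, str]:
    """Return (first-level, second-level) directory names for a file name."""
    if len(file_name) < 4:
        raise ValueError("file name must have at least four characters")
    return file_name[0:2], file_name[2:4]


first, second = storage_path("ABCD9XK2")
print(first + "\\" + second)  # AB\CD
```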

When a base-36 string is used as the file name, the system can in theory contain at most 36² = 1,296 first-level directories and 36² × 36² = 1,679,616 second-level directories.

On a Linux system, each subdirectory can typically manage at least 10,000 files, so the two-level directory scheme of this embodiment, with two base-36 characters per level, can in theory store and manage more than 10 billion documents (1,679,616 × 10,000 ≈ 16.8 billion).
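The capacity estimate can be checked with a few lines of arithmetic. The figure of 10,000 files per directory is the one given in the text; the variable names are illustrative.

```python
BASE = 36                  # symbols per character of the file name
CHARS_PER_LEVEL = 2        # two characters name each directory level
FILES_PER_DIRECTORY = 10_000

first_level = BASE ** CHARS_PER_LEVEL                    # 1,296
second_level = first_level * BASE ** CHARS_PER_LEVEL     # 1,679,616
capacity = second_level * FILES_PER_DIRECTORY            # 16,796,160,000

print(first_level, second_level, capacity)
```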

Step 103: Look up, in the index table, the actual storage location corresponding to the storage path of the file, and store the file at that location. The index table pre-stores a mapping between all possible storage paths and actual locations on the storage disks.

Continuing the example above, the index table records the storage disk location for each of the 1,679,616 storage paths (second-level directories), for instance which volume of which storage server a given second-level directory resides on. Such an index table typically needs only a few megabytes of space and can take the form of an array, so that an entry is fetched directly by its subscript. Specifically, the index table might record that the path with first-level directory AB and second-level directory CD is stored on the third logical volume of the first disk.
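One way to realize an array-form index table is to read the four directory characters as a base-36 number and use it as the subscript. This is a sketch under assumptions: the `Location` record, its fields, and `path_index` are hypothetical names, and the source does not specify how the subscript is derived.

```python
from typing import List, NamedTuple, Optional


class Location(NamedTuple):
    disk: int     # which storage disk
    volume: int   # which logical volume on that disk


TABLE_SIZE = 36 ** 4  # one slot per possible two-level path


def path_index(first: str, second: str) -> int:
    """Map a two-level path to an array subscript by reading it as base 36."""
    return int(first + second, 36)


index_table: List[Optional[Location]] = [None] * TABLE_SIZE

# The example from the text: path AB\CD lives on volume 3 of disk 1.
index_table[path_index("AB", "CD")] = Location(disk=1, volume=3)

print(index_table[path_index("AB", "CD")])
```

Because the subscript is computed from the path itself, a lookup is a single array access with no search.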

The file is then stored on the storage disk that the index table records for its storage path.

Step 104: When a file is to be found (retrieved), calculate the file name and storage path of the file to be found from its hash value, according to the same predefined rule.

For example, in an embodiment of the present invention, the first four characters of the file name are extracted to obtain the first-level and second-level storage directories of the file.

Step 105: According to the calculated storage path of the file to be found, locate the corresponding physical disk in the index table.

Step 106: According to the file name of the file to be found, locate the file on the physical disk corresponding to its storage path.
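Steps 101 to 106 can be tied together in a small, self-contained sketch. Several simplifications are assumptions made for brevity: names are SHA-1 hex strings rather than base-36 strings (so there are 16⁴ possible paths), each "volume" is just a local directory, and the index table assigns paths to volumes round-robin. `HashStore` and its methods are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


class HashStore:
    """Toy store: hash -> file name -> path -> index table -> volume."""

    def __init__(self, volumes: List[Path]) -> None:
        self.volumes = volumes
        # Index table covering every possible two-level path (16**4 for hex).
        self.index = [i % len(volumes) for i in range(16 ** 4)]

    def _locate(self, name: str) -> Path:
        first, second = name[:2], name[2:4]                # steps 102 / 104
        slot = int(first + second, 16)
        volume = self.volumes[self.index[slot]]            # steps 103 / 105
        return volume / first / second / name

    def put(self, data: bytes) -> str:
        name = hashlib.sha1(data).hexdigest()              # step 101
        target = self._locate(name)
        if not target.exists():                            # same content is stored once
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
        return name

    def get(self, name: str) -> bytes:
        return self._locate(name).read_bytes()             # step 106


with tempfile.TemporaryDirectory() as root:
    store = HashStore([Path(root) / "vol0", Path(root) / "vol1"])
    name = store.put(b"hello cloud")
    store.put(b"hello cloud")                              # duplicate upload, no second copy
    print(store.get(name))
```

Because the name is derived from the content, a second upload of identical data resolves to the same path and is skipped, which is the deduplication property the description relies on.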

The actual storage locations of all possible storage paths are recorded in the index table in advance in order to speed up both storage and lookup.

When storage servers are added to the cloud storage system, some of the directories on the existing storage servers are migrated to the added servers, and the corresponding records in the index table are updated at the same time.

In practice, one storage server usually carries multiple storage disks. When a storage disk is added to an existing storage server, some of the directories on the existing disks are copied to the new disk. When a new storage server is added, some of the disks of the original storage server can be moved directly into the new server; although new storage disks must then also be added to the original server, and data must be copied onto them from the disks remaining in that server, the copying …
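The index-table side of such an expansion can be sketched as below. The policy shown, taking an equal share of paths from every existing volume, is an assumption; the source says only that some directories are migrated and the index table is updated. Copying the directory contents themselves is omitted, and `add_volume` is a hypothetical name.

```python
from typing import Dict, List


def add_volume(index: List[int], new_volume: int) -> List[int]:
    """Reassign a share of paths to new_volume in place; return the moved subscripts.

    Assumes new_volume does not yet appear in the index table.
    """
    paths_by_volume: Dict[int, List[int]] = {}
    for slot, volume in enumerate(index):
        paths_by_volume.setdefault(volume, []).append(slot)

    total_volumes = len(paths_by_volume) + 1
    moved: List[int] = []
    for slots in paths_by_volume.values():
        share = len(slots) // total_volumes   # how many paths this volume gives up
        for slot in slots[:share]:
            index[slot] = new_volume          # update the index record
            moved.append(slot)                # these directories must be copied
    return moved


index = [i % 2 for i in range(12)]            # 12 paths spread over volumes 0 and 1
moved = add_volume(index, new_volume=2)
print(sorted(moved), index.count(0), index.count(1), index.count(2))
```

Only the returned paths need their directories copied; every other path keeps its location, so lookups for unaffected files are unchanged during the migration.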

The storage method provided by the embodiments of the present invention can be used for massive data storage. When massive data is stored in this way, files are stored under names derived from their hash values, which ensures that identical data is not stored more than once. In addition, when the amount of data is large enough, calculating the storage path from the hash value spreads the data evenly across the storage servers and keeps the system's storage balanced; even when the storage servers of the cloud storage system are expanded without limit, document storage can still be managed efficiently. Furthermore, the index table can be stored on the web server, so that when a user needs to find a file, it can be located quickly among a very large number of servers.

In another embodiment of the present invention, the system's parameters can be adjusted according to the scale of storage the system needs to support. For example, the hash value of the file can be converted to base 10; with the other parameters of the previous embodiment unchanged, the number of second-level directories the system can theoretically support is then 10² × 10² = 10⁴. Alternatively, the first character of the hash string can be used as the first-level directory and the second, third, and fourth characters as the second-level directory; with the other parameters of the previous embodiment unchanged, the number of second-level directories the system can theoretically support is still 36 × 36³ = 1,679,616. The storage can also use three levels, for example with the first character as the first-level directory, the second and third characters as the second-level directory, and the fourth character as the third-level directory. In short, those skilled in the art can configure the system's parameters according to the system's needs, following the examples above.
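These variants differ only in the numeric base and in how many characters name each directory level, so both can be treated as parameters. A minimal sketch, with `split_levels` and `leaf_directory_count` as hypothetical helper names:

```python
from typing import List, Sequence


def split_levels(file_name: str, widths: Sequence[int]) -> List[str]:
    """Cut the leading characters of file_name into directory names of the given widths."""
    levels, position = [], 0
    for width in widths:
        levels.append(file_name[position:position + width])
        position += width
    return levels


def leaf_directory_count(base: int, widths: Sequence[int]) -> int:
    """Number of deepest-level directories the scheme can address."""
    return base ** sum(widths)


print(split_levels("ABCDEF", (2, 2)), leaf_directory_count(36, (2, 2)))
print(split_levels("ABCDEF", (1, 3)), leaf_directory_count(36, (1, 3)))
print(split_levels("ABCDEF", (1, 2, 1)), leaf_directory_count(10, (2, 2)))
```

The count depends only on the base and the total number of characters consumed, which is why the 2+2 and 1+3 splits in base 36 address the same number of leaf directories.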

FIG. 2 is a schematic structural diagram of a cloud storage device according to an embodiment of the present invention. As shown in FIG. 2, the cloud storage device runs on an apparatus having a processor and a storage module. The device includes:

a hash value calculation module, configured to calculate the hash value of a file according to a predefined hash algorithm;

a file name calculation module, configured to convert the hash value calculated by the hash value calculation module into a character string used as the file name;

a storage path calculation module, configured to calculate the storage path of the file from the hash value calculated by the hash value calculation module, according to a predefined algorithm;

an index table module, configured to store a mapping between all possible storage paths and actual locations on the storage disks;

an index table lookup module, configured to look up in the index table module, when invoked by the storage module, the actual storage location corresponding to the storage path calculated by the storage path calculation module; and

a storage module, configured to invoke the index table lookup module and to store the file on the storage disk returned by the index table lookup module.

When a file is to be found (retrieved), the hash value calculation module calculates the hash value of the file to be found, the file name calculation module calculates its file name, the storage path calculation module calculates its storage path, and the index table lookup module uses that storage path to find the corresponding physical disk in the index table module. In this case the cloud storage device further includes a search module, configured to invoke the index table lookup module according to the storage path calculated by the storage path calculation module, and to locate the file on the physical disk corresponding to that storage path according to the file name calculated by the file name calculation module.

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A cloud storage method, comprising:

calculating a hash value of a file and converting the hash value into a character string used as the file name; calculating a storage path of the file from the hash value according to a predefined rule; and looking up, in an index table, the actual storage location corresponding to the storage path and storing the file at that location; wherein the index table pre-stores a mapping between all possible storage paths and actual locations on the storage disks.

2. The method according to claim 1, further comprising, when a file is to be found:

calculating the file name and storage path of the file to be found from its hash value, according to the same predefined rule;

locating, in the index table, the physical disk corresponding to the calculated storage path of the file to be found; and

locating the file to be found, according to its file name, on the physical disk corresponding to its storage path.

3. The method according to claim 1 or 2, wherein converting the hash value of the file into a character string used as the file name comprises:

converting the hash value of the file into a base-10 or base-36 string used as the file name.

4. The method according to claim 1 or 2, wherein the predefined rule is that the storage path of the file consists of two directory levels.

5. The method according to claim 4, wherein the storage path of the file consisting of two directory levels comprises:

using the first and second characters of the file name directly as the first-level directory in which the file is stored, and the third and fourth characters of the file name as the second-level directory.

6. A cloud storage device, comprising:

a first module, configured to calculate a hash value of a file according to a predefined hash algorithm;

a second module, configured to convert the hash value calculated by the first module into a character string used as the file name;

a third module, configured to calculate a storage path of the file from the hash value calculated by the first module, according to a predefined algorithm;

a fourth module, configured to store a mapping between all possible storage paths and actual locations on the storage disks;

a fifth module, configured to look up in the fourth module, according to the storage path calculated by the third module, the actual storage location corresponding to that storage path; and

a sixth module, configured to store the file on the storage disk found by the fifth module.

7. The device of claim 6, further comprising:

a seventh module, configured to invoke the fifth module according to the storage path of the file to be found, as calculated by the third module, and to locate the file to be found, according to its file name as calculated by the second module, on the physical disk corresponding to the storage path returned by the fifth module.