
CN103971066A - Verification method for integrity of big data migration in HDFS - Google Patents


Info

Publication number
CN103971066A
CN103971066A (application CN201410212726.1A)
Authority
CN
China
Prior art keywords
file
hdfs
new
fileinfo
information
Prior art date
Legal status: Pending (assumption, not a legal conclusion)
Application number
CN201410212726.1A
Other languages
Chinese (zh)
Inventor
赵仁明
辛国茂
亓开元
房体盈
Current Assignee
IEIT Systems Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN201410212726.1A
Publication of CN103971066A
Legal status: Pending

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/64: Protecting data integrity, e.g. using checksums, certificates or signatures
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for verifying the integrity of big data migration in HDFS. The specific implementation process is as follows: obtain detailed information on the original HDFS files and directory structure and on the new HDFS files after migration; shard the original file information and the new file information; compare the old and new file information and output the verification results. Compared with the prior art, the method requires no compiling or packaging of programs; simple scripts suffice to complete the verification. It plays to the flexibility and convenience of big data, letting users locate potentially incomplete data very quickly and easily, and it applies to a wide range of HDFS environments, making it highly practical.

Description

A Method for Verifying the Integrity of Big Data Migration in HDFS

Technical Field

The invention relates to the field of computer technology, and in particular to a method for verifying the integrity of big data migration in HDFS.

Background

Big data refers to data sets so large that current mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that supports more proactive business decision-making.

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It is highly fault-tolerant and suited to deployment on inexpensive machines. HDFS provides high-throughput data access and is well suited to applications with large-scale data sets; programs running on HDFS typically work with files on the order of terabytes, so HDFS is tuned for large files. It is intended to provide high aggregate data bandwidth, scale to hundreds of nodes per cluster, and support tens of millions of files per cluster. HDFS is also designed to be easily portable across platforms, which encourages its wider adoption as a platform for applications that require large data sets.

This invention provides a simple, inexpensive way to verify data integrity after an HDFS data migration, enabling administrators to confirm quickly and conveniently whether the migrated data is complete and valid, and recording the verification results in a log file.

Summary of the Invention

The technical task of the invention is to remedy the deficiencies of the prior art by providing a method for verifying the integrity of big data migration in HDFS.

The technical solution of the invention is realized in the following manner; the specific implementation process of this method for verifying the integrity of big data migration in HDFS is as follows:

1) Obtain detailed information on the original HDFS files and directory structure, and the new HDFS file information after migration;

2) Shard the original file information and the new file information;

3) Compare the old and new file information and output the verification results.

The detailed process of step 1) is:

In the original HDFS file system, run hadoop fs -lsr / > oldInfo to obtain detailed information on the original HDFS files, redirecting the output to the oldInfo file;

In the new HDFS file system after migration, run the same command, hadoop fs -lsr / > newInfo, to obtain the new HDFS file information, redirecting the output to the newInfo file.
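The scripts given later assume that each line of this recursive listing carries eight whitespace-separated fields (permissions, replication, owner, group, size, date, time, path). A minimal parsing sketch follows; the sample line is hypothetical. Note that on Hadoop 2 and later, hadoop fs -ls -R / replaces the deprecated -lsr option and produces the same recursive listing.

```python
# Parse one (hypothetical) line of `hadoop fs -lsr` output into (size, path).
# Real listings have 8 whitespace-separated fields:
# permissions, replication, owner, group, size, date, time, path.
sample = "-rw-r--r--   3 hdfs supergroup   1048576 2014-05-20 10:00 /data/part-00000"

items = sample.split()
assert len(items) == 8
size, path = items[4], items[7]   # the same indices the comparison script uses
print(size, path)  # -> 1048576 /data/part-00000
```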

The detailed process of step 2) is: shard the original file information oldInfo and the new file information newInfo by the same rule, the rule being division by line count into the same number of files.

The detailed process of step 3) is: compare the sharded old and new HDFS file information entry by entry and save the results in a log file; the comparison checks whether the file or folder names and the file sizes match.

The matching process is:

1. Using the old file information as the baseline, match each entry against the new file information;

2. If an entry matches completely, take the next old entry and repeat this step;

3. If the file sizes do not match completely, the migration of that file is incomplete; record the file information in the log file and continue with step 2;

4. If the file information is not found, the file was not migrated to the new file system; record the file information in the log file and continue with step 2;

5. Once all old file entries have been processed, this integrity verification is complete.

Compared with the prior art, the invention has the following beneficial effects:

The method of the invention is an efficient, fast, and easily implemented way to verify the integrity of data migrated out of HDFS. Using it, the integrity of the newly migrated data can be verified efficiently and simply, which further reduces the workload of verifying data manually entry by entry and greatly reduces the programming workload. No compiling or packaging of programs is required; simple scripts complete the verification. The method plays to the flexibility and convenience of big data, letting users find potentially incomplete data very quickly and easily. It applies to a wide range of HDFS environments, making it practical and easy to adopt.

Brief Description of the Drawings

Figure 1 is a schematic flowchart of the implementation of the invention.

Detailed Description

A method for verifying the integrity of big data migration in HDFS according to the invention is described in detail below with reference to the accompanying drawing.

As shown in Figure 1, a method for verifying the integrity of big data migration in HDFS is provided. The idea of the method is to take each pre-migration HDFS file entry in turn and search for it in the new HDFS file information. If the entry exists and its size and other attributes match, the next pre-migration entry is taken and the comparison continues. If the entry is not found in the new HDFS file information, or is found but the file sizes differ, that piece of data was not migrated successfully.

The specific implementation process is as follows:

1. Obtain detailed information on the original HDFS files and directory structure, and the new HDFS file information after migration.

In the original HDFS file system, run hadoop fs -lsr / > oldInfo to obtain detailed information on the original HDFS files, redirecting the output to the oldInfo file.

In the new HDFS file system after migration, run the same command, hadoop fs -lsr / > newInfo, to obtain the new HDFS file information, redirecting the output to the newInfo file.

2. Shard the original file information and the new file information.

Because the number of files in a Hadoop file system is huge and the directory structure is extremely complex, the original file information oldInfo and the new file information newInfo can be sharded by the same rule to make the old/new comparison manageable. The sharding method is shown in the following script:

```python
#!/usr/bin/env python
# Shard the oldInfo listing by line count; newInfo is sharded the same way.
import sys

if __name__ == "__main__":
    subdir_file = open("oldInfo")
    line_count = 0
    subdir_list = []
    for line in subdir_file:
        line_count += 1
        if line_count == 1:      # skip the listing's header line
            continue
        items = line.strip().split()
        subdir_list.append(items[-1])   # last field is the file path

    # Open FILE_COUNT + 1 output files: with integer division below,
    # the trailing entries can land in shard index FILE_COUNT.
    output_file_list = []
    FILE_COUNT = 8
    for i in range(0, FILE_COUNT + 1):
        output_file_name = "split_subdir_" + str(i)
        output_file_list.append(open(output_file_name, "w"))

    count_per_file = len(subdir_list) // FILE_COUNT
    total_count = 0
    for subdir in subdir_list:
        file_index = total_count // count_per_file
        output_file_list[file_index].write("%s\n" % subdir)
        total_count += 1

    for i in range(0, FILE_COUNT + 1):
        output_file_list[i].close()

    print("total count", total_count)
```

Using this method, the information in oldInfo is sharded by line count into FILE_COUNT + 1 = 9 split_subdir files (split_subdir_0 through split_subdir_8).
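The shard count deserves a note: the script opens FILE_COUNT + 1 output files because, under integer division, the final entries spill into shard index FILE_COUNT. A small sketch illustrates this (the line count of 100 is hypothetical):

```python
# Why the splitting script needs FILE_COUNT + 1 output files: with integer
# division, the last entries land in shard index FILE_COUNT itself.
FILE_COUNT = 8
n_lines = 100                            # hypothetical number of file entries
count_per_file = n_lines // FILE_COUNT   # 12 entries per shard
indices = {i // count_per_file for i in range(n_lines)}
print(sorted(indices))  # -> [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Entries 96 through 99 fall into shard 8, so opening only FILE_COUNT files would raise an IndexError.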

3. Compare the old and new file information and output the verification results.

The sharded old and new HDFS file information is compared entry by entry, and the results are saved in a log file. The comparison mainly checks whether the file or folder names and the file sizes match.

The comparison method is shown in the following script:

```python
#!/usr/bin/env python
# Compare the old and new HDFS listings and log missing or mismatched files.
import sys

def load_info(info_file_name):
    # Build a {file_path: size} map from a `hadoop fs -lsr` listing file.
    info = {}
    for line in open(info_file_name):
        items = line.strip().split()
        if len(items) != 8:
            print("wrong info, %s" % line)
            sys.exit(1)
        size = items[4]       # 5th field of the listing is the file size
        file_name = items[7]  # 8th field is the full path
        info[file_name] = size
    return info

if __name__ == "__main__":
    if len(sys.argv) < 4:
        print("%s <info.old> <info.new> <log_file> [dirlist_file_name]" % sys.argv[0])
        sys.exit(1)

    # Optional: restrict the check to the paths listed in dirlist_file_name.
    dir_list = []
    if len(sys.argv) >= 5:
        for line in open(sys.argv[4]):
            dir_list.append(line.strip())

    old_info = load_info(sys.argv[1])
    new_info = load_info(sys.argv[2])
    log_file = open(sys.argv[3], "w")

    total_file_count = 0
    missed_file_count = 0
    mismatch_file_count = 0

    for file_name in old_info:
        if len(dir_list) > 0:
            match = False
            for target_dir in dir_list:
                if file_name.startswith(target_dir):
                    match = True
                    break
            if not match:
                continue
        total_file_count += 1
        if file_name not in new_info:
            log_file.write("[MISSING] %s\n" % file_name)
            missed_file_count += 1
        elif new_info[file_name] != old_info[file_name]:
            log_file.write("[MISMATCH] %s [%s != %s]\n"
                           % (file_name, old_info[file_name], new_info[file_name]))
            mismatch_file_count += 1

    log_file.close()
```

The old and new HDFS file information is compared by invoking the script in the form command <info.old> <info.new> <log_file> [dirlist_file_name], passing in the corresponding arguments.
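As a sanity check, the core of the comparison can be exercised on toy data; the paths and sizes below are hypothetical, and the dictionaries mirror what load_info() builds (path mapped to size, both as strings):

```python
# End-to-end sketch of the comparison logic on toy data (hypothetical paths/sizes).
old_info = {"/data/a.log": "100", "/data/b.log": "200", "/data/c.log": "300"}
new_info = {"/data/a.log": "100", "/data/b.log": "999"}  # b mismatched, c missing

log_lines = []
for name in old_info:
    if name not in new_info:
        log_lines.append("[MISSING] %s" % name)
    elif new_info[name] != old_info[name]:
        log_lines.append("[MISMATCH] %s [%s != %s]"
                         % (name, old_info[name], new_info[name]))

print(sorted(log_lines))
# -> ['[MISMATCH] /data/b.log [200 != 999]', '[MISSING] /data/c.log']
```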

Embodiment:

The implementation steps in this embodiment of the invention are as follows:

1. Obtain the file information from before and after the migration.

2. Split the file information into shards.

3. Using the old file information as the baseline, match each entry against the new file information.

4. On a complete match, take the next old entry and repeat the matching of step 3.

5. If the file sizes do not match completely, the migration of that file is incomplete. Record the file information in the log file, then continue with step 3.

6. If the file information is not found, the file was not migrated to the new file system. Record the file information in the log file, then continue with step 3.

7. Once all old file entries have been processed, this integrity verification is complete.

The above is merely an embodiment of the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (5)

1. A method for verifying the integrity of big data migration in HDFS, characterized in that its specific implementation process is as follows:
1) obtain detailed information on the original HDFS files and directory structure, and the new HDFS file information after migration;
2) shard the original file information and the new file information;
3) compare the old and new file information and output the verification results.
2. The method for verifying the integrity of big data migration in HDFS according to claim 1, characterized in that the detailed process of step 1) is:
in the original HDFS file system, run hadoop fs -lsr / > oldInfo to obtain the details of the original HDFS files and redirect the result to the oldInfo file;
in the new HDFS file system after migration, run the same command, hadoop fs -lsr / > newInfo, to obtain the new HDFS file information and redirect the result to the newInfo file.
3. The method for verifying the integrity of big data migration in HDFS according to claim 1 or 2, characterized in that the detailed process of step 2) is: shard the original file information oldInfo and the new file information newInfo by the same rule, the rule being division by line count into the same number of files.
4. The method for verifying the integrity of big data migration in HDFS according to claim 3, characterized in that the detailed process of step 3) is: compare the sharded old and new HDFS file information entry by entry and save the results in a log file, the comparison checking whether the file or folder names and the file sizes match.
5. The method for verifying the integrity of big data migration in HDFS according to claim 4, characterized in that the matching process is:
1) using the old file information as the baseline, match each entry against the new file information;
2) if an entry matches completely, take the next old entry and repeat this step;
3) if the file sizes do not match completely, the migration of that file is incomplete; record the file information in the log file and continue with step 2;
4) if the file information is not found, the file was not migrated to the new file system; record the file information in the log file and continue with step 2;
5) once all old file entries have been processed, this integrity verification is complete.
CN201410212726.1A 2014-05-20 2014-05-20 Verification method for integrity of big data migration in HDFS Pending CN103971066A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410212726.1A CN103971066A (en) 2014-05-20 2014-05-20 Verification method for integrity of big data migration in HDFS


Publications (1)

Publication Number Publication Date
CN103971066A 2014-08-06

Family

ID=51240546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410212726.1A Pending CN103971066A (en) 2014-05-20 2014-05-20 Verification method for integrity of big data migration in HDFS

Country Status (1)

Country Link
CN (1) CN103971066A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005174192A (en) * 2003-12-15 2005-06-30 Hitachi Ltd Electronic application data management method and management system, and electronic pen and server constituting the management system
CN1776670A (en) * 2004-11-19 2006-05-24 国际商业机器公司 Method and system of verifying metadata of a migrated file
CN102724306A (en) * 2012-06-13 2012-10-10 中山大学 Cloud computing based method and system for data migration
CN103793424A (en) * 2012-10-31 2014-05-14 阿里巴巴集团控股有限公司 Database data migration method and database data migration system


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389312A (en) * 2014-09-04 2016-03-09 上海福网信息科技有限公司 Big data migration method and tool
CN104408047A (en) * 2014-10-28 2015-03-11 浪潮电子信息产业股份有限公司 Method for uploading text file to HDFS (hadoop distributed file system) in multi-machine parallel mode based on NFS (network file system) file server
CN105808612A (en) * 2014-12-31 2016-07-27 北京嘀嘀无限科技发展有限公司 Method and equipment used for migrating data of database
CN105808612B (en) * 2014-12-31 2019-08-27 北京嘀嘀无限科技发展有限公司 The method and apparatus of data for migrating data library
CN108415853A (en) * 2018-03-15 2018-08-17 深圳市江波龙电子有限公司 A kind of method, apparatus and storage device of garbage reclamation
CN113448613A (en) * 2021-08-30 2021-09-28 湖南省佳策测评信息技术服务有限公司 Software delivery data checking method and device
CN115061979A (en) * 2022-07-08 2022-09-16 建信金融科技有限责任公司 P-level data migration method and system
CN115061979B (en) * 2022-07-08 2024-12-20 建信金融科技有限责任公司 P-level data migration method and system


Legal Events

C06 / PB01: Publication
C10 / SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 2014-08-06)