CN113312147B

CN113312147B - Method and system for migrating object storage across cluster mass data

Info

Publication number: CN113312147B
Application number: CN202110654199.XA
Authority: CN
Inventors: 张致江; 凌震华; 王智国; 王芝斌
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2021-06-11
Filing date: 2021-06-11
Publication date: 2022-12-30
Anticipated expiration: 2041-06-11
Also published as: CN113312147A

Abstract

The present invention discloses a method and system for cross-cluster mass data migration in object storage. The method includes the following steps:

- Step S1: receive a migration task request from a user and establish subtasks.
- Step S2: generate a configuration file for the migration task from the information about the established subtasks, and store the subtask information and the task's configuration file as OSS objects in the task queue of the back-end database.
- Step S3: scan the task queue stored in the back-end database at a predetermined interval, and schedule and execute the migration tasks in the waiting state and the paused state.
- Step S4: run each scheduled migration task as a Docker container, a K8S job, or a process.
- Step S5: start the migration plug-in whose type matches the migration task's type to perform the migration.
- Step S6: the started migration plug-in completes the data migration in a multi-level task manner.

The method enables elastic migration of mass data across clusters.

Description

Method and system for mass data migration across object storage clusters

Technical field

The invention relates to the field of cloud storage data migration, and in particular to a method and system for cross-cluster mass data migration in object storage.

Background art

Data migration is an essential product and tool for the major cloud service providers. Microsoft Azure, Alibaba Cloud, Huawei Cloud, and Tencent Cloud all have their own data migration tools. Various storage clusters and frameworks also have their own data migration and balancing solutions.

Alibaba Cloud's DTS (Data Transmission Service) solution focuses on data migration between OSS clusters. It uses a cloud-based central service to migrate data from various kinds of OSS storage into Alibaba Cloud OSS. It does not support data migration between heterogeneous clusters, and it does not support data migration between storage systems other than Alibaba Cloud.

Huawei Cloud migrates data transactionally and rolls back all operations when a migration fails.

Most existing cloud storage data migration technologies use a centralized service as the migration center. Most of them only migrate between specific types of data clusters, so any new requirement means building a new migration system. Apart from a few public cloud vendors, most migration systems are also built ad hoc and discarded once the requirement has been met. Finally, existing technology rarely supports data migration between heterogeneous clusters, or supports it only for special-purpose requirements.

Summary of the invention

Existing cloud storage data migration systems mostly either do not support data migration between heterogeneous clusters or support it only for special-purpose requirements. In view of this problem, the object of the present invention is to provide a method and system for cross-cluster mass data migration in object storage that overcomes it.

The object of the present invention is achieved through the following technical solutions:

An embodiment of the present invention provides a method for cross-cluster mass data migration in object storage, including:

Step S1, receiving a migration task request issued by a user, as well as requests issued by that migration task to establish subtasks;

Step S2, generating a configuration file for the migration task to be created according to the information about the established subtasks, and storing the information about the established subtasks and the migration task's configuration file in the back-end database as OSS objects;

Step S3, periodically scanning the task information stored in the back-end database, and scheduling the migration tasks in the waiting state and the paused state for execution;

Step S4, running each scheduled migration task as a Docker container, a K8S job, or a process;

Step S5, starting a task instance with the migration plug-in that matches the task's type to perform the migration;

Step S6, the migration plug-in completes the data migration in a multi-level task manner.

An embodiment of the present invention provides a system for cross-cluster mass data migration in object storage, used to implement the method of the present invention, including:

an entry service unit, a migration task scheduling unit, a migration task execution unit, and a back-end database; wherein,

the entry service unit is communicatively connected to the back-end database. It receives migration task requests issued by users and the subtasks established by those requests. It generates the configuration file of each migration task from the information about the established subtasks. It then stores the established subtasks and the migration task's configuration file as OSS objects in the task queue of the back-end database;
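The following minimal Python sketch illustrates the entry service unit's job: build a configuration file for a task and insert the task into the queue in the waiting state. It assumes an in-memory list stands in for the back-end database's task queue. It also assumes a JSON string stands in for the configuration file stored as an OSS object. The names `MigrationTask` and `create_migration_task` and the configuration keys are hypothetical and do not come from the patent.

```python
import json
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MigrationTask:
    """Hypothetical task record kept in the back-end task queue."""
    task_id: str
    task_type: str                   # e.g. "account", "container", "filelist"
    state: str                       # every new task starts as "waiting"
    config_json: str                 # serialized configuration file for the task
    parent_id: Optional[str] = None  # set when a running task requests a subtask


def create_migration_task(queue: List[MigrationTask], task_type: str, source: str,
                          target: str, scope: str,
                          parent_id: Optional[str] = None) -> MigrationTask:
    """Generate the task's configuration file and insert the task into the queue."""
    task_id = uuid.uuid4().hex
    config = {
        "task_id": task_id,
        "type": task_type,
        "source": source,   # reference to the source cluster (hypothetical field)
        "target": target,   # reference to the target cluster (hypothetical field)
        "scope": scope,     # account, bucket, or FileList the task covers
    }
    task = MigrationTask(task_id, task_type, "waiting",
                         json.dumps(config, sort_keys=True), parent_id)
    queue.append(task)      # stand-in for writing to the back-end database
    return task
```

The same function serves both user requests and subtask requests from running tasks. Only `parent_id` differs between the two cases.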

the migration task scheduling unit is communicatively connected to the back-end database. It scans the task queue stored in the back-end database at a predetermined interval, and schedules and executes the migration tasks in the waiting state and the paused state. Each scheduled migration task runs as a Docker container, a K8S job, or a process;
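The periodic scan performed by the migration task scheduling unit can be sketched as follows.

This is a hypothetical sketch. `QueuedTask`, `scan_once`, `run_scheduler`, and the `launch` callback are illustrative names. The `launch` callback stands in for starting a Docker container, K8S job, or process. The state strings follow the `waiting`/`pause` states named in the embodiment.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

SCHEDULABLE_STATES = {"waiting", "pause"}


@dataclass
class QueuedTask:
    task_id: str
    task_type: str
    state: str


def scan_once(queue: List[QueuedTask], launch: Callable[[QueuedTask], None]) -> List[str]:
    """One scan of the task queue: launch every waiting or paused task."""
    launched = []
    for task in queue:
        if task.state in SCHEDULABLE_STATES:
            task.state = "running"   # mark first so the next scan does not relaunch it
            launch(task)
            launched.append(task.task_id)
    return launched


def run_scheduler(queue: List[QueuedTask], launch: Callable[[QueuedTask], None],
                  interval_s: float, max_scans: int) -> None:
    """Scan the queue every `interval_s` seconds (bounded here for illustration)."""
    for _ in range(max_scans):
        scan_once(queue, launch)
        time.sleep(interval_s)
```

A production scheduler would update task states in the database atomically, so that two scheduler instances cannot both claim the same task. The in-memory list here does not model that.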

the migration task execution unit is communicatively connected to both the migration task scheduling unit and the back-end database. It starts the migration plug-in that matches the type of the migration task to perform the migration, and the migration plug-in completes the data migration in a multi-level task manner.
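Selecting a plug-in by task type can be expressed as a registry keyed by type. The sketch below assumes plug-ins are plain Python callables that receive the task configuration. The type names `account` and `container` mirror the plug-in levels described later. `register_plugin`, `start_plugin`, and the returned strings are hypothetical placeholders for real migration work.

```python
from typing import Callable, Dict

PluginFn = Callable[[dict], str]
_PLUGINS: Dict[str, PluginFn] = {}


def register_plugin(task_type: str) -> Callable[[PluginFn], PluginFn]:
    """Decorator that registers a migration plug-in for one task type."""
    def wrap(fn: PluginFn) -> PluginFn:
        _PLUGINS[task_type] = fn
        return fn
    return wrap


@register_plugin("account")
def account_plugin(config: dict) -> str:
    return f"traverse buckets of {config['scope']}"


@register_plugin("container")
def container_plugin(config: dict) -> str:
    return f"traverse files of {config['scope']}"


def start_plugin(config: dict) -> str:
    """Pick the plug-in matching the task type and run it on the task config."""
    try:
        plugin = _PLUGINS[config["type"]]
    except KeyError:
        raise ValueError(f"no migration plug-in for task type {config['type']!r}") from None
    return plugin(config)
```

Adding support for a new storage system then means registering one more function. The selection logic itself does not change.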

As the technical solution above shows, the method and system for cross-cluster mass data migration in object storage provided by the embodiments of the present invention have the following beneficial effects:

Each task instance is started with the migration plug-in that matches the migration task's type, so mass data can easily be migrated between heterogeneous clusters. The heterogeneous data migration plug-ins are built on cloud-native technology. As a result, the migration task service components support one-click automated deployment and teardown, and data migration for a new storage system can be added as a plug-in.

Description of drawings

The drawings used in the description of the embodiments are briefly introduced below to illustrate the technical solutions more clearly. The drawings described below show only some embodiments of the present invention. Those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of the method for cross-cluster mass data migration in object storage provided by an embodiment of the present invention;

FIG. 2 is a block diagram of the system for cross-cluster mass data migration in object storage provided by an embodiment of the present invention;

FIG. 3 is a block diagram of the entry service unit of the system provided by an embodiment of the present invention;

FIG. 4 is a block diagram of the migration task scheduling unit of the system provided by an embodiment of the present invention;

FIG. 5 is a block diagram of the migration task execution unit of the system provided by an embodiment of the present invention;

FIG. 6 is a processing flowchart of the entry service unit of the system provided by an embodiment of the present invention;

FIG. 7 is a processing flowchart of the task scheduling unit of the system provided by an embodiment of the present invention;

FIG. 8 is a processing flowchart of the task execution unit of the system provided by an embodiment of the present invention;

FIG. 9 is an entry service processing flowchart of the method provided by an embodiment of the present invention;

FIG. 10 is a schematic diagram of the specific composition of the system provided by an embodiment of the present invention;

FIG. 11 is a flowchart of task generation and scheduling in the method provided by an embodiment of the present invention;

FIG. 12 is a schematic diagram of the logical structure of the account-level data migration plug-in of the system provided by an embodiment of the present invention;

FIG. 13 is a schematic diagram of the logical structure of the container/bucket-level data migration plug-in of the system provided by an embodiment of the present invention;

FIG. 14 is a schematic diagram of the logical structure of the FileList-level data migration plug-in of the system provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention. Content not described in detail here belongs to the prior art known to those skilled in the art.

Referring to FIG. 1, an embodiment of the present invention provides a method for cross-cluster mass data migration in object storage, including:

In step S6 of the above method, the migration plug-in completes the data migration in a multi-level task manner as follows:

Step S61) The corresponding migration task image is scheduled according to the received user migration task request. If the user needs to migrate an entire account, an account migration task is started. It traverses all buckets (i.e. containers or buckets) and creates a bucket migration task for each;

Step S62) Each created bucket migration task, once scheduled, traverses all files in the bucket. It creates one file migration task for every n files, where n is a predetermined number;

Step S63) Each created file migration task, once scheduled, compares the source file in the source cluster with the target file in the target cluster. If they are the same, copying that file is skipped; if they differ, the file is copied.
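The patent does not specify how "the same" is decided in step S63. One common choice, assumed here, is to compare object size plus a content hash. The sketch below uses in-memory dicts as hypothetical source and target clusters and MD5 as the hash. Real object stores often expose an ETag for this purpose, but it is not always a plain MD5 (for example for multipart uploads), so the comparison rule should be checked per store.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(data: bytes) -> Tuple[int, str]:
    """(size, MD5 hex digest), used here as the 'same file' test."""
    return len(data), hashlib.md5(data).hexdigest()


def migrate_files(names: List[str], source: Dict[str, bytes],
                  target: Dict[str, bytes]) -> Tuple[List[str], List[str]]:
    """Copy each listed object unless the target already holds an identical copy."""
    copied, skipped = [], []
    for name in names:
        data = source[name]
        existing = target.get(name)
        if existing is not None and fingerprint(existing) == fingerprint(data):
            skipped.append(name)   # source and target are the same: skip the copy
        else:
            target[name] = data    # missing or different on the target: copy it
            copied.append(name)
    return copied, skipped
```

Because identical files are skipped, re-running a file migration task after a failure is cheap. Files that were already copied are simply passed over.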

The above method further includes:

Step S7, failure recovery: every scheduled task periodically inserts a checkpoint into the back-end database. When a task is rescheduled after a failure, it resumes from the last completed checkpoint;

The checkpoint is an internal event. When activated, it triggers the database writer process to write the dirty data blocks in the data buffer out to the data files.
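Read at the task level, step S7 amounts to recording progress periodically and resuming from the last recorded position. A minimal sketch under that reading follows. It uses a dict-backed `CheckpointStore` as a hypothetical stand-in for the back-end database. `run_with_checkpoints` and its `every` parameter are illustrative names, not the patent's.

```python
from typing import Callable, Dict, List


class CheckpointStore:
    """Stand-in for the back-end database table holding task checkpoints."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def save(self, task_id: str, next_index: int) -> None:
        self._data[task_id] = next_index

    def load(self, task_id: str) -> int:
        return self._data.get(task_id, 0)   # no checkpoint yet: start at 0


def run_with_checkpoints(task_id: str, items: List[str], process: Callable[[str], None],
                         store: CheckpointStore, every: int) -> None:
    """Process items in order, recording progress every `every` items.

    After a failure, calling this again resumes from the last saved checkpoint,
    so at most `every - 1` already-completed items are processed again.
    """
    start = store.load(task_id)
    for i in range(start, len(items)):
        process(items[i])
        if (i + 1) % every == 0:
            store.save(task_id, i + 1)
    store.save(task_id, len(items))   # mark the task as fully processed
```

A larger `every` means fewer database writes but more repeated work after a crash. That trade-off matches the patent's goal of keeping checkpoint records small.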

Referring to FIG. 2, an embodiment of the present invention further provides a system for cross-cluster mass data migration in object storage, used to implement the above method, including:

an entry service unit 1, a migration task scheduling unit 2, a migration task execution unit 3, and a back-end database 4; wherein,

FIG. 2 shows the logical architecture of the whole system, which is divided into three main logical units: the entry service unit 1, the migration task scheduling unit 2, and the migration task execution unit 3; wherein,

the entry service unit receives and executes instructions coming from users or issued during task execution;

the migration task scheduling unit schedules migration tasks according to a scheduling algorithm, so that tasks execute concurrently;

the migration task execution unit carries out the actual migration tasks. It accesses the source and target clusters to collect information and performs the data copy.

Referring to FIG. 3, in the above system, the entry service unit 1 includes:

a task creation request module 11, a task request submodule 12, a configuration information generation module 13, a queue insertion module 14, a database operation module 15, a task scheduling module 16, a container cloud resource adjustment module 17, and a container start module 18; wherein,

the task creation request module receives a migration task request from a user and creates a migration task according to the request;

the task request submodule is communicatively connected to the task creation request module and establishes the corresponding subtasks for the migration task;

the configuration information generation module is communicatively connected to both the task creation request module and the task request submodule. It generates the configuration file of the migration task from the information about the established subtasks;

the queue insertion module is communicatively connected to the task creation request module, the task request submodule, and the configuration information generation module. It inserts the established subtasks and the migration task's configuration file into the task queue;

the database operation module is communicatively connected to the queue insertion module and stores each task in the task queue in the back-end database as an OSS object;

the task scheduling module is communicatively connected to the migration task scheduling unit, the database operation module, and the container cloud resource adjustment module. It sends scheduling requests to the migration task scheduling unit and receives the migration tasks that unit schedules. It then forwards those tasks to the database operation module and the container cloud resource adjustment module;

the container cloud resource adjustment module is communicatively connected to the task scheduling module and adjusts container cloud resources according to the migration tasks sent by the task scheduling module;

the container start module is communicatively connected to the container cloud resource adjustment module and starts each scheduled migration task as a Docker container, a K8S job, or a process.

FIG. 6 shows the processing flow of the above entry service unit.

Referring to FIG. 4, in the above system, the migration task scheduling unit 2 includes:

a task scanning and judging module 21 and a migration task determining module 22; wherein,

the task scanning and judging module is communicatively connected to the migration task determining module. It scans the task queue stored in the back-end database at a predetermined interval, and sends the migration tasks in the waiting state and the paused state to the migration task determining module;

the migration task determining module schedules and executes the migration tasks sent by the task scanning and judging module.

FIG. 7 shows the processing flow of the above migration task scheduling unit.

Referring to FIG. 5, in the above system, the migration task execution unit 3 includes:

a migration plug-in selection module 31 and a plurality of migration plug-in modules 32, 33, ..., 3n; wherein,

the migration plug-in selection module is communicatively connected to each migration plug-in module. It starts the migration plug-in that matches the type of the migration task to perform the migration;

each migration plug-in module, once started by the migration plug-in selection module, completes the data migration in a multi-level task manner.

In the above system, the migration task execution unit further includes a failure recovery module, communicatively connected to the back-end database and to each migration plug-in. While a plug-in module runs the scheduled file migration tasks, it inserts a checkpoint into the back-end database at a preset interval. When a file migration task is rescheduled after a failure, it resumes from the last completed checkpoint. The checkpoint is an internal event. When activated, it triggers the back-end database writer process to write the dirty data blocks in the data buffer out to the data files.

FIG. 8 shows the processing flow of the above migration task execution unit.

The data migration system of the present invention is a fully elastic, plug-in-based cross-cluster data migration system. A migration plug-in is selected according to the type of each migration task, which makes elastic scheduling possible. The system occupies no resources when idle, schedules the necessary resources when there is work, and releases them when the work is done. Different migration requirements are met by different migration plug-ins, so data can be migrated between a wide variety of heterogeneous clusters. A failed task can resume from the point where it was interrupted.

The embodiments of the present invention are described in further detail below.

Referring to FIGS. 1, 9, 10 and 11, the method for cross-cluster mass data migration in object storage of the present invention includes the following steps:

Step S1, building a service on the InterFaceServer that receives migration task requests from users. Besides task creation requests from users, this service also receives requests from running tasks to create subtasks;

Step S2, the InterFaceServer writes the information about the created task into the back-end database. It then generates the configuration file of the task to be created and uploads it to unified back-end storage; OSS object storage is used in this embodiment;

Step S3, the task scheduler periodically scans the task queue in the back-end database and schedules the tasks in the waiting and pause states to execute the migration;

Step S4, a scheduled migration task can run in a Docker container, be started as a K8S job, or run as a process spawned on a physical machine;

Step S5, the task scheduler schedules different plug-ins to start task instances according to the task type. In this embodiment, the Docker image corresponding to the task type is downloaded and a job instance is started to perform the migration;
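Step S5 in this embodiment maps the task type to a Docker image and starts a job instance. The sketch below shows the Kubernetes Job manifest such a scheduler might submit, expressed as a Python dict. The image names, the `TASK_CONFIG_URL` environment variable, and the naming scheme are hypothetical. `apiVersion`, `kind`, `backoffLimit`, `ttlSecondsAfterFinished`, and `restartPolicy` are standard `batch/v1` Job fields. Setting `backoffLimit` to 0 is an assumed design choice: retries are left to the patent's rescheduling-plus-checkpoint mechanism.

```python
from typing import Dict

# Hypothetical mapping from task type to plug-in image; the real names are not given.
PLUGIN_IMAGES: Dict[str, str] = {
    "account": "registry.example.com/migrate-account:latest",
    "container": "registry.example.com/migrate-container:latest",
    "filelist": "registry.example.com/migrate-filelist:latest",
}


def build_job_manifest(task_id: str, task_type: str, config_url: str) -> dict:
    """Kubernetes batch/v1 Job that runs one migration task and then exits."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"migrate-{task_type}-{task_id}"},
        "spec": {
            "backoffLimit": 0,                # no in-cluster retries; the scheduler reschedules
            "ttlSecondsAfterFinished": 300,   # delete the finished Job after 5 minutes
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "migration-plugin",
                        "image": PLUGIN_IMAGES[task_type],
                        # the plug-in reads its configuration file from this location
                        "env": [{"name": "TASK_CONFIG_URL", "value": config_url}],
                    }],
                }
            },
        },
    }
```

The dict can be serialized to JSON and submitted with `kubectl apply -f` or through a Kubernetes client library. Job names must be valid DNS-1123 names, so task IDs should be lowercase alphanumerics or hyphens.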

Step S6, specifically, the migration plug-ins are implemented as multi-level tasks:

Step S61) After a user task is received, the corresponding migration task image is scheduled according to the user task's requirements. If the user needs to migrate an entire account, an account migration task is started. It traverses all containers and creates a container migration task for each;

Step S62) Each created container migration task, once scheduled, traverses the files in the container. It creates one file migration task for every n files, where n is the specified number of files;

Step S63) Each created file migration task, once scheduled, compares the source file in the source cluster with the target file in the target cluster. If the source file is the same as the target file, copying that source file is skipped; if they differ, the file is copied;

Step S7, failure recovery: every scheduled task periodically inserts a checkpoint into the database. When a task is rescheduled after a failure, it resumes from the last completed checkpoint (see FIG. 9).

The steps of the above method are implemented as follows. FIG. 9 shows the execution flow of the InterFaceServer: creation of user tasks and subtasks, and the execution and scheduling of operation and maintenance instructions. FIG. 10 shows the specific composition of the system for cross-cluster mass data migration in object storage;

Implementing and deploying the InterFaceServer: the InterFaceServer can receive user requests through a REST API, RPC, or a command-line (CMD) interface;

(1) After receiving a task creation request, the InterfaceServer writes the information about the task to be created into the database, or into publicly readable storage or a shared memory block, and sets the task's state to waiting;

(2) Based on the task information in the database, publicly readable storage, or memory block, the scheduler starts a task process of the corresponding type on a physical machine. It passes the process the information the task needs and sets the state to running;

(3) The scheduler can also start the corresponding job or pod in K8s and set the state to running;

(4) The scheduler can also start the corresponding task in a Docker container and set the state to running;

(5) When a task finishes, it shuts down its own process/job/pod and sets the state to end;
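The states used in items (1) to (5) and in step S3 form a small lifecycle. The sketch below is a hypothetical state table. The transition from running to pause is an assumption, added to model an operation and maintenance pause instruction that the text does not spell out. Rejecting other transitions keeps a finished task from being relaunched by mistake.

```python
from enum import Enum
from typing import Dict, Set


class TaskState(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    PAUSE = "pause"
    END = "end"


# Allowed transitions; RUNNING -> PAUSE is assumed (operator pause instruction).
TRANSITIONS: Dict[TaskState, Set[TaskState]] = {
    TaskState.WAITING: {TaskState.RUNNING},                 # scheduler launches the task
    TaskState.PAUSE: {TaskState.RUNNING},                   # scheduler relaunches the task
    TaskState.RUNNING: {TaskState.PAUSE, TaskState.END},
    TaskState.END: set(),                                   # terminal state
}


def transition(current: TaskState, new: TaskState) -> TaskState:
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```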

(6) Implementation of the migration plug-ins:

Referring to FIG. 12, (61) account-level data migration plug-in:

(611) after the task starts, it writes a checkpoint to the database or shared storage;

(612) according to the task configuration, it lists all containers/buckets under the account from the source cluster;

(613) for each listed container, it asks the InterfaceServer to create a container-level migration task;

(614) every time a certain number of containers have been scanned, it writes a checkpoint record to the shared storage or database;
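Steps (611) to (614) can be sketched with a name-based checkpoint. The checkpoint records the last container handed off, and a restarted run skips every container up to that name. This is a hypothetical sketch. `list_containers` and `request_container_task` are callbacks standing in for the source-cluster listing and the InterfaceServer call. Sorting assumes a stable lexicographic order, which object stores commonly use for listings.

```python
from typing import Callable, Dict, List


def account_migration(account: str,
                      list_containers: Callable[[str], List[str]],
                      request_container_task: Callable[[str], None],
                      checkpoints: Dict[str, str],
                      every: int = 100) -> int:
    """Account-level plug-in: request one container-level task per container.

    The checkpoint is the name of the last container handed off ("" means none).
    Returns the number of container tasks requested in this run.
    """
    marker = checkpoints.setdefault(account, "")   # (611) checkpoint at start
    last = marker
    created = 0
    # (612) list containers; sorting gives the stable order the marker relies on
    for name in sorted(list_containers(account)):
        if name <= marker:
            continue                               # already handed off before a failure
        request_container_task(name)               # (613) ask InterfaceServer for a task
        created += 1
        last = name
        if created % every == 0:
            checkpoints[account] = last            # (614) periodic checkpoint
    checkpoints[account] = last
    return created
```

Containers requested after the last periodic checkpoint may be requested again after a crash. The InterfaceServer would therefore need to tolerate duplicate requests, for example by keying tasks on the container name.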

Referring to FIG. 13, (62) container-level data migration plug-in:

(621) after the task starts, it writes a checkpoint to the database or shared storage;

(622) according to the task configuration, it lists all files under the container from the source cluster;

(623) it groups the listed files into batches of 1000 (or another number), and records each batch as a database table or a serialized file. It then asks the InterfaceServer to create a FileList-level migration task for each batch;

(624) every time a certain number of FileLists have been scanned, it writes a checkpoint record to the shared storage or database;
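Steps (621) to (624) can be sketched with JSON as one possible "serialized file" format for a FileList. This is a hypothetical sketch. `request_filelist_task` stands in for the InterfaceServer call, and the checkpoint counts how many FileLists have been requested. For simplicity it is updated after every FileList rather than every "certain number" of them.

```python
import json
from typing import Callable, Dict, List


def container_migration(container: str,
                        list_files: Callable[[str], List[str]],
                        request_filelist_task: Callable[[str], None],
                        checkpoints: Dict[str, int],
                        batch_size: int = 1000) -> int:
    """Container-level plug-in: split the file listing into serialized FileLists.

    Returns the number of FileList tasks requested in this run.
    """
    start = checkpoints.setdefault(container, 0)   # (621) checkpoint at start
    files = sorted(list_files(container))           # (622) list files, stable order
    batches = [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
    for seq in range(start, len(batches)):
        # (623) one FileList per batch, serialized as a JSON document
        payload = json.dumps({"container": container, "seq": seq, "files": batches[seq]})
        request_filelist_task(payload)
        checkpoints[container] = seq + 1            # (624) record progress
    return len(batches) - start
```

Because the listing is sorted before batching, a restarted run rebuilds the same batches. It can then skip the first `start` of them safely, provided the container's contents have not changed in the meantime.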

参见图14，(63)FileList级别数据迁移插件：See Figure 14, (63) FileList level data migration plug-in:

(631)任务启动后向数据库或者共享存储中写入checkpoint；(631) Write the checkpoint to the database or shared storage after the task starts;

(632)根据任务配置信息获取该任务对应的数据库表或者串化文件；(632) Obtain the database table or serialization file corresponding to the task according to the task configuration information;

(633)根据获取到的文件列表将文件从源端集群拷贝置目标集群；(633) copy the file from the source cluster to the target cluster according to the obtained file list;

(634)完成后结束自身。(634) End itself when done.

本发明的迁移系统及方法至少具有以下优点：The migration system and method of the present invention have at least the following advantages:

(1)本发明采用流行的容器化方案，可以很容易的实现基于K8S编排的任务，从而实现完全弹性化调度，在设计上更贴近迁移任务的特点：任务相对集中，无任务时无需系统资源；(1) The present invention adopts the popular containerization scheme, which can easily implement tasks based on K8S orchestration, thereby realizing fully flexible scheduling, and the design is closer to the characteristics of migration tasks: tasks are relatively concentrated, and no system resources are needed when there are no tasks ;

(2)本发明采用插件式的扩展方案，可以很容易的扩展新的需求，在不同类型的集群之间迁移，而非传统的只能在同一种集群中进行数据迁移；(2) The present invention adopts a plug-in expansion scheme, which can easily expand new requirements and migrate between different types of clusters, instead of the traditional data migration that can only be performed in the same cluster;

(3)本发明基于checkpoint的方式实现故障自动恢复，具有恢复速度快，故障恢复点易追踪，恢复点记录信息少，实现方法简单便捷。(3) The present invention realizes automatic fault recovery based on the checkpoint method, which has the advantages of fast recovery speed, easy tracking of fault recovery points, less record information of recovery points, and simple and convenient implementation method.

本领域普通技术人员可以理解：实现本发明实施例方法中的全部或部分流程是可以通过程序来指令相关的硬件来完成，所述的程序可存储于一计算机可读取存储介质中，该程序在执行时，可包括如上述各方法的实施例的流程。其中，所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory，ROM)或随机存储记忆体(Random AccessMemory，RAM)等。Those of ordinary skill in the art can understand that all or part of the process in the method of the embodiment of the present invention can be completed by instructing related hardware through a program, and the program can be stored in a computer-readable storage medium, the program During execution, it may include the processes of the embodiments of the above-mentioned methods. Wherein, the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM) and the like.

The above is merely a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. A method for migrating mass data across object storage clusters, characterized by comprising the following steps:

step S1: receiving a migration task request sent by a user and establishing subtasks according to the migration task request;

step S2: generating a corresponding configuration file of the migration task according to the information on the established subtasks, and storing the established subtasks and the information of the corresponding configuration file, as OSS objects, in a task queue of a back-end database;

step S3: scanning the task queue stored in the back-end database at a preset interval, and scheduling and executing the migration tasks in the waiting state and the paused state;

step S4: running the scheduled migration task in any one of the following modes: a Docker container, a K8S job, or a process;

step S5: starting a migration plug-in of the type corresponding to the migration task type to perform the migration operation;

step S6: completing the data migration in a multi-level task manner by the invoked migration plug-in;

wherein in step S6, the migration plug-in completes the data migration in a multi-level task manner as follows:

step S61) scheduling the corresponding migration task image according to the user's migration task request; if the user needs to migrate an entire account, starting an account migration task, which traverses all buckets and then creates bucket migration tasks;

step S62) once scheduled, each created bucket migration task traverses all files in its bucket and creates one file migration task for every n files, where n is a preset number;

and step S63) once a created file migration task is scheduled, comparing each source file in the source cluster with the corresponding target file in the target cluster; if they are the same, skipping the copy of that source file, and if they differ, copying the source file to the target cluster as the target file.
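Steps S61 to S63 can be sketched as a breadth-first fan-out over an in-process queue: the account task emits one bucket task per bucket, each bucket task emits one file task per n keys, and each file task copies only what differs. This is an illustrative sketch under stated assumptions: clusters are modeled as nested dictionaries (bucket -> key -> bytes), a `deque` stands in for the scheduled task queue, and "the same" is assumed to mean an identical MD5 content digest, which the claim does not specify.

```python
from __future__ import annotations

import hashlib
from collections import deque


def digest(data: bytes) -> str:
    """Assumed content fingerprint used to decide whether two files match."""
    return hashlib.md5(data).hexdigest()


def migrate_account(source: dict[str, dict[str, bytes]],
                    target: dict[str, dict[str, bytes]],
                    n: int) -> tuple[int, int]:
    """Run the account -> bucket -> file-list fan-out; return (copied, skipped)."""
    queue = deque([("account", None, None)])
    copied = skipped = 0
    while queue:
        level, bucket, keys = queue.popleft()
        if level == "account":
            # S61: the account task traverses all buckets and creates bucket tasks.
            for name in sorted(source):
                queue.append(("bucket", name, None))
        elif level == "bucket":
            # S62: the bucket task creates one file task for every n files.
            all_keys = sorted(source[bucket])
            for i in range(0, len(all_keys), n):
                queue.append(("files", bucket, all_keys[i:i + n]))
        else:
            # S63: compare each source file with the target; copy only if different.
            dst = target.setdefault(bucket, {})
            for key in keys:
                data = source[bucket][key]
                if key in dst and digest(dst[key]) == digest(data):
                    skipped += 1
                else:
                    dst[key] = data
                    copied += 1
    return copied, skipped
```

In a real deployment each queued item would be a separately scheduled container or job rather than a loop iteration; the choice of n trades scheduling overhead against how much work is repeated when one file task fails.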

2. The method for migrating mass data across object storage clusters according to claim 1, further comprising:

step S7: fault recovery, in which checkpoints are inserted into the back-end database at a preset interval for all scheduled file migration tasks, and when a file migration task is rescheduled after a fault, it restarts from the last completed checkpoint;

wherein the checkpoint is an internal event which, when activated, triggers a back-end database write process to write the dirty data blocks in the data buffer out to the data file.
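The restart-from-checkpoint behavior of step S7 can be sketched as follows. This is a simplified, task-level illustration rather than the database-internal checkpoint event described in the claim: a dictionary stands in for the back-end database, the checkpoint is the index of the first file not yet covered, and the clock is injectable so the demo is deterministic. `run_with_checkpoints`, `FakeClock`, and `demo` are hypothetical names.

```python
import time
from typing import Callable, Dict, List


def run_with_checkpoints(task_id: str, files: List[str],
                         copy_one: Callable[[str], None], db: Dict[str, int],
                         interval_s: float = 30.0,
                         clock: Callable[[], float] = time.monotonic) -> None:
    """Copy files, recording progress in db every interval_s seconds.

    db[task_id] holds the index of the first file not yet covered by a
    checkpoint, so a rescheduled task resumes there. Files copied after the
    last checkpoint but before a crash are copied again on resume, which is
    harmless when copies are idempotent (compare step S63).
    """
    start = db.get(task_id, 0)
    last = clock()
    for i in range(start, len(files)):
        copy_one(files[i])
        now = clock()
        if now - last >= interval_s:
            db[task_id] = i + 1  # files[:i + 1] are done
            last = now
    db[task_id] = len(files)  # final checkpoint: task complete


class FakeClock:
    """Deterministic clock for the demo: each call advances by one second."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        self.t += 1.0
        return self.t


def demo():
    db: Dict[str, int] = {}
    copied: List[str] = []
    files = [f"obj-{i}" for i in range(6)]

    def crash_on_obj4(name: str) -> None:
        if name == "obj-4":
            raise RuntimeError("simulated node failure")
        copied.append(name)

    try:
        run_with_checkpoints("t1", files, crash_on_obj4, db, 3.0, FakeClock())
    except RuntimeError:
        pass
    checkpoint = db["t1"]
    # Rescheduled after the fault: resumes from the last checkpoint.
    run_with_checkpoints("t1", files, copied.append, db, 3.0, FakeClock())
    return checkpoint, copied, db["t1"]
```

In the demo the checkpoint lands at index 3, so `obj-3` is copied twice: once before the crash and once after the restart.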

3. The method for migrating mass data across object storage clusters according to claim 1, wherein in step S5, the migration plug-in of the corresponding type is any one of an account-level data migration plug-in, a bucket-level data migration plug-in, and a file-list-level data migration plug-in.

4. A system for migrating mass data across object storage clusters, for implementing the method according to any one of claims 1 to 3, comprising:

an entrance service unit, a migration task scheduling unit, a migration task execution unit, and a back-end database; wherein

the entrance service unit is communicatively connected to the back-end database; it receives a migration task request sent by a user and establishes subtasks according to the request, generates a corresponding configuration file of the migration task according to the information on the established subtasks, and stores the established subtasks and the information of the corresponding configuration file, as OSS objects, in a task queue of the back-end database;

the migration task scheduling unit is communicatively connected to the back-end database; it scans the task queue stored in the back-end database at a preset interval and schedules and executes the migration tasks in the waiting state and the paused state, and each scheduled migration task runs in any one of the following modes: a Docker container, a K8S job, or a process;

the migration task execution unit is communicatively connected to both the migration task scheduling unit and the back-end database; it starts a migration plug-in of the type corresponding to the migration task type to perform the migration operation, and the migration plug-in completes the data migration in a multi-level task manner; the migration plug-in of the corresponding type is any one of an account-level data migration plug-in, a bucket-level data migration plug-in, and a file-list-level data migration plug-in.

5. The system for migrating mass data across object storage clusters according to claim 4, wherein the entrance service unit comprises:

a task creation request module, a task request submodule, a configuration information generation module, a queue insertion module, a database operation module, a task scheduling module, a container cloud resource adjustment module, and a container startup module; wherein

the task creation request module is used for receiving a migration task request sent by a user and establishing a migration task according to the migration task request;

the task request submodule is communicatively connected to the task creation request module and is used for creating corresponding subtasks according to the migration task;

the configuration information generation module is communicatively connected to the task creation request module and the task request submodule, and is used for generating a corresponding configuration file of the migration task according to the information on the established subtasks;

the queue insertion module is communicatively connected to the task creation request module, the task request submodule, and the configuration information generation module, and inserts the established subtasks and the information of the corresponding configuration file of the migration task into a task queue;

the database operation module is communicatively connected to the queue insertion module and stores each task in the task queue in the back-end database as an OSS object;

the task scheduling module is communicatively connected to the migration task scheduling unit, the database operation module, and the container cloud resource adjustment module; it sends scheduling task requests to the migration task scheduling unit, receives the migration tasks scheduled by that unit, and forwards the migration task requests to the database operation module and the container cloud resource adjustment module;

the container cloud resource adjustment module is communicatively connected to the task scheduling module and is used for adjusting container cloud resources according to the migration tasks sent by the task scheduling module;

the container startup module is communicatively connected to the container cloud resource adjustment module and is used for starting and running the scheduled migration task in any one of the following modes: a Docker container, a K8S job, or a process.
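The flow through the entrance service unit (create the task, derive subtasks, generate the configuration, and persist the record to the queue as an object) can be sketched as below. The request fields, the record layout, the `tasks/<id>.json` key scheme, the default of 1000 files per task, and the use of a plain dictionary in place of the OSS-backed back-end database are all assumptions made for illustration.

```python
import json
import uuid
from typing import Dict, Optional


def submit_migration(request: dict, store: Dict[str, bytes],
                     task_id: Optional[str] = None) -> str:
    """Create a migration task and enqueue it as a JSON object; return its id."""
    task_id = task_id or uuid.uuid4().hex[:12]
    buckets = request.get("buckets") or []
    # Task request submodule: one subtask per bucket, or a single
    # account-level subtask when no buckets are listed.
    if buckets:
        subtasks = [{"type": "bucket", "bucket": b} for b in buckets]
    else:
        subtasks = [{"type": "account"}]
    # Configuration information generation module (fields are hypothetical).
    config = {
        "source": request["source"],
        "target": request["target"],
        "files_per_task": request.get("files_per_task", 1000),  # n in step S62
    }
    record = {"task_id": task_id, "status": "waiting",
              "subtasks": subtasks, "config": config}
    # Queue insertion and database operation modules: persist as an object.
    store[f"tasks/{task_id}.json"] = json.dumps(record).encode("utf-8")
    return task_id
```

New records start in the `waiting` state, which is exactly what the scheduling unit looks for when it scans the queue.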

6. The system for migrating mass data across object storage clusters according to claim 4 or 5, wherein the migration task scheduling unit comprises:

a task scanning and judging module and a migration task determining module; wherein

the task scanning and judging module is communicatively connected to the migration task determining module, scans the task queue stored in the back-end database at a preset interval, and sends the migration tasks in the waiting state and the paused state to the migration task determining module;

and the migration task determining module schedules and executes the migration tasks sent by the task scanning and judging module.
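A single pass of the task scanning and judging module, followed by hand-off to the determining module, might look like the sketch below. It assumes each queued record carries a `status` field, uses a dictionary in place of the back-end database, and takes a `launch` callback standing in for starting a Docker container, K8S job, or process; `scan_once` and `scheduler_loop` are hypothetical names.

```python
import time
from typing import Callable, Dict, List

SCHEDULABLE = {"waiting", "paused"}


def scan_once(queue: Dict[str, dict],
              launch: Callable[[str, dict], None]) -> List[str]:
    """Pick every waiting or paused task, mark it running, and launch it."""
    picked = []
    for task_id in sorted(queue):
        task = queue[task_id]
        if task["status"] in SCHEDULABLE:
            task["status"] = "running"  # mark first so the next scan skips it
            launch(task_id, task)
            picked.append(task_id)
    return picked


def scheduler_loop(queue: Dict[str, dict], launch: Callable[[str, dict], None],
                   interval_s: float, max_scans: int) -> None:
    """Rescan the queue every interval_s seconds (the preset time length)."""
    for _ in range(max_scans):
        scan_once(queue, launch)
        time.sleep(interval_s)
```

With several scheduler replicas sharing one database, the status change would need to be an atomic compare-and-set so that two schedulers cannot claim the same task; the dictionary version above ignores that concern.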

7. The system for migrating mass data across object storage clusters according to claim 4 or 5, wherein the migration task execution unit comprises:

a migration plug-in selection module and a plurality of migration plug-in modules; wherein

the migration plug-in selection module is communicatively connected to each migration plug-in module and starts the migration plug-in module of the type corresponding to the migration task type to perform the migration operation;

and each migration plug-in module, once started by the migration plug-in selection module, completes the data migration in a multi-level task manner.
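The migration plug-in selection module can be sketched as a registry keyed by task type, matching the three plug-in kinds listed in claims 3 and 4. The `register` decorator, the task-type strings, and the placeholder plug-in bodies are hypothetical; real plug-ins would perform the account, bucket, and file-list work described in steps S61 to S63.

```python
from typing import Callable, Dict

Plugin = Callable[[dict], str]
PLUGINS: Dict[str, Plugin] = {}


def register(task_type: str) -> Callable[[Plugin], Plugin]:
    """Decorator that adds a plug-in to the registry under task_type."""
    def wrap(fn: Plugin) -> Plugin:
        PLUGINS[task_type] = fn
        return fn
    return wrap


@register("account")
def account_plugin(task: dict) -> str:
    return f"traverse buckets of account {task['account']}"


@register("bucket")
def bucket_plugin(task: dict) -> str:
    return f"split bucket {task['bucket']} into file-list tasks"


@register("file_list")
def file_list_plugin(task: dict) -> str:
    return f"copy {len(task['files'])} files"


def select_and_run(task: dict) -> str:
    """Start the plug-in that matches the task type, or fail loudly."""
    plugin = PLUGINS.get(task["type"])
    if plugin is None:
        raise ValueError(f"no migration plug-in for task type {task['type']!r}")
    return plugin(task)
```

Under this design, supporting a new task level or cluster type means registering one more function, while the selection module itself stays unchanged.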

8. The system for migrating mass data across object storage clusters according to claim 7, wherein the migration task execution unit further comprises:

a fault recovery module, which is communicatively connected to the back-end database and to each migration plug-in module, and is used for inserting checkpoints into the back-end database at a preset interval while the plug-in modules migrate all scheduled files, so that when a file migration task is rescheduled after a fault, it restarts from the last completed checkpoint;