CN114442907B - Data migration method and device, server and network system
- Publication number
- CN114442907B (CN202011219098.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- migration
- queue
- data queue
- original data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0647—Migration mechanisms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/065—Replication mechanisms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0652—Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0674—Disk device
- G06F3/0676—Magnetic disk device
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical Field
The present application relates to the technical field of data storage, and in particular to a data migration method and device, a server, and a network system.
Background
Data channel products are mostly used in scenarios where upstream systems publish messages and downstream systems subscribe to and consume them. Messages are stored in the data channel as a first-in-first-out queue, and downstream tasks read and process the data sequentially.
At present, most message channel products achieve high data reliability through data redundancy, that is, service data is stored as multiple replicas in the cluster. When some nodes fail, or when nodes are scaled out or in, the necessary data replicas must be migrated to preserve the high reliability of the original data. In the prior art, replica migration moves a large amount of data, proceeds slowly, and consumes a large amount of resources such as disk I/O and network bandwidth, which inevitably impacts normal services.
Summary of the Invention
Embodiments of the present application provide a data migration method and device, a server, and a network system. While ensuring that data is transmitted effectively between upstream and downstream and that no data is lost, different migration strategies can be selected according to how consumers consume the data. This helps replica migration complete faster and lowers the I/O consumption of the disk storing the data, thereby reducing the impact of replica migration on normal services.
In a first aspect, an embodiment of the present application provides a data migration method. The data migration method includes: in response to a migration instruction for migrating data of an original data queue in a first server, predicting the total number of times consumers will read first data of the original data queue, where the first data is the data already written to the original data queue when the migration instruction is received; when the total number of times is less than or equal to a set number of times, selecting a first migration strategy to determine a start migration position, where the start migration position of the first migration strategy is located downstream of the earliest data position of the original data queue; and copying the data in the original data queue to a second server in a set order to form a migration data queue, where the set order runs from the start migration position of the selected migration strategy to the position of the data most recently written to the original data queue by a producer during the migration.
In the above solution, when a migration instruction for migrating the data of the original data queue in the first server is received, the total number of times consumers will read the first data of the original data queue again after the migration instruction is received can first be predicted. If the total number of times is less than or equal to the set number of times, that is, consumers read the first data only a few times, the first migration strategy is selected to determine the start migration position, and the data in the original data queue is copied to the second server, in order from the start migration position of the selected strategy to the position of the data most recently written by the producer during the migration, to form the migration data queue. Because the start migration position of the first migration strategy is downstream of the earliest data position of the original data queue, less of the first data is copied. Moreover, data is first stored in the page cache and is written from the page cache to disk only after a certain period of time, so the data that is not copied is generally data that resides on disk. This increases the proportion of data read from the page cache and reduces the probability of an actual disk read, which lowers the I/O consumption of the disk storing the data and reduces the impact of replica migration on normal services. In addition, not all data in the original data queue needs to be migrated, so replica migration completes faster.
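For illustration only, the copy order described above can be sketched in Python. The list-based queue, the `migrate` function and the `producer_step` hook are hypothetical names introduced for this sketch and do not appear in the application; positions are modeled as list indices that grow toward the newest data.

```python
from typing import Callable, List


def migrate(original: List[str], start: int,
            producer_step: Callable[[List[str]], None]) -> List[str]:
    """Copy original[start:] into a new queue while the producer keeps writing.

    `start` is the start migration position chosen by the migration strategy.
    `producer_step` is a hypothetical hook that may append new data to
    `original` between copy steps, simulating writes during the migration.
    """
    migrated: List[str] = []
    position = start
    # Keep copying until the copy position reaches the newest written data.
    while position < len(original):
        migrated.append(original[position])
        position += 1
        producer_step(original)  # the producer may write during migration
    return migrated


# Offsets 0..4 are already written; migration starts at offset 3.
queue = ["m0", "m1", "m2", "m3", "m4"]
pending = ["m5", "m6"]  # data the producer writes during the migration


def write_one(q: List[str]) -> None:
    """Append one pending message, if any remain."""
    if pending:
        q.append(pending.pop(0))


copied = migrate(queue, 3, write_one)
```

In this sketch the data before offset 3 is never copied, and the loop ends only once the copy has reached the data the producer wrote during the migration.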
In a possible implementation, selecting the first migration strategy to determine the start migration position specifically includes: obtaining, at the current moment, the read position of the earliest active consumer of the original data queue and the latest data position, where the current moment is the moment at which the total number of times is determined to be less than the set number of times, and the latest data position is the position of the data written to the original data queue by the producer at the current moment; and when the distance between the read position of the earliest active consumer and the latest data position is less than or equal to a set distance, determining the latest data position as the start migration position of the first migration strategy.
That is, in this implementation, when, at the current moment at which the total number of times is determined to be less than the set number of times, the distance between the read position of the earliest active consumer in the original data queue and the latest data position is less than or equal to the set distance, the earliest active consumer is a real-time consumer of the original data queue. Because the earliest active consumer is the farthest from the latest data position, if it is a real-time consumer, all other active consumers are real-time consumers as well. The start migration position of the first migration strategy can therefore be set to the latest data position of the original data queue. The first data between the earliest data position and the latest data position at that moment then need not be copied, which speeds up completion of the replica migration. Data is also read mainly from the page cache, which reduces the probability of an actual disk read and lowers the I/O consumption of the disk storing the data, so that disk I/O resources can be used mainly for normal services and the impact of replica migration on normal services is reduced.
In a possible implementation, when the start migration position is the latest data position, the data migration method further includes: obtaining the read position of the earliest active consumer of the original data queue during the data migration; and, when the migration data queue has been copied up to the data most recently written to the original data queue by the producer, determining that the read position of the earliest active consumer of the original data queue during the migration has entered the range of second data of the original data queue, and completing the data migration, where the second data is the data already copied to the migration data queue.
That is, in this implementation, because the start migration position of the first migration strategy is determined to be the latest data position of the original data queue, all consumers of the original data queue are real-time consumers. When the migration data queue has been copied up to the data most recently written by the producer, as long as the read position of the earliest active consumer during the migration has entered the range of the second data, that is, the data of the original data queue already copied to the migration data queue, the read positions of all active consumers of the original data queue are guaranteed to be within the range of the second data, and the data migration can be completed.
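The completion condition for a migration that starts at the latest data position can be sketched as follows. This is a minimal sketch under stated assumptions: positions are integer offsets that grow toward the newest data, and the function and parameter names are hypothetical, not taken from the application.

```python
def migration_complete(start: int, copied_upto: int,
                       producer_latest: int, earliest_active_read: int) -> bool:
    """Return True when a migration that began at the latest data position may end."""
    # Condition 1: the copy has caught up with the producer's newest write.
    caught_up = copied_upto >= producer_latest
    # Condition 2: the earliest active consumer reads inside the copied range
    # [start, copied_upto]; every other active consumer is at or after it.
    in_copied_range = start <= earliest_active_read <= copied_upto
    return caught_up and in_copied_range


# The copy started at offset 100 and has reached the producer's offset 120.
still_behind = migration_complete(100, 120, 120, 95)   # consumer before the range
ready = migration_complete(100, 120, 120, 105)         # consumer inside the range
```

Checking only the earliest active consumer is sufficient in this sketch because it is the consumer farthest from the latest data position.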
In a possible implementation, the data migration method further includes: when the distance between the read position of the earliest active consumer and the latest data position is greater than the set distance, determining the read position of the earliest active consumer as the start migration position of the first migration strategy.
That is, in this implementation, when the distance between the read position of the earliest active consumer in the original data queue and the latest data position is greater than the set distance, at least the earliest active consumer among all active consumers is a non-real-time consumer that stays within the first data of the original data queue for a relatively long time. In this case, the start migration position of the first migration strategy can be set to the read position of the earliest active consumer, so that the earliest active consumer can keep working normally throughout the data migration. The first data between the earliest data position of the original data queue and the read position of the earliest active consumer need not be copied, which speeds up completion of the replica migration and, to some extent, reduces the probability of an actual disk read, thereby lowering the I/O consumption of the disk storing the data and reducing the impact of replica migration on normal services.
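The two cases of the first migration strategy described above can be summarized in a short sketch. It assumes integer offsets that grow toward the newest data and measures distance as the difference between two offsets; the function name and parameters are hypothetical.

```python
def first_strategy_start(earliest_active_read: int, latest: int,
                         set_distance: int) -> int:
    """Choose the start migration position under the first migration strategy."""
    distance = latest - earliest_active_read
    if distance <= set_distance:
        # All active consumers are real-time: start at the latest data position.
        return latest
    # At least the earliest active consumer lags: start at its read position.
    return earliest_active_read


realtime_start = first_strategy_start(earliest_active_read=990, latest=1000,
                                      set_distance=50)
lagging_start = first_strategy_start(earliest_active_read=400, latest=1000,
                                     set_distance=50)
```

Either way, the data between the earliest data position and the returned offset is left out of the copy.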
In a possible implementation, when the start migration position is the read position of the earliest active consumer, the data migration method further includes: determining whether the migration data queue has been copied up to the data most recently written to the original data queue by the producer, so as to complete the data migration.
That is, in this implementation, because the start migration position of the first migration strategy is the read position of the earliest active consumer, once the migration data queue has been copied up to the data most recently written by the producer during the migration, all active consumers are guaranteed to be within the range of the already copied second data of the original data queue, and the data migration is complete.
In a possible implementation, obtaining the read position of the earliest active consumer of the original data queue specifically includes: obtaining the read position of each active consumer in every data queue it reads, where these data queues include the original data queue; converting the mapping from each active consumer to its read positions in the data queues it reads into a mapping from each data queue to the read positions of all active consumers of that data queue, thereby obtaining the read positions of all active consumers of the original data queue; and comparing the read positions of all active consumers of the original data queue to obtain the read position of the earliest active consumer of the original data queue.
That is, in this implementation, the active consumers of the original data queue and their read positions cannot be obtained directly, but the data queues read by each active consumer and the read position in each of those queues can be. Therefore, the read position of each active consumer in every data queue it reads is obtained first. The mapping (correspondence) from each active consumer to its read positions is then converted into a mapping (correspondence) from each data queue to the read positions of all its active consumers, which yields the active consumers of the original data queue and their read positions. Comparing these read positions then gives the read position of the earliest active consumer in the original data queue.
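The mapping conversion described above can be sketched with nested dictionaries. The names `invert_positions` and `earliest_active_read`, the consumer and queue identifiers, and the use of the smallest offset as the "earliest" position are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict

Positions = Dict[str, Dict[str, int]]


def invert_positions(by_consumer: Positions) -> Positions:
    """Turn consumer -> {queue: offset} into queue -> {consumer: offset}."""
    by_queue: Dict[str, Dict[str, int]] = defaultdict(dict)
    for consumer, queues in by_consumer.items():
        for queue, offset in queues.items():
            by_queue[queue][consumer] = offset
    return dict(by_queue)


def earliest_active_read(by_queue: Positions, queue: str) -> int:
    """Smallest offset, i.e. the consumer farthest from the latest data."""
    return min(by_queue[queue].values())


# Hypothetical active consumers and the queues each of them reads.
by_consumer = {
    "c1": {"orders": 980, "logs": 40},
    "c2": {"orders": 995},
    "c3": {"logs": 75},
}
by_queue = invert_positions(by_consumer)
earliest = earliest_active_read(by_queue, "orders")
```

Here "orders" plays the role of the original data queue; after the inversion, its active consumers and their read positions can be looked up directly.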
In a possible implementation, before obtaining the read position of each active consumer in every data queue it reads, obtaining the read position of the earliest active consumer of the original data queue further includes: obtaining the read positions of all consumers in the data queues they read, where all consumers include inactive consumers and active consumers; querying the read status of all consumers to classify each consumer as an active consumer or an inactive consumer, where the read status of an active consumer is working and the read status of an inactive consumer is paused; and removing the information about the inactive consumers and their read positions in the data queues they read.
That is, in this implementation, it is inconvenient to obtain directly the read positions of all active consumers in the data queues they read. Instead, the read positions of all consumers, both inactive and active, in the data queues they read are obtained first. The read status of every consumer is then queried to classify the consumers as active or inactive: the read status of an active consumer is working, and that of an inactive consumer is paused. Finally, the information about the inactive consumers and their read positions is removed, which leaves the read position of each active consumer in every data queue it reads.
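The filtering step described above might look like the following sketch. The status strings "working" and "paused" mirror the two read statuses named in the text, but their exact representation, as well as the function and variable names, are assumptions.

```python
from typing import Dict


def keep_active(positions: Dict[str, Dict[str, int]],
                status: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Drop inactive (paused) consumers together with their read positions."""
    return {
        consumer: queues
        for consumer, queues in positions.items()
        if status.get(consumer) == "working"
    }


# Hypothetical read positions of all consumers, active and inactive.
all_positions = {
    "c1": {"orders": 980},
    "c2": {"orders": 120},   # paused long ago, far behind
    "c3": {"orders": 995},
}
read_status = {"c1": "working", "c2": "paused", "c3": "working"}
active_positions = keep_active(all_positions, read_status)
```

Removing the paused consumer "c2" keeps its stale offset from being mistaken for the earliest active read position.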
In a possible implementation, after the data migration is completed, the data migration method further includes: when it is determined that consumers read the unmigrated first data of the original data queue, deleting the original data queue according to a set policy; or, when it is determined that consumers do not read the unmigrated first data of the original data queue, deleting the original data queue.
That is, in this implementation, when the first migration strategy is selected to determine the start migration position, the start migration position is downstream of the earliest data position of the original data queue, so the data between the earliest data position and the start migration position is not copied to the migration data queue. After the data migration is completed, it must therefore be determined whether consumers read the unmigrated first data of the original data queue. If they do not, the original data queue can be deleted. If they do, the original data queue can be deleted according to the set policy, for example only after the consumers have read the unmigrated first data, so that normal operation is not affected.
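One way to express the deletion decision is sketched below. It assumes that the offsets consumers still need to read after the migration are known as a collection named `pending_reads`; that collection, the function name, and the half-open range convention are hypothetical choices for this sketch.

```python
from typing import Iterable


def can_delete_original_now(earliest: int, start: int,
                            pending_reads: Iterable[int]) -> bool:
    """Return True if no consumer still reads the unmigrated first data.

    The unmigrated data occupies offsets [earliest, start), i.e. from the
    earliest data position up to, but not including, the start migration
    position.
    """
    unmigrated = range(earliest, start)
    return not any(offset in unmigrated for offset in pending_reads)


# Migration started at offset 100; offsets 0..99 were not copied.
delete_now = can_delete_original_now(0, 100, [120, 130])
must_wait = not can_delete_original_now(0, 100, [40, 120])
```

When the function returns False, the set policy applies instead, for example deferring the deletion until the reads of the unmigrated data have finished.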
In a possible implementation, the data migration method includes: when the total number of times is greater than the set number of times, selecting a second migration strategy to determine the start migration position, where the start migration position of the second migration strategy is the earliest data position of the original data queue.
That is, in this implementation, consider the case where the total number of times consumers read the first data of the original data queue is greater than the set number of times. If the first migration strategy were selected, the original data queue could still be deleted according to the set policy after migration, so that consumers could read the unmigrated first data again after the migration completes; however, switching consumers from the migration data queue back to the original data queue is costly. Therefore, when consumers read the first data many times, the second migration strategy can be selected to reduce this cost: migration starts from the earliest data position of the original data queue, which ensures that consumers can complete all their reads in the migration data queue.
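The choice between the two strategies can be sketched as a single comparison. How the total number of reads is predicted is not covered here, so the predicted value is simply taken as an input; the `Strategy` enum and `select_strategy` function are hypothetical names.

```python
from enum import Enum


class Strategy(Enum):
    FIRST = "start downstream of the earliest data position"
    SECOND = "start at the earliest data position"


def select_strategy(predicted_total_reads: int, set_count: int) -> Strategy:
    """Pick a migration strategy from the predicted reads of the first data."""
    if predicted_total_reads <= set_count:
        # Few re-reads expected: skip part of the already written data.
        return Strategy.FIRST
    # Many re-reads expected: copy everything so consumers never switch back.
    return Strategy.SECOND


few_reads = select_strategy(predicted_total_reads=2, set_count=3)
many_reads = select_strategy(predicted_total_reads=4, set_count=3)
```

The threshold itself is a tuning parameter in this sketch: a larger value favors faster, partial migration, and a smaller value favors a full copy.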
In a possible implementation, the data migration method further includes: determining that the migration data queue has been copied up to the data most recently written to the original data queue by the producer, and completing the data migration.
That is, in this implementation, when the second migration strategy is selected and migration starts from the earliest data position of the original data queue, once the migration data queue has been copied up to the data most recently written by the producer, the migration data queue is guaranteed to contain all the data of the original data queue, and the data migration is complete.
In a possible implementation, the data migration method further includes: deleting the original data queue after the data migration is completed.
That is, in this implementation, because the start migration position of the second migration strategy is the earliest data position of the original data queue, after the migration data queue has been copied up to the data most recently written by the producer and the data migration is complete, the migration data queue contains all the data of the original data queue. The original data queue can therefore be deleted immediately after the migration to free up memory.
In a second aspect, an embodiment of the present application provides a data migration method. The data migration method includes: in response to a migration instruction for migrating data of an original data queue in a first server, obtaining, at the current moment, the read position of the earliest active consumer of the original data queue and the latest data position, where the current moment is the moment at which the migration instruction is received, and the latest data position is the position of the data written to the original data queue by a producer at the current moment; when the distance between the read position of the earliest active consumer and the latest data position is less than or equal to a set distance, copying the data in the original data queue to a second server, in order from the latest data position to the position of the data most recently written to the original data queue by the producer during the migration, to form a migration data queue; and/or, when the distance between the read position of the earliest active consumer and the latest data position is greater than the set distance, copying the data in the original data queue to the second server, in order from the read position of the earliest active consumer to the position of the data most recently written to the original data queue by the producer during the migration, to form a migration data queue.
In the above solution, when, at the current moment at which the migration instruction is received, the distance between the read position of the earliest active consumer of the original data queue and the latest data position is less than or equal to the set distance, all active consumers of the original data queue are real-time consumers, so data migration starts from the latest data position of the original data queue. When that distance is greater than the set distance, at least the earliest active consumer of the original data queue is not a real-time consumer, so data migration starts from the read position of the earliest active consumer. In both cases, the data of the original data queue in the first server is copied to the second server, in order from the start migration position to the position of the data most recently written by the producer during the migration, to form the migration data queue. Data written to the original data queue by the producer is first kept in the page cache for a certain period of time and only then written from the page cache to disk. Compared with a solution that migrates data starting from the earliest data position, the data migration method of this embodiment therefore copies data mainly from the page cache. This lowers the probability of copying the first data from disk, that is, the probability of an actual disk read, and reduces the impact of replica migration on the normal services of the disk. In addition, not all data in the original data queue needs to be migrated, so replica migration completes faster.
In a possible implementation, when data is migrated starting from the latest data position, the data migration method includes: obtaining, during the data migration, the read position of the earliest active consumer of the original data queue; and, when the migration data queue has copied up to the data most recently written to the original data queue by the producer, determining that the read position of the earliest active consumer of the original data queue during the data migration has entered the range of second data of the original data queue, whereupon the data migration is complete, wherein the second data is the data already copied to the migration data queue.
That is, in this implementation, because the start migration position is the latest data position of the original data queue, all consumers of the original data queue are real-time consumers. Once the migration data queue has copied up to the data most recently written to the original data queue by the producer, it is sufficient that the read position of the earliest active consumer, which is the one farthest from the latest data position during the migration, has entered the range of the migration data queue. This guarantees that the read positions of all active consumers of the original data queue lie within the range of the migration data queue, and the data migration can be completed.
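The completion check for this case can be sketched as below. The function and parameter names are hypothetical, and positions are modeled as integer offsets; the second data is taken to be the offsets from `start_offset` up to `copied_up_to`.

```python
def migration_complete(start_offset: int,
                       copied_up_to: int,
                       producer_latest: int,
                       earliest_active_read: int) -> bool:
    """Completion check when copying started at the latest data position.

    Both conditions from the text must hold:
    1. the migration data queue has caught up with the producer's latest write;
    2. the earliest active consumer has entered the already-copied range,
       which implies that every other active consumer has as well.
    """
    caught_up = copied_up_to >= producer_latest
    consumer_in_range = earliest_active_read >= start_offset
    return caught_up and consumer_in_range
```

Checking only the earliest active consumer is enough because every other active consumer is, by definition, at or beyond its read position.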
In a possible implementation, when data is migrated starting from the read position of the earliest active consumer, the data migration method further includes: determining that the migration data queue has copied up to the data most recently written to the original data queue by the producer, whereupon the data migration is complete.
That is, in this implementation, because the start migration position is the read position of the earliest active consumer, once the migration data queue has copied up to the data most recently written to the original data queue by the producer, all active consumers are guaranteed to lie within the range of the migration data queue, and the data migration is complete.
In a possible implementation, obtaining the read position of the earliest active consumer of the original data queue specifically includes: obtaining the read position of each active consumer in every data queue it reads, wherein those data queues include the original data queue; converting the mapping from each active consumer to its read positions in the data queues it reads into a mapping from each data queue to the read positions of all active consumers of that data queue, thereby obtaining the read positions of all active consumers of the original data queue; and comparing the read positions of all active consumers of the original data queue to obtain the read position of the earliest active consumer of the original data queue.
That is, in this implementation, the active consumers of the original data queue and their read positions cannot be obtained directly, but the data queues read by each active consumer, and its read position in each of them, can be. The mapping (correspondence) from each active consumer to its read positions in the data queues it reads is therefore converted into a mapping (correspondence) from each data queue to the read positions of all of its active consumers. This yields the active consumers of the original data queue and their read positions; comparing those read positions then gives the read position of the earliest active consumer of the original data queue.
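The mapping conversion described above amounts to inverting a nested dictionary. The sketch below assumes consumers and queues are identified by strings and read positions are integer offsets; `invert_positions` and `earliest_active_read` are illustrative names only.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# consumer -> {queue -> read position}
PerConsumer = Dict[str, Dict[str, int]]
# queue -> {consumer -> read position}
PerQueue = Dict[str, Dict[str, int]]


def invert_positions(per_consumer: PerConsumer) -> PerQueue:
    """Convert the consumer-keyed mapping into a queue-keyed mapping."""
    per_queue: PerQueue = defaultdict(dict)
    for consumer, queues in per_consumer.items():
        for queue, position in queues.items():
            per_queue[queue][consumer] = position
    return dict(per_queue)


def earliest_active_read(per_consumer: PerConsumer,
                         queue: str) -> Optional[Tuple[str, int]]:
    """Return (consumer, position) with the smallest read position on
    `queue`, or None if no active consumer reads that queue."""
    positions = invert_positions(per_consumer).get(queue, {})
    if not positions:
        return None
    consumer = min(positions, key=positions.get)
    return consumer, positions[consumer]
```

The input is expected to contain active consumers only, so the smallest position found is that of the earliest active consumer.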
In a possible implementation, before obtaining the read position of each active consumer in every data queue it reads, obtaining the read position of the earliest active consumer of the original data queue further includes: obtaining the read positions of all consumers in the data queues they read, wherein all consumers include inactive consumers and active consumers; querying the read status of all consumers to classify each consumer as an active consumer or an inactive consumer, wherein the read status of an active consumer is working and the read status of an inactive consumer is paused; and removing the inactive consumers together with the information about their read positions in the data queues they read.
That is, in this implementation, the read positions of the active consumers in the data queues they read are not convenient to obtain directly. Instead, the read positions of all consumers, active and inactive, in the data queues they read are obtained first. Next, the read status of every consumer is queried to classify it as active (read status: working) or inactive (read status: paused). Finally, the inactive consumers and the information about their read positions are removed, which leaves the read position of each active consumer in every data queue it reads.
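The filtering step can be sketched as follows. The status strings `"working"` and `"paused"` are placeholder labels for the two read statuses named in the text, and the function name is hypothetical.

```python
from typing import Dict

WORKING = "working"  # placeholder label for an active consumer
PAUSED = "paused"    # placeholder label for an inactive consumer


def keep_active(all_positions: Dict[str, Dict[str, int]],
                read_status: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Drop inactive consumers and their read positions.

    all_positions maps consumer -> {queue -> read position} for every
    consumer; read_status maps consumer -> status. A consumer with no
    known status is treated as inactive in this sketch.
    """
    return {
        consumer: dict(queues)
        for consumer, queues in all_positions.items()
        if read_status.get(consumer) == WORKING
    }
```

The result has the consumer-keyed shape that the mapping conversion of the preceding implementation takes as input.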
In a possible implementation, after the data migration is complete, the data migration method further includes: upon determining that a consumer reads unmigrated first data of the original data queue, deleting the original data queue according to a set policy, wherein the first data is the data already written to the original data queue when the migration instruction is received; or, upon determining that no consumer reads the unmigrated first data of the original data queue, deleting the original data queue.
That is, in this implementation, after the data migration is complete it must be determined whether any consumer reads the unmigrated first data of the original data queue again. If not, the original data queue can be deleted immediately. If so, the original data queue is deleted according to the set policy, for example only after the consumer has finished re-reading the unmigrated first data, so that normal operation is not affected.
In a third aspect, an embodiment of the present application provides a data migration device, including: a prediction unit, configured to, in response to a migration instruction to migrate the data of an original data queue in a first server, predict the total number of times consumers read first data of the original data queue, wherein the first data is the data already written to the original data queue when the migration instruction is received; a selection unit, configured to select, when the total number of times is less than or equal to a set number of times, a first migration strategy to determine a start migration position, wherein the start migration position of the first migration strategy is downstream of the earliest data position of the original data queue; and a migration unit, configured to copy the data in the original data queue to a second server in a set order to form a migration data queue, wherein the set order runs from the start migration position of the selected migration strategy to the position of the data most recently written to the original data queue by the producer during the migration.
In a possible implementation, the selection unit includes: an acquisition module, configured to obtain the read position of the earliest active consumer of the original data queue and the latest data position at the current moment, wherein the current moment is the moment at which the total number of times is determined to be less than the set number of times, and the latest data position is the position of the data written to the original data queue by the producer at the current moment; and a determination module, configured to determine that the start migration position of the first migration strategy is the latest data position when the distance between the read position of the earliest active consumer and the latest data position is less than or equal to a set distance.
In a possible implementation, when the start migration position is the latest data position, the acquisition module is further configured to obtain, during the data migration, the read position of the earliest active consumer of the original data queue; and the data migration device further includes: a first determination unit, configured to determine, when the migration data queue has copied up to the data most recently written to the original data queue by the producer, that the read position of the earliest active consumer of the original data queue during the data migration has entered the range of second data of the original data queue, whereupon the data migration is complete, wherein the second data is the data already copied to the migration data queue.
In a possible implementation, the determination module is further configured to determine that the start migration position of the first migration strategy is the read position of the earliest active consumer when the distance between the read position of the earliest active consumer and the latest data position is greater than the set distance.
In a possible implementation, when the start migration position is the read position of the earliest active consumer, the data migration device further includes: a second determination unit, configured to determine that the migration data queue has copied up to the data most recently written to the original data queue by the producer, whereupon the data migration is complete.
In a possible implementation, the acquisition module includes: a first acquisition submodule, configured to obtain the read position of each active consumer in every data queue it reads, wherein those data queues include the original data queue; a conversion submodule, configured to convert the mapping from each active consumer to its read positions in the data queues it reads into a mapping from each data queue to the read positions of all active consumers of that data queue, thereby obtaining the read positions of all active consumers of the original data queue; and a comparison submodule, configured to compare the read positions of all active consumers of the original data queue to obtain the read position of the earliest active consumer of the original data queue.
In a possible implementation, the acquisition module further includes: a second acquisition submodule, configured to obtain the read positions of all consumers in the data queues they read, wherein all consumers include inactive consumers and active consumers; a query submodule, configured to query the read status of all consumers to classify each consumer as an active consumer or an inactive consumer, wherein the read status of an active consumer is working and the read status of an inactive consumer is paused; and a removal submodule, configured to remove the inactive consumers together with the information about their read positions in the data queues they read.
In a possible implementation, the data migration device further includes: a first deletion unit, configured to, after the data migration is complete, delete the original data queue according to a set policy upon determining that a consumer reads the unmigrated first data of the original data queue; or delete the original data queue upon determining that no consumer reads the unmigrated first data of the original data queue.
In a possible implementation, the selection unit is further configured to select, when the total number of times is greater than the set number of times, a second migration strategy to determine the start migration position, wherein the start migration position of the second migration strategy is the earliest data position of the original data queue.
In a possible implementation, the data migration device further includes: a third determination unit, configured to determine that the migration data queue has copied up to the data most recently written to the original data queue by the producer, whereupon the data migration is complete.
In a possible implementation, the data migration device further includes: a second deletion unit, configured to delete the original data queue after the data migration is complete.
In a fourth aspect, an embodiment of the present application provides a data migration device, including: an acquisition module, configured to, in response to a migration instruction to migrate the data of an original data queue in a first server, obtain the read position of the earliest active consumer of the original data queue and the latest data position at the current moment, wherein the current moment is the moment at which the migration instruction is received, and the latest data position is the position of the data written to the original data queue by the producer at the current moment; and a migration module, configured to, when the distance between the read position of the earliest active consumer and the latest data position is less than or equal to a set distance, copy the data in the original data queue to a second server, in order from the latest data position to the position of the data most recently written to the original data queue by the producer during the migration, to form a migration data queue; and/or configured to, when the distance between the read position of the earliest active consumer and the latest data position is greater than the set distance, copy the data in the original data queue to the second server, in order from the read position of the earliest active consumer to the position of the data most recently written to the original data queue by the producer during the migration, to form the migration data queue.
In a possible implementation, when data is migrated starting from the latest data position, the acquisition module is further configured to obtain, during the data migration, the read position of the earliest active consumer of the original data queue; and the data migration device further includes: a first determination module, configured to determine, when the migration data queue has copied up to the data most recently written to the original data queue by the producer, that the read position of the earliest active consumer of the original data queue during the data migration has entered the range of second data of the original data queue, whereupon the data migration is complete, wherein the second data is the data already copied to the migration data queue.
In a possible implementation, when data is migrated starting from the read position of the earliest active consumer, the data migration device further includes: a second determination module, configured to determine that the migration data queue has copied up to the data most recently written to the original data queue by the producer, whereupon the data migration is complete.
In a possible implementation, the acquisition module includes: a first acquisition submodule, configured to obtain the read position of each active consumer in every data queue it reads, wherein those data queues include the original data queue; a conversion submodule, configured to convert the mapping from each active consumer to its read positions in the data queues it reads into a mapping from each data queue to the read positions of all active consumers of that data queue, thereby obtaining the read positions of all active consumers of the original data queue; and a comparison submodule, configured to compare the read positions of all active consumers of the original data queue to obtain the read position of the earliest active consumer of the original data queue.
In a possible implementation, the acquisition module further includes: a second acquisition submodule, configured to obtain the read positions of all consumers in the data queues they read, wherein all consumers include inactive consumers and active consumers; a query submodule, configured to query the read status of all consumers to classify each consumer as an active consumer or an inactive consumer, wherein the read status of an active consumer is working and the read status of an inactive consumer is paused; and a removal submodule, configured to remove the inactive consumers together with the information about their read positions in the data queues they read.
In a possible implementation, the data migration device further includes: a deletion module, configured to, after the data migration is complete, delete the original data queue according to a set policy upon determining that a consumer reads unmigrated first data of the original data queue, wherein the first data is the data already written to the original data queue when the migration instruction is received; or delete the original data queue upon determining that no consumer reads the unmigrated first data of the original data queue.
In a fifth aspect, an embodiment of the present application provides a server, including: a transceiver, configured to receive and send data; a memory, storing a computer program; and a processor, configured to execute the computer program stored in the memory so that the server implements the above data migration method, wherein the server is the above first server or second server.
In a sixth aspect, an embodiment of the present application provides a network system, including a first server and a second server, wherein the first server or the second server is capable of executing the above data migration method.
In a seventh aspect, an embodiment of the present application provides a computer storage medium storing a computer program which, when executed by a processor, implements the above data migration method.
In an eighth aspect, an embodiment of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the above data migration method.
The solution of the embodiments of the present application can select different migration strategies for different consumption patterns of the consumers, while ensuring that data is still transmitted effectively between upstream and downstream and that no data is lost. Specifically, when the total number of times consumers read the first data of the original data queue is small, that is, less than or equal to the set number of times, the first migration strategy can be selected to determine the start migration position, wherein the first data is the data already written to the original data queue when the migration instruction is received, and the start migration position of the first migration strategy is downstream of the earliest data position of the original data queue. This reduces the amount of first data that is copied. Because data is first stored in the page cache and only after a period of time written from the page cache to disk, the data that is not copied is generally data residing on disk. The proportion of data read from the page cache therefore increases and the probability of a real disk read decreases, which lowers the I/O resource consumption of the disk storing the data and reduces the impact of replica migration on normal services. Moreover, not all of the data in the original data queue has to be copied, which speeds up completion of the replica migration.
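The strategy selection summarized above can be sketched as a single comparison. How the total number of reads is predicted is not covered here; `predicted_total_reads`, `set_count` and the enum names are assumptions made for illustration.

```python
from enum import Enum


class MigrationStrategy(Enum):
    # First strategy: start downstream of the earliest data position.
    FIRST = "start downstream of the earliest data position"
    # Second strategy: start at the earliest data position.
    SECOND = "start at the earliest data position"


def select_strategy(predicted_total_reads: int,
                    set_count: int) -> MigrationStrategy:
    """Pick the migration strategy from the predicted number of times
    consumers will read the first data of the original data queue."""
    if predicted_total_reads <= set_count:
        # Old data is rarely read: skip most of it and copy mostly
        # from the page cache.
        return MigrationStrategy.FIRST
    # Old data is read often: copy the whole queue from its earliest position.
    return MigrationStrategy.SECOND
```

A predicted count equal to the set number of times still selects the first strategy, matching the "less than or equal to" condition in the text.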
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an architecture diagram of a Kafka cluster;
FIG. 2 is a schematic diagram of a scenario in which the Kafka cluster shown in FIG. 1 performs data migration;
FIG. 3 is a process diagram of a data migration method applicable to the Kafka cluster shown in FIG. 1;
FIG. 4 is a diagram showing how disk utilization changes when data is migrated according to the data migration method shown in FIG. 3;
FIG. 5 is a flowchart of a data migration method provided in an embodiment of the present application;
FIG. 6 is a detailed flowchart of step 531 in FIG. 5;
FIG. 7 is a process diagram of a first scheme of the first migration strategy;
FIG. 8 is a process diagram of a second scheme of the first migration strategy;
FIG. 9 is a comparison of disk utilization between the data migration method optimized according to the embodiments of the present application and the original data migration method;
FIG. 10 is a flowchart of another data migration method provided in an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a data migration device provided in an embodiment of the present application;
FIG. 12 is a schematic structural diagram of the selection unit in FIG. 11;
FIG. 13 is a schematic structural diagram of the acquisition module in FIG. 12;
FIG. 14 is a schematic structural diagram of another data migration device provided in an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a server provided in an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a network system provided in an embodiment of the present application.
DETAILED DESCRIPTION
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
In the description of the embodiments of the present application, words such as "exemplary", "for example" or "for instance" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplary", "for example" or "for instance" in the embodiments of the present application should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present the related concepts in a concrete manner.
In the description of the embodiments of the present application, the term "and/or" merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A alone, B alone, or both A and B. In addition, unless otherwise specified, the term "a plurality of" means two or more. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or as implicitly specifying the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. The terms "include", "comprise", "have" and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
Data channel products are mostly used where an upstream system publishes messages and downstream systems subscribe to and consume them. Messages are stored in the data channel as a first-in-first-out queue, and downstream tasks likewise fetch and process the data in order. In architectural design, a message queue generally serves to decouple components, smooth load peaks and enable asynchronous processing: producers write messages to the queue, and consumers read messages from the queue to carry out business logic. Kafka is a high-throughput distributed publish-subscribe messaging system that offers high performance, persistence, multi-replica backup and horizontal scalability, and it can handle all the action stream data that consumers generate on a website. Such actions, including web page browsing, searching and other user actions, are a key ingredient of many social functions on the modern web. Because of throughput requirements, this data is usually handled through log processing and log aggregation.
FIG. 1 is an architecture diagram of a Kafka cluster. As shown in FIG. 1, a Kafka cluster contains one or more servers, each of which is called a broker. Every message published to the Kafka cluster belongs to a topic, and the messages/data under one topic are of the same type. Each topic contains one or more partitions; a partition is a physical concept, and message order must be preserved within each partition. A producer is a client that writes messages/data to a specified topic on a broker, and a consumer is a client that reads messages/data of a specified topic from a broker for business processing. Horizontal scaling can be achieved by increasing the number of partitions. Each time a new message/data item is written, Kafka appends the replica data to the corresponding file, which preserves the high reliability of the original data.
In FIG. 1, topic 0 has two partitions, partition 0 and partition 1, and each partition has three replicas. A solid arrow points to the replica to which the producer writes messages/data; this is the leader replica, which serves data production and consumption. A dashed arrow points to a follower replica, which copies messages/data from the leader replica; if the leader replica fails, a follower replica can be elected leader to provide service. A double-dot-dash arrow connects to the replica read by a consumer. When broker instances are scaled out or in, data must be migrated to keep replicas as evenly balanced across the brokers as possible. This is the main scenario addressed by the solution of the embodiments of the present application.
FIG. 2 is a schematic diagram of a scenario in which the Kafka cluster shown in FIG. 1 performs data migration, specifically during scale-out. As shown in FIG. 2, when scaling out, broker 3 can be added, and the data of "topic 0 partition 0" on broker 0 and "topic 0 partition 1" on broker 1 can be copied to broker 3, as indicated by the single-dot-dash arrows in FIG. 2. When scaling in, assuming the Kafka cluster contains brokers 1-5 with four partitions each, the four partitions on broker 5 can be migrated to brokers 1-4 respectively.
FIG. 3 is a process diagram of a data migration method applicable to the Kafka cluster shown in FIG. 1. As shown in FIG. 3, this method starts copying from the earliest data position of the original data queue. During migration the original data queue keeps providing service: consumer 1 and consumer 2 continue reading data, and the producer continues writing new data to the original data queue. In the lower part of FIG. 3 ("Migration complete"), the black rectangles in the original data queue represent data newly written by the producer after copying began. After a lengthy copy, migration completes once all data in the original data queue has been copied to the new data queue. The migration data queue then starts providing service, newly written data goes to the migration data queue, and the data in the original data queue is deleted.
FIG. 4 shows how disk utilization changes when data is migrated using the method of FIG. 3. As FIG. 3 and FIG. 4 show, this data migration method has the following drawbacks:
1. Until migration finishes, a large amount of data (such as the data represented by the black rectangles in the original data queue in the lower part of FIG. 3, "Migration complete") is still being written to the original data queue. Unless the migration speed is far higher than the write speed, migration takes a great deal of time.
2. Copying involves a large amount of reading and writing of historical data, which consumes substantial I/O resources of the disk storing the data and affects normal business. Moreover, Linux uses the page cache to cache recently read and written data, and copying very old historical data rarely hits the page cache, so real disk reads occur. As shown in FIG. 4, disk utilization rises sharply at the point where migration starts and stays high throughout the migration, which affects other workloads on the disk.
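Drawback 1 can be made concrete with a rough estimate. The symbols below are assumptions introduced for illustration and do not appear in the original: $B_0$ is the amount of data in the original data queue when copying starts, $v_c$ is a constant copy rate, and $v_w$ is a constant producer write rate. Copying finishes when the copied amount equals the initial backlog plus everything written in the meantime:

```latex
v_c\,T = B_0 + v_w\,T
\quad\Longrightarrow\quad
T = \frac{B_0}{v_c - v_w}, \qquad v_c > v_w .
```

Under these simplifying assumptions, $T$ grows without bound as $v_c$ approaches $v_w$, which is why the copy is slow unless the migration speed is far higher than the write speed. Reducing $B_0$, as the strategies described in this application do by starting further downstream, shortens $T$ proportionally.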
Because the main function of a message channel product is to provide data publishing for upstream business and data subscription for downstream business, data migration must ensure that upstream and downstream data are transmitted effectively and that no data is lost. The embodiments of the present application provide a data migration method, a data migration apparatus, a server, and a network system. They can be applied to message channel products such as Kafka, and also to other similar storage systems such as the Hadoop Distributed File System (HDFS). While ensuring effective transmission with no data loss, a migration strategy can be selected according to how consumers are consuming data, which helps improve migration efficiency and reduce I/O consumption of the disk storing the data, thereby reducing the impact of replica migration on normal business.
FIG. 5 is a flowchart of a data migration method provided by an embodiment of the present application. The method is used to migrate data in a data queue, where a producer writes data to the data queue in order and at least one consumer reads data from the data queue in order. As shown in FIG. 5, the data migration method includes:
Step 51: in response to a migration instruction to migrate the data of an original data queue on a first server, predict the total number of times consumers will read the first data of the original data queue. The first data is the data already written to the original data queue when the migration instruction is received, such as the data represented by the white rectangles in FIG. 3. Note that the "total number of times" here is the number of times consumers read the first data again after the migration instruction is received. It can be predicted from, for example, the consumers' historical consumption records before the instruction was received and the business types the consumers belong to. Specifically, the historical consumption records may include each consumer's re-consumption frequency and its trend. If a consumer's business type tolerates skipped reads, that is, the consumer may be allowed not to read the first data when it needs to re-read it, that consumer's read count can be taken as 0. Conversely, if the consumer's business type does not tolerate skipped reads, every time the consumer needs to read unmigrated first data must be counted. The total can be the sum over the different consumers that read the first data of the original data queue, with multiple reads by the same consumer counted multiple times. If only one consumer reads the original data queue, the total is the number of times that consumer reads the first data.
Step 52: determine whether the total number of times is less than or equal to a set number.
The set number can be configured according to operational needs, for example 3 or 5.
Step 53: if so, select the first migration strategy to determine the start migration position, where the start migration position of the first migration strategy is downstream of the earliest data position of the original data queue.
Referring again to FIG. 3, the "earliest data position of the original data queue" is the position of the earliest message (that is, the earliest data) in the original data queue; in FIG. 3 it is the leftmost data position of the original data queue. "Downstream" means a position after the earliest message. One example is the "latest data position" introduced below, that is, the position of the data the producer writes to the original data queue; in FIG. 3 it is the rightmost data position of the original data queue. Another example is the "earliest active consumer position", that is, the position of the active consumer whose read position is earliest in the original data queue; in FIG. 3 it is the position of consumer 1.
Step 54: copy the data in the original data queue to a second server in a set order to form a migration data queue, where the set order runs from the start migration position of the selected migration strategy to the position of the data most recently written to the original data queue by the producer during migration.
In the above solution, when a migration instruction for the original data queue on the first server is received, the total number of times consumers will read the first data of the original data queue again after the instruction can first be predicted. If that total is less than or equal to the set number, that is, consumers seldom read the first data, the first migration strategy can be selected to determine the start migration position. Because the start migration position of the first migration strategy is downstream of the earliest data position of the original data queue, the first data between the earliest data position and the start migration position need not be copied, which speeds up completion of the replica migration. In addition, data written by the producer to the original data queue is first placed in the page cache and, after some time, is written to disk. When the data queue is migrated, the page cache is checked first for the data to be migrated: if the data is there, it is migrated from the page cache and no real disk read occurs; if not, it is migrated from disk and a real disk read occurs. When the start migration position is downstream of the earliest data position, less of the first data is migrated, so a larger share of the migrated data comes from the page cache. This lowers the probability of real disk reads and reduces I/O consumption of the disk storing the data, leaving disk I/O mainly for normal business and reducing the impact of replica migration on it.
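Steps 51 to 53 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ConsumerProfile` type, its field names, and the idea that a per-consumer forecast is already available are all assumptions, and the fallback to the second strategy when the threshold is exceeded is inferred from the surrounding text rather than stated at this point.

```python
from dataclasses import dataclass


@dataclass
class ConsumerProfile:
    """Hypothetical per-consumer inputs to the prediction in step 51."""
    name: str
    predicted_rereads: int      # forecast from historical re-consumption records
    allows_skipped_reads: bool  # business type tolerates not re-reading old data


def predict_total_rereads(consumers):
    """Step 51: sum predicted re-reads of the first data over all consumers.

    Consumers whose business type tolerates skipped reads count as 0.
    """
    return sum(0 if c.allows_skipped_reads else c.predicted_rereads
               for c in consumers)


def choose_strategy(consumers, set_count=3):
    """Steps 52-53: pick the first strategy when the total is <= the set number."""
    total = predict_total_rereads(consumers)
    # Assumption: anything above the threshold falls back to the second strategy.
    return "first" if total <= set_count else "second"
```

For instance, a consumer with 2 predicted re-reads plus a skip-tolerant consumer with 10 gives a total of 2, so `choose_strategy` returns `"first"` for a set number of 3.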
Specifically, the first migration strategy may include two schemes. In the first scheme, the start migration position is the latest data position of the original data queue. In the second scheme, the start migration position is the read position of the earliest active consumer in the original data queue. The second migration strategy, introduced below, includes one scheme, the third scheme, whose start migration position is the earliest data position of the original data queue. The start migration positions of the first and second schemes are downstream of the earliest data position of the original data queue (the start migration position of the third scheme).
The first scheme of the data migration method of the embodiments is described in detail first. In mainstream business scenarios, the position of a consumer's consumption task (that is, its read position) is generally close to the position of the latest message/data written by the producer to the original data queue. In this case, as shown in FIG. 5, selecting the first migration strategy in step 53 to determine the start migration position may specifically include:
Step 531: obtain the read position of the earliest active consumer and the latest data position of the original data queue at the current moment. The "current moment" here is the moment at which the total number of times is determined to be less than the set number, and the latest data position is the position of the data the producer writes to the original data queue at that moment.
Step 532: determine whether the distance between the read position of the earliest active consumer and the latest data position is less than or equal to a set distance.
Step 533: if so, determine that the start migration position of the first migration strategy is the latest data position. This is the first scheme of the data migration method of the embodiments. In other words, when the distance between the read position of the earliest active consumer and the latest data position, at the moment the total number of times is determined to be less than the set number, is less than or equal to the set distance, the earliest active consumer is a real-time consumer and will closely follow the latest data. Because the earliest active consumer is the consumer farthest from the latest data position, if it is a real-time consumer then all other active consumers can be taken to be real-time consumers as well, so migration can start from the latest data position of the original data queue.
Next, step 54 can be performed: copy the data in the original data queue to the second server in the set order to form the migration data queue.
Because the start migration position is the latest data position, all or most of the migrated data is read from the page cache. This avoids or greatly reduces real disk reads and lowers I/O consumption of the disk storing the data, so disk I/O can be used mainly for normal business and the impact of replica migration on normal business is reduced. In addition, less data is migrated, which speeds up completion of the replica migration.
FIG. 6 is a detailed flowchart of step 531 in FIG. 5. As shown in FIG. 6, obtaining the read position of the earliest active consumer of the original data queue in step 531 may specifically include:
Step 5314: obtain the read position of each active consumer in every data queue it reads, where these data queues include the original data queue.
Step 5315: convert the mapping from each active consumer to its read positions in all the data queues it reads into a mapping from each data queue to the read positions of all active consumers of that queue, thereby obtaining the read positions of all active consumers of the original data queue.
Step 5316: compare the read positions of all active consumers of the original data queue to obtain the read position of the earliest active consumer of the original data queue.
Kafka only provides an application programming interface (API) for querying the read status of all topic data queues for a specified consumer group; that is, it can only obtain the data queues read by each consumer in a consumer group (as shown in FIG. 1). The solution of the embodiments, however, needs the read status of all downstream active consumers of each topic data queue, in order to decide from which offset (cursor/start position) to migrate and when migration is complete. To solve this problem, the embodiments adopt the following solution:
First, obtain information on the topic data queues read by all active consumers; this information includes the data queue and the read position. At this point a map structure is formed whose key is the active consumer and whose value is the topic data queue, as shown in Table 1 below. Next, convert this information into a map structure whose key is the topic data queue and whose value is all active consumers and their read positions, as shown in Table 2 below, so that the API gains a service for viewing the read positions of all active consumers of a topic data queue. This API excludes the consumption status of downstream inactive consumers. Many consumption activities are only temporary and exit once finished, so they must be excluded to ensure that the read positions of active consumers (that is, effective consumers) keep moving toward newer data.
Table 1
Table 2
In other words, in this implementation the set of active consumers of the original data queue and their read positions cannot be obtained directly, but the data queues read by each active consumer and its read position in each of them can. The mapping (correspondence) from each active consumer to its read positions in all the data queues it reads can therefore be converted into a mapping from each data queue to the read positions of all its active consumers, which yields the active consumers of each data queue and their read positions. Comparing the read positions of all active consumers of the original data queue then gives the read position of the earliest active consumer of the original data queue.
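The map conversion of steps 5315 and 5316 can be sketched as below. The function names, the nested-dictionary layout, and the sample consumers, queue names, and offsets are all made up for illustration; they are not the contents of Table 1 or Table 2.

```python
from collections import defaultdict


def invert_offsets(by_consumer):
    """Step 5315: {consumer: {queue: read_offset}} -> {queue: {consumer: read_offset}}."""
    by_queue = defaultdict(dict)
    for consumer, queues in by_consumer.items():
        for queue, offset in queues.items():
            by_queue[queue][consumer] = offset
    return dict(by_queue)


def earliest_active_read_offset(by_queue, queue):
    """Step 5316: smallest read offset among the active consumers of one queue."""
    return min(by_queue[queue].values())


# Illustrative input keyed by active consumer (the Table 1 shape).
by_consumer = {
    "consumer-1": {"topic0-0": 120, "topic0-1": 40},
    "consumer-2": {"topic0-0": 180},
}
by_queue = invert_offsets(by_consumer)   # the Table 2 shape, keyed by queue
print(by_queue["topic0-0"])              # {'consumer-1': 120, 'consumer-2': 180}
print(earliest_active_read_offset(by_queue, "topic0-0"))  # 120
```

Keying the second map by data queue means the earliest active consumer of any queue is a single `min` over that queue's entry, which is the lookup the migration logic needs.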
Referring again to FIG. 6, when obtaining the read position of the earliest active consumer of the original data queue, the following steps may be performed before step 5314:
Step 5311: obtain the read positions of all consumers in the data queues they read, where all consumers include inactive consumers and active consumers.
Step 5312: query the read status of all consumers to classify each consumer as active or inactive, where the read status of an active consumer is "working" and the read status of an inactive consumer is "paused".
Step 5313: remove the inactive consumers and the information on their read positions in the data queues they read.
In other words, in this implementation it is not convenient to obtain directly the read positions of all active consumers in all the data queues they read. Instead, the read positions of all consumers, inactive and active, in the data queues they read are obtained first. The read status of all consumers is then queried to classify each consumer as active (status "working") or inactive (status "paused"). Finally, the inactive consumers and their read-position information are removed, which leaves the read position of each active consumer in every data queue it reads.
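Steps 5311 to 5313 amount to a filter over the per-consumer read positions. The sketch below assumes the read status is available as a plain dictionary and uses the hypothetical labels `"working"` and `"paused"`; how the status is actually queried is not specified in the text.

```python
def keep_active(read_positions, read_status):
    """Return the read positions of consumers whose status is "working".

    read_positions: {consumer: {queue: read_offset}} for all consumers (step 5311)
    read_status:    {consumer: "working" | "paused"} (step 5312, labels hypothetical)
    """
    return {
        consumer: queues
        for consumer, queues in read_positions.items()
        if read_status.get(consumer) == "working"  # step 5313 drops everything else
    }
```

A consumer with no recorded status is treated as inactive here, which is a conservative choice made for this sketch rather than something the source prescribes.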
Referring again to FIG. 5, after step 533 determines that the start migration position of the first migration strategy is the latest data position of the original data queue and step 54 migrates the original data queue in the set order, the data migration method further includes:
Step 55: during data migration, obtain the read position of the earliest active consumer of the original data queue.
Step 56: determine whether the migration data queue has copied up to the data most recently written to the original data queue by the producer.
Step 57: if so, determine whether the read position of the earliest active consumer of the original data queue during migration has entered the range of the second data of the original data queue, where the second data is the data already copied to the migration data queue.
Step 58: if so, complete the data migration. After migration completes, no new data is written to the original data queue. If the migration data queue serves as the leader replica, the producer starts writing data to the migration data queue; if it serves as a follower replica, it continues copying data from the new leader replica.
Because the start migration position was determined to be the latest data position of the original data queue, all consumers of the original data queue are real-time consumers and move closely behind the latest data. Therefore, once the migration data queue has copied up to the data most recently written by the producer, it is enough to confirm that the read position of the earliest active consumer, which is farthest from the latest data position, lies within the range of the migration data queue. That guarantees the read positions of all active consumers of the original data queue are within the range of the migration data queue, and the data migration can be completed.
Note also that the "earliest active consumer" used in step 57 to decide whether migration is complete may be the same as, or different from, the "earliest active consumer" used in step 532 to determine the start migration position of the first migration strategy.
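The completion test of steps 55 to 58 for the first scheme can be sketched as a single predicate. The function and parameter names are assumptions, offsets are modeled as plain integers, and the handling of the case with no active consumers is a choice made for this sketch only.

```python
def scheme1_migration_complete(copy_start, copied_up_to, latest_written,
                               active_read_positions):
    """Return True when a migration that started at the latest data position may finish.

    copy_start:            offset of the first message copied (start migration position)
    copied_up_to:          offset of the last message copied so far
    latest_written:        offset of the producer's most recent write to the original queue
    active_read_positions: current read offsets of the active consumers (step 55)
    """
    # Step 56: has the migration data queue caught up with the producer?
    if copied_up_to < latest_written:
        return False
    # Assumption: with no active consumers there is nobody to wait for.
    if not active_read_positions:
        return True
    # Step 57: the earliest active consumer must have entered the copied (second) data.
    return min(active_read_positions) >= copy_start
```

Checking only the minimum read position is sufficient because every other active consumer is further downstream, which mirrors the argument made for step 57.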
FIG. 7 is a process diagram of the first scheme of the first migration strategy. In the main usage scenarios of a data channel, the progress of downstream consumption tasks generally stays close to the latest data to keep data retrieval real-time, and most of the first data in the original replica has already been processed by downstream tasks and will not be consumed again unless there is a special need. In the first scheme, therefore, the start migration position is the latest data position of the original data queue, that is, the position of the data the producer is currently writing, as shown in the upper part of FIG. 7 ("State before migration"). During migration, the first data before the latest data position (the white rectangles in FIG. 7) is not copied; only data newly written to the original data queue (the black rectangles in the original data queue in FIG. 7) is copied, so the migrated replica can immediately keep up with the latest position of the original replica. Replica migration is complete when the migration data queue has copied the latest data of the original data queue and the read positions of all downstream active consumers have entered the range of the migration data queue. As shown in the lower part of FIG. 7 ("Migration complete"), the migration data queue has copied the latest data of the original data queue, and the read positions of consumer 1 and consumer 2 have entered the range of the already-copied second data, which ensures that the migrated replica can serve effectively after the original data queue (the original replica) leaves service.
The migration process of the first scheme of the first migration strategy has been described above with reference to FIG. 5 to FIG. 7. The migration process of the second scheme of the first migration strategy is described next. Referring again to FIG. 5, in this case the data migration method includes:
Step 534: if the comparison in step 532 is negative, determine that the start migration position of the first migration strategy is the read position of the earliest active consumer in the original data queue. This is the second scheme of the data migration method of the embodiments.
In other words, after the migration instruction is received and the first migration strategy is selected, the distance between the read position of the earliest active consumer of the original data queue and the latest data position is greater than the set distance. Among the active consumers, at least the earliest active consumer is then a non-real-time consumer of the original data queue that lingers in the first data for a long time. If data were migrated according to the first scheme, this consumer might not enter the range of the migration data queue for a long while. The start migration position of the first migration strategy can therefore be set to the read position of the earliest active consumer. This ensures the earliest active consumer keeps working normally throughout migration, and the first data between the earliest data position and the read position of the earliest active consumer need not be copied. Replica migration completes faster and real disk reads become less likely, which reduces disk I/O consumption and the impact of replica migration on normal business.
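The choice between the two schemes of the first migration strategy (steps 532 to 534) reduces to one comparison on the consumer lag. A minimal sketch, assuming integer offsets and a hypothetical function name:

```python
def first_strategy_start(earliest_active_read, latest_offset, set_distance):
    """Pick the start migration position for the first migration strategy.

    earliest_active_read: read offset of the earliest active consumer (step 531)
    latest_offset:        offset of the producer's latest write (step 531)
    set_distance:         configured lag threshold (step 532)
    """
    lag = latest_offset - earliest_active_read
    if lag <= set_distance:
        return latest_offset         # step 533: first scheme, start at the latest data
    return earliest_active_read      # step 534: second scheme, start at the lagging consumer
```

With a set distance of 100 and a latest offset of 1000, a consumer at 950 leads to a start position of 1000, while a consumer at 400 leads to a start position of 400.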
Next, step 54 can be performed: copy the data in the original data queue to the second server in the set order to form the migration data queue. When the start migration position of the first migration strategy is determined to be the read position of the earliest active consumer, the data migration method may further include:
Step 55': determine whether the migration data queue has copied up to the data most recently written to the original data queue by the producer.
Step 56': if so, complete the data migration.
That is, the consumption progress of some non-real-time data channel tasks can stall for a long time. Under the first scheme, such a consumer might not enter the region of second data (the part of the original data queue that has already been copied) for a long time. The second scheme, which starts migration at the read position of the earliest active consumer, addresses this: once the migration data queue has copied up to the data most recently written by the producer, every active consumer is guaranteed to lie within the range of the migration data queue, and the migration can be completed.
FIG. 8 is a process diagram of the second scheme of the first migration strategy. In this scheme the start migration position is the read position of the earliest active consumer in the original data queue, for example the read position of consumer 1 in the upper part of FIG. 8 ("state before migration"). Replica migration is complete when the migrated replica has caught up with the most recently written data in the original replica; in the lower part of FIG. 8 ("migration complete") the migration data queue has copied up to the newest data at the far right of the original data queue. As FIG. 8 shows, the data actually migrated consists only of the data not yet read by the earliest active consumer (consumer 1), including the data newly written during migration, which the black rectangles in the original data queue represent. Compared with the existing migration scheme, the volume of copied data is greatly reduced while consumer 1 and consumer 2 continue to read normally.
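The saving described for FIG. 8 can be made concrete with simple offset arithmetic. The sketch below assumes that positions in the queue are integer offsets; the function name and all sample numbers are hypothetical and are not taken from the patent.

```python
def records_copied(start_offset: int, end_offset_at_completion: int) -> int:
    """Number of records copied when migration runs from start_offset
    (inclusive) up to end_offset_at_completion (exclusive)."""
    if start_offset > end_offset_at_completion:
        raise ValueError("start offset lies beyond the end of the queue")
    return end_offset_at_completion - start_offset


# Hypothetical queue state; every number here is illustrative only.
earliest_offset = 0          # earliest data position of the original queue
consumer1_offset = 9_000     # read position of the earliest active consumer
end_at_completion = 10_500   # end of the queue once the copy has caught up

full_copy = records_copied(earliest_offset, end_at_completion)
second_scheme = records_copied(consumer1_offset, end_at_completion)
saved = full_copy - second_scheme
```

With these assumed offsets, starting at the read position of consumer 1 copies only the unread tail of the queue, and `saved` is the amount of first data that never has to be read from the original replica.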
FIG. 9 compares disk utilization under the optimized data migration method of the embodiments of this application with that under the original method. As shown in FIG. 9, in the optimized first and second schemes only a small portion of the data is actually migrated. Because the data copied from the original replica was written recently, it is very likely still in the operating system's page cache, which is a transparent cache of pages originating from secondary storage such as a hard disk drive or solid-state drive. Reading the original replica therefore generally does not trigger a real disk read, which improves read performance and reduces the disk I/O impact on normal services. Under the original scheme, migration starts from the earliest data position; the old first data is unlikely to hit the page cache, so real disk reads occur and interfere with the disk's handling of normal services.
In addition, in the first and second schemes, the migrated replica can take over the work of the original replica once migration is complete. When to delete the original data queue can be chosen according to whether the unmigrated first data of the original data queue is still being read and according to the disk load of the original node. Specifically, referring again to FIG. 5, after the data migration is complete the method further includes:
Step 59: determine whether the unmigrated first data of the original data queue is being read.
Step 510: if so, delete the original data queue according to the set policy. The original data queue no longer accepts writes, and its data shrinks gradually under the set policy until it is deleted entirely. Services that still need to read the first data can continue to do so.
Step 511: if not, delete the original data queue. This frees space immediately and reduces cluster load.
That is, because the start migration position of the first migration strategy lies downstream of the earliest data position of the original data queue, the data between the earliest data position and the start migration position is not copied to the migration data queue. After migration completes, it must therefore be determined whether consumers read the unmigrated first data in the original data queue. If the number of such reads after migration is 0, the original data queue can be deleted immediately. If the number is nonzero but less than or equal to the set number of times (for example 1 or 2), the original data queue can be deleted according to the set policy, for example only after the consumer has read the unmigrated first data, so that normal operation is not affected.
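The decision in steps 59 to 511 can be sketched as a small function. This is a minimal illustration under the assumption that the number of reads of the unmigrated first data observed after migration is available as an integer; the enum and function names are hypothetical.

```python
from enum import Enum


class DeletionMode(Enum):
    IMMEDIATE = "delete the original data queue at once"
    BY_POLICY = "let the set policy age the original data queue out"


def choose_deletion_mode(reads_of_unmigrated_first_data: int) -> DeletionMode:
    """Pick how to delete the original queue after a first-strategy
    migration, given the number of reads of the unmigrated first data
    observed after the migration completed."""
    if reads_of_unmigrated_first_data < 0:
        raise ValueError("read count cannot be negative")
    if reads_of_unmigrated_first_data == 0:
        return DeletionMode.IMMEDIATE   # step 511: nothing reads the old data
    return DeletionMode.BY_POLICY       # step 510: keep the data readable for now
```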
The two schemes of the first migration strategy have been described above with reference to FIG. 5 to FIG. 9. The scheme of the second migration strategy is described next. Specifically, referring again to FIG. 5, the data migration method further includes:
Step 53': if the determination in step 52 is negative, that is, the total number of times consumers read the first data of the original data queue is greater than the set number of times, select the second migration strategy to determine the start migration position. The start migration position of the second migration strategy is the earliest data position of the original data queue; this is the third scheme of the data migration method of the embodiments of this application.
That is, after migrating under the first or second scheme of the first migration strategy, the original data queue can be deleted according to the set policy so that consumers can still read the unmigrated first data in it. However, switching a consumer from the migration data queue on the second server back to the original data queue on the first server is costly. When consumers read the first data of the original data queue many times, the second migration strategy can be selected to reduce this cost: migration starts from the earliest data position of the original data queue, so consumer reads proceed normally.
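The choice between the two strategies depends only on the predicted read count and the set number of times. A minimal sketch, assuming the prediction is already available as an integer (how it is produced is outside this example, and the function and parameter names are hypothetical):

```python
def select_migration_strategy(predicted_reads_of_first_data: int,
                              set_number_of_times: int) -> str:
    """Return "first" when the predicted reads of the first data do not
    exceed the set number of times, otherwise "second"."""
    if predicted_reads_of_first_data < 0 or set_number_of_times < 0:
        raise ValueError("counts cannot be negative")
    if predicted_reads_of_first_data <= set_number_of_times:
        return "first"    # start downstream of the earliest data position
    return "second"       # start at the earliest data position (third scheme)
```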
In this case, the data migration method may further include:
Step 55'': determine whether the migration data queue has copied up to the data most recently written to the original data queue by the producer;
Step 56'': if so, the data migration is complete.
That is, when the second migration strategy starts migration from the earliest data position of the original data queue, the migration data queue is guaranteed to contain all data of the original data queue once it has copied up to the data most recently written by the producer, and the migration is then complete.
Step 57'': delete the original data queue after the data migration is complete.
Because the start migration position of the second migration strategy is the earliest data position of the original data queue, the migration data queue holds all data of the original data queue once it has copied up to the data most recently written by the producer. The original data queue can therefore be deleted immediately after migration to free the space it occupies.
The data migration method provided by the embodiments of this application can select a migration scheme suited to how consumers consume the data, while ensuring that data flows effectively between upstream and downstream and that no data is lost. This speeds up replica migration, lowers the I/O consumption of the disks storing the data, and thereby reduces the impact of replica migration on normal services.
In summary, the technical solution claimed in the embodiments of this application has been described using Kafka, the most widely used message channel product, as an example. The key technical points are as follows:
1. Determine from which position of the original leader replica to start migration, that is, select the start migration position, according to how consumers read data in the original data queue. Specifically, the start migration position may be one of: the latest data position of the original data queue (first scheme), the read position of the earliest active consumer in the original data queue (second scheme), or the earliest data position of the original data queue (third scheme).
2. Obtain the read position of the earliest active consumer of the original data queue; see steps 5314 to 5316 or steps 5311 to 5316.
3. Criteria for determining that data migration is complete
For the first scheme, first determine whether the migration data queue has copied up to the data most recently written by the producer to the original data queue (the leader replica). If so, call the application programming interface (API) described above to confirm that the read position of the earliest active consumer during migration has entered the range of second data, that is, the part of the original data queue already copied by the migration data queue. The data migration is then complete.
For the second and third schemes, the data migration is complete as soon as the migration data queue has copied up to the data most recently written by the producer to the original data queue during migration.
4. Whether to delete the original data queue
For the two schemes of the first migration strategy, determine whether the unmigrated first data in the original data queue is being read. If it is, delete the original data queue according to the set policy; that is, after migration the original data queue follows Kafka's own deletion policy, such as retaining only the most recent data for a fixed period or only a fixed size of data files. If it is not, the original data queue can be deleted immediately after migration.
For the third scheme, the migration data queue has already copied all data of the original data queue, so the original data queue can be deleted immediately after migration to free disk space as soon as possible.
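Kafka's own deletion policy mentioned in point 4 is usually expressed through topic-level settings. As a hedged illustration, the helper below builds a dictionary using the standard Kafka topic configuration keys `cleanup.policy`, `retention.ms` and `retention.bytes`. The helper name and the chosen values are hypothetical, and the source does not say how such settings would be applied to the original data queue.

```python
from typing import Dict

SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000   # illustrative time limit
ONE_GIB = 1024 ** 3                        # illustrative size limit


def retention_overrides(max_age_ms: int, max_bytes: int) -> Dict[str, str]:
    """Build topic-level overrides for time-based and size-based deletion.
    Kafka configuration values are passed as strings."""
    return {
        "cleanup.policy": "delete",
        "retention.ms": str(max_age_ms),     # keep data for a fixed period
        "retention.bytes": str(max_bytes),   # size limit, applied per partition
    }


overrides = retention_overrides(SEVEN_DAYS_MS, ONE_GIB)
```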
FIG. 10 is a flowchart of another data migration method provided by an embodiment of this application. The method is used to migrate data in a data queue, where a producer writes data to the data queue in order and at least one consumer reads data from the data queue in order. As shown in FIG. 10, the data migration method includes:
Step 1001: in response to a migration instruction for migrating data of the original data queue on the first server, obtain the read position of the earliest active consumer and the latest data position of the original data queue at the current moment, where the "current moment" is the moment the migration instruction is received and the latest data position is the position of the data written by the producer to the original data queue at that moment;
Step 1002: determine whether the distance between the read position of the earliest active consumer and the latest data position is less than or equal to the set distance;
Step 1003: if so, copy the data in the original data queue to the second server, in order from the latest data position to the position of the data most recently written to the original data queue by the producer during migration, to form the migration data queue; that is, the start migration position is the latest data position of the original data queue; and/or
Step 1004: if not, copy the data in the original data queue to the second server, in order from the read position of the earliest active consumer to the position of the data most recently written to the original data queue by the producer during migration, to form the migration data queue; that is, the start migration position is the read position of the earliest active consumer of the original data queue.
In the above scheme, at the moment the migration instruction is received, if the distance between the read position of the earliest active consumer in the original data queue and the latest data position is less than or equal to the set distance, the active consumers are real-time consumers that closely follow the latest data, and migration can start from the latest data position of the original data queue. If the distance is greater than the set distance, the active consumer lingers on the first data of the original data queue and is a non-real-time consumer, and migration can start from the read position of the earliest active consumer. In either case, the data of the original data queue on the first server is copied to the second server, in order from the start migration position to the position of the data most recently written by the producer during migration, to form the migration data queue. Data newly written by the producer is first held in the page cache and is only flushed from the page cache to disk after some time. Compared with migrating from the earliest data position, the data migration method of this embodiment therefore raises the proportion of migrated data that is copied from the page cache, reduces the amount of first data copied from disk, makes real disk reads less likely, and reduces the impact of replica migration on normal services. Because not all data of the original data queue has to be migrated, replica migration also completes faster.
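Steps 1002 to 1004 amount to a distance comparison that yields the start migration position. The following sketch assumes positions are integer offsets that grow as the producer writes; the function and parameter names are hypothetical.

```python
def choose_start_offset(earliest_active_read_offset: int,
                        latest_offset: int,
                        set_distance: int) -> int:
    """Return the start migration position for the flow of FIG. 10."""
    distance = latest_offset - earliest_active_read_offset
    if distance < 0:
        raise ValueError("a consumer cannot be ahead of the latest data")
    if distance <= set_distance:
        # Step 1003: consumers are real-time, start at the latest data.
        return latest_offset
    # Step 1004: a consumer lags behind, start at its read position.
    return earliest_active_read_offset
```

For example, with an assumed set distance of 50, a consumer at offset 990 against a latest offset of 1000 leads to a start at 1000, while a consumer at offset 200 leads to a start at 200.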
In addition, when migration starts from the latest data position of the original data queue, the data migration method further includes:
Step 1005: obtain the read position of the earliest active consumer of the original data queue during the data migration;
Step 1006: determine whether the migration data queue has copied up to the data most recently written to the original data queue by the producer;
Step 1007: if so, determine whether the read position of the earliest active consumer of the original data queue during the data migration has entered the range of second data of the original data queue, where the second data is the data already copied by the migration data queue;
Step 1008: if so, the data migration is complete.
Because the start migration position is the latest data position of the original data queue, all active consumers of the original data queue are real-time consumers. Once the migration data queue has copied up to the data most recently written by the producer during migration, it suffices that the read position of the earliest active consumer, the one farthest from the latest data position, has entered the range of second data already copied by the migration data queue. The read positions of all active consumers of the original data queue are then within the range of the migration data queue, and the migration can be completed. See FIG. 7 for the detailed migration process.
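The two-part completion test of steps 1006 to 1008 can be written as a single predicate. This is a sketch under the assumption that the copied second data is described by a half-open offset range `[migrated_start, migrated_end)`; all names are hypothetical.

```python
def first_scheme_complete(migrated_start: int,
                          migrated_end: int,
                          original_log_end: int,
                          earliest_active_read_offset: int) -> bool:
    """True when migration that started at the latest data position may finish."""
    # Step 1006: the copy has reached the newest data written by the producer.
    caught_up = migrated_end >= original_log_end
    # Step 1007: the slowest active consumer now reads inside the copied range.
    consumer_inside = earliest_active_read_offset >= migrated_start
    return caught_up and consumer_inside
```

Checking only the earliest active consumer is enough because every other active consumer is at an equal or later offset.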
When migration starts from the read position of the earliest active consumer in the original data queue, the data migration method further includes:
Step 1005': determine whether the migration data queue has copied up to the data most recently written to the original data queue by the producer;
Step 1006': if so, the data migration is complete.
Because the start migration position is the read position of the earliest active consumer, all active consumers are necessarily within the range of the migration data queue once it has copied up to the data most recently written by the producer, and the migration can be completed. See FIG. 8 for the detailed migration process.
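The catch-up behaviour of steps 1005' and 1006' can be illustrated with a small in-memory simulation. It is only a sketch: the queue is modelled as a Python list whose index is the offset, the producer is simulated by appending a fixed number of records for a limited number of rounds, and every name and number is hypothetical.

```python
from typing import List


def migrate_from(start_offset: int, original: List[int], batch_size: int,
                 writes_per_round: int, rounds_of_writes: int) -> List[int]:
    """Copy original[start_offset:] in batches while a simulated producer
    keeps appending, and stop once the copy has caught up."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    migrated: List[int] = []
    next_offset = start_offset
    rounds_done = 0
    while next_offset < len(original):        # step 1005': not caught up yet
        batch = original[next_offset:next_offset + batch_size]
        migrated.extend(batch)
        next_offset += len(batch)
        if rounds_done < rounds_of_writes:    # producer writes during migration
            base = len(original)
            original.extend(range(base, base + writes_per_round))
            rounds_done += 1
    return migrated                           # step 1006': migration complete


queue = list(range(100))                      # offsets 0..99 already written
copy = migrate_from(60, queue, batch_size=30,
                    writes_per_round=10, rounds_of_writes=2)
```

Because copying begins at the consumer's own read position (offset 60 in this assumed setup), that consumer is inside the copied range from the first batch onward.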
Note that the process of "obtaining the read position of the earliest active consumer of the original data queue" in steps 1001 and 1005 can follow steps 5314 to 5316 or steps 5311 to 5316 described above. In addition, as shown in FIG. 10, after the data migration is completed at step 1008 or step 1006', the data migration method further includes:
Step 1009: determine whether the unmigrated first data of the original data queue is being read, where the first data is the data already written to the original data queue when the migration instruction is received.
Step 1010: if so, delete the original data queue according to the set policy.
Step 1011: if not, delete the original data queue.
That is, after migration completes, it must be determined whether consumers read the unmigrated first data in the original data queue. If they do not, the original data queue can be deleted immediately. If they do, the original data queue can be deleted according to the originally set policy, for example only after the consumer has read the unmigrated first data, so that normal operation is not affected.
With the data migration method of the embodiments of this application, only a limited portion of the replica data needs to be copied, as decided by the user, which improves migration efficiency and reduces the impact of migration while still meeting data reliability requirements. In addition, two ways of deleting the original data are provided once migration is complete: 1. delete the original data queue according to the originally set policy, which keeps the data replayable for services; 2. delete the original data queue immediately, which frees disk capacity.
FIG. 11 is a schematic structural diagram of a data migration device provided by an embodiment of this application. The data migration device 1100 is used to migrate data in a data queue, where a producer writes data to the data queue in order and at least one consumer reads data from the data queue in order. As shown in FIG. 11, the data migration device 1100 includes a prediction unit 1101, a selection unit 1102, and a migration unit 1103. The prediction unit 1101 is configured to respond to a migration instruction for migrating data of the original data queue on the first server by predicting the total number of times consumers read the first data of the original data queue, where the first data is the data already written to the original data queue when the migration instruction is received. The selection unit 1102 is configured to select the first migration strategy to determine the start migration position when the total number of times is less than or equal to the set number of times, where the start migration position of the first migration strategy lies downstream of the earliest data position of the original data queue. The migration unit 1103 is configured to copy the data in the original data queue to the second server in the set order to form the migration data queue, where the set order runs from the start migration position of the selected migration strategy to the position of the data most recently written to the original data queue by the producer during migration.
FIG. 12 is a schematic structural diagram of the selection unit in FIG. 11. As shown in FIG. 12, the selection unit 1102 includes an acquisition module 21 and a determination module 22. The acquisition module 21 is configured to obtain the read position of the earliest active consumer and the latest data position of the original data queue at the current moment, where the "current moment" is the moment at which the total number of times is determined to be less than the set number of times, and the latest data position is the position of the data written by the producer to the original data queue at that moment. The determination module 22 is configured to determine that the start migration position of the first migration strategy is the latest data position of the original data queue when the distance between the read position of the earliest active consumer in the original data queue and the latest data position is less than or equal to the set distance.
When the determination module 22 determines that the start migration position of the first migration strategy is the latest data position of the original data queue, the acquisition module 21 is further configured to obtain the read position of the earliest active consumer of the original data queue during the data migration. As shown in FIG. 11, the data migration device 1100 further includes a first determination unit 1104, configured to determine, once the migration data queue has copied up to the data most recently written to the original data queue by the producer, that the read position of the earliest active consumer during the data migration has entered the range of second data of the original data queue, whereupon the data migration is complete. The second data is the data already copied by the migration data queue.
Further, the determination module 22 may also be configured to determine that the start migration position of the first migration strategy is the read position of the earliest active consumer when the distance between that read position and the latest data position is greater than the set distance. In that case, as shown in FIG. 11, the data migration device 1100 may further include a second determination unit 1105, configured to determine that the migration data queue has copied up to the data most recently written to the original data queue by the producer, whereupon the data migration is complete.
FIG. 13 is a schematic structural diagram of the acquisition module in FIG. 12. As shown in FIG. 13, the acquisition module 21 may include a first acquisition submodule 211, a transfer submodule 212, and a comparison acquisition submodule 213. The first acquisition submodule 211 is configured to obtain the read position of each active consumer in every data queue it reads; these data queues include the original data queue. The transfer submodule 212 is configured to convert the mapping from each active consumer to its read positions in the data queues it reads into a mapping from each data queue to the read positions of all active consumers in that queue, thereby obtaining the read positions of all active consumers of the original data queue. The comparison acquisition submodule 213 is configured to compare the read positions of all active consumers in the original data queue to obtain the read position of the earliest active consumer of the original data queue.
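The conversion performed by the transfer submodule 212 is a mapping inversion followed by a minimum. The sketch below assumes read positions are integer offsets held in nested dictionaries; the function names, queue names, and consumer names are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def invert_positions(by_consumer: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Turn consumer -> {queue: offset} into queue -> {consumer: offset}."""
    by_queue: Dict[str, Dict[str, int]] = defaultdict(dict)
    for consumer, positions in by_consumer.items():
        for queue, offset in positions.items():
            by_queue[queue][consumer] = offset
    return dict(by_queue)


def earliest_active_read_offset(by_queue: Dict[str, Dict[str, int]], queue: str) -> int:
    """Smallest read offset among the active consumers of one queue."""
    return min(by_queue[queue].values())


# Hypothetical read positions of two active consumers.
by_consumer = {
    "consumer1": {"orders-0": 120, "logs-0": 7},
    "consumer2": {"orders-0": 480},
}
by_queue = invert_positions(by_consumer)
earliest = earliest_active_read_offset(by_queue, "orders-0")
```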
Further, the acquisition module 21 may also include a second acquisition submodule 214, a query submodule 215, and a removal submodule 216. The second acquisition submodule 214 is configured to obtain the read positions of all consumers in the data queues they read, where all consumers include both inactive and active consumers. The query submodule 215 is configured to query the read state of every consumer so as to classify each as active or inactive, where an active consumer's read state is "working" and an inactive consumer's read state is "paused". The removal submodule 216 is configured to discard the inactive consumers together with their read positions in the data queues they read, so that the read position of the earliest active consumer in the original data queue can be obtained.
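The query and removal steps can be sketched as a filter applied before taking the minimum. The state labels "working" and "paused" follow the description above, but their representation as strings, the function name, and the choice to return `None` when no consumer is active are assumptions made for this example.

```python
from typing import Dict, Optional


def earliest_active_offset(read_offsets: Dict[str, int],
                           states: Dict[str, str]) -> Optional[int]:
    """Drop consumers whose state is not "working", then return the smallest
    remaining read offset, or None when no consumer is active."""
    active = {consumer: offset
              for consumer, offset in read_offsets.items()
              if states.get(consumer) == "working"}
    return min(active.values()) if active else None


# Hypothetical consumers of one queue; c3 is paused and must be ignored.
offsets = {"c1": 50, "c2": 900, "c3": 10}
states = {"c1": "working", "c2": "working", "c3": "paused"}
result = earliest_active_offset(offsets, states)
```

Without the filter, the paused consumer at offset 10 would be mistaken for the earliest consumer and would pull the start migration position back unnecessarily.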
Continuing to refer to FIG. 11, when the selection unit 1102 selects the first migration strategy to determine the start migration position, the data migration device 1100 further includes a first deletion unit 1106. After the data migration is completed, the first deletion unit 1106 deletes the original data queue according to a set policy when it is determined that a consumer reads the unmigrated first data of the original data queue, and deletes the original data queue when it is determined that no consumer reads the unmigrated first data of the original data queue.
In addition, the selection unit 1102 is also used to select the second migration strategy to determine the start migration position when the total number of times is greater than the set number, where the start migration position of the second migration strategy is the earliest data position of the original data queue. When the selection unit 1102 selects the second migration strategy, as shown in FIG. 11, the data migration device 1100 may further include a third determination unit 1107, which determines that the migration data queue has copied up to the data most recently written to the original data queue by the producer, at which point the data migration is complete. Further, the data migration device 1100 may also include a second deletion unit 1108, which deletes the original data queue after the data migration is completed.
According to the data migration device of the embodiments of the present application, a migration scheme can be selected to suit the consumers' consumption pattern, while ensuring that data is transmitted effectively between upstream and downstream and that no data is lost. Specifically, when the total number of times consumers read the first data of the original data queue is small, that is, less than or equal to the set number, the first migration strategy can be selected to determine the start migration position, which then lies downstream of the earliest data position of the original data queue. When the distance between the read position of the earliest active consumer in the original data queue and the latest data position is less than or equal to the set distance, the start migration position of the first migration strategy is the latest data position of the original data queue (the first scheme). When that distance is greater than the set distance, the start migration position of the first migration strategy is the read position of the earliest active consumer in the original data queue (the second scheme). Because neither the first scheme nor the second scheme copies all the data of the original data queue, less of the first data is copied, so the replica migration completes faster, consumes fewer I/O resources of the disk storing the data, and has less impact on normal services.
In addition, after the data migration succeeds, two ways of removing the original data are provided: deleting the original data queue according to a set policy, or deleting the original data queue immediately. Deleting according to the set policy keeps the data replayable, that is, consumers can read the first data again. Deleting immediately releases disk capacity as soon as possible. The service can therefore choose between data replayability and disk capacity.
When the total number of times consumers read the first data of the original data queue after the data migration is completed is large, that is, greater than the set number, the second migration strategy can be selected to determine the start migration position. The start migration position of the second migration strategy is the earliest data position of the original data queue (the third scheme). The migration data queue then holds a copy of all the data in the original data queue, so the original data queue can be deleted immediately after the migration to release space, while consumers can still read the first data again.
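The choice among the three schemes described above can be sketched as a small decision function. This is a sketch under stated assumptions rather than the patented implementation: positions are modeled as integer offsets, the distance is taken as a simple offset difference, and `QueueState`, `choose_start`, and the parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueState:
    earliest_position: int      # earliest data position in the original queue
    latest_position: int        # latest data position when the instruction arrives
    earliest_active_read: int   # read position of the earliest active consumer


def choose_start(state, total_reads, max_reads, max_distance):
    """Return (scheme, start_position) for the migration."""
    if total_reads > max_reads:
        # Second strategy: copy everything (third scheme).
        return ("scheme 3", state.earliest_position)
    distance = state.latest_position - state.earliest_active_read
    if distance <= max_distance:
        # First strategy, consumers nearly caught up (first scheme).
        return ("scheme 1", state.latest_position)
    # First strategy, consumers lagging (second scheme).
    return ("scheme 2", state.earliest_active_read)


near = QueueState(earliest_position=0, latest_position=1000, earliest_active_read=990)
far = QueueState(earliest_position=0, latest_position=1000, earliest_active_read=200)

result_near = choose_start(near, total_reads=2, max_reads=5, max_distance=50)
result_far = choose_start(far, total_reads=2, max_reads=5, max_distance=50)
result_replay = choose_start(far, total_reads=9, max_reads=5, max_distance=50)
```

In the first two cases only the tail of the queue is copied, which is where the reduction in copied data and disk I/O comes from.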
FIG. 14 is a schematic structural diagram of another data migration device provided in an embodiment of the present application. The data migration device is used to migrate data in a data queue, where a producer writes data to the data queue in sequence and at least one consumer reads data from the data queue in sequence. As shown in FIG. 14, the data migration device 1400 includes an acquisition module 1401 and a migration module 1402. The acquisition module 1401 responds to a migration instruction for migrating the original data queue in the first server by obtaining, at the current moment, the read position of the earliest active consumer of the original data queue and the latest data position. Here the "current moment" is the moment the migration instruction is received, and the latest data position is the position of the data that the producer writes to the original data queue at that moment. The migration module 1402 is used, when the distance between the read position of the earliest active consumer in the original data queue and the latest data position is less than or equal to the set distance, to copy the data in the original data queue to the second server, in order from the latest data position to the position of the data most recently written to the original data queue by the producer during the migration, so as to form the migration data queue; in this case the start migration position is the latest data position. And/or, when that distance is greater than the set distance, the migration module 1402 copies the data in the original data queue to the second server, in order from the read position of the earliest active consumer to the position of the data most recently written to the original data queue by the producer during the migration, so as to form the migration data queue; in this case the start migration position is the read position of the earliest active consumer. The specific structure of the acquisition module 1401 may be the same as that of the acquisition module 21 described above.
When data is migrated starting from the latest data position of the original data queue, the acquisition module 1401 is also used to obtain the read position of the earliest active consumer of the original data queue during the data migration. The data migration device 1400 further includes a first determination module 1403. When the migration data queue has copied up to the data most recently written to the original data queue by the producer, the first determination module 1403 determines that the read position of the earliest active consumer of the original data queue during the migration has entered the range of the second data of the original data queue, at which point the data migration is complete. The second data is the data already copied to the migration data queue.
When data is migrated starting from the read position of the earliest consumer in the original data queue, the data migration device 1400 further includes a second determination module 1404, which determines that the migration data queue has copied up to the data most recently written to the original data queue by the producer, at which point the data migration is complete.
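The two completion conditions handled by modules 1403 and 1404 can be sketched as a single check. This is a hedged illustration only: it assumes integer offsets, interprets "entering the range of the second data" as the read position being at or beyond the first copied offset, and the name `migration_complete` and its parameters are hypothetical.

```python
def migration_complete(start_at_latest, copied_from, copied_to,
                       producer_latest, earliest_active_read=None):
    """Decide whether the migration can be declared complete.

    start_at_latest: True when copying began at the latest data position,
                     False when it began at the earliest consumer's read position.
    copied_from/copied_to: first and last offsets already copied (the second data).
    producer_latest: offset most recently written by the producer.
    """
    caught_up = copied_to >= producer_latest
    if not start_at_latest:
        # Started at the consumer's position: catching up to the producer suffices.
        return caught_up
    if earliest_active_read is None:
        raise ValueError("earliest_active_read is required when start_at_latest is True")
    # Started at the latest position: the slowest active consumer must also
    # have moved into the copied range, otherwise it would miss data.
    return caught_up and earliest_active_read >= copied_from


still_behind = migration_complete(True, 1000, 1200, 1200, earliest_active_read=950)
inside_range = migration_complete(True, 1000, 1200, 1200, earliest_active_read=1005)
not_caught_up = migration_complete(True, 1000, 1100, 1200, earliest_active_read=1005)
from_consumer = migration_complete(False, 200, 1200, 1200)
```

The extra condition in the first case is what prevents data loss: consumers are only switched over once none of them still needs data that was never copied.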
Further, the data migration device 1400 may also include a deletion module 1405 that operates after the data migration is completed. When it is determined that a consumer reads the unmigrated first data of the original data queue, the deletion module 1405 deletes the original data queue according to a set policy, where the first data is the data already written to the original data queue when the migration instruction is received. When it is determined that no consumer reads the unmigrated first data of the original data queue, the deletion module 1405 deletes the original data queue immediately.
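The deletion decision can be sketched as follows. The source does not say how the device determines that a consumer reads unmigrated first data, so this sketch makes a loud assumption: a consumer is treated as needing that data when its read offset lies before the start migration position. The function name `choose_deletion` and the returned labels are hypothetical.

```python
def choose_deletion(start_position, consumer_reads):
    """Pick how to delete the original queue after migration.

    start_position: offset at which copying began; data before it was not migrated.
    consumer_reads: {consumer: read offset in the original queue}.
    Assumption: a read offset before start_position means the consumer
    still reads unmigrated first data.
    """
    needs_unmigrated = any(pos < start_position for pos in consumer_reads.values())
    # Deleting by policy keeps the data replayable; deleting immediately frees disk.
    return "delete_by_policy" if needs_unmigrated else "delete_immediately"


replay_case = choose_deletion(1000, {"c1": 1100, "c3": 300})
clean_case = choose_deletion(1000, {"c1": 1100, "c2": 1000})
```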
The solution of the embodiments of the present application determines the migration start position and the migration end condition from the downstream data reading status, guarantees that downstream tasks lose no data while consuming, and completes the data copy for the message channel quickly. For example, for a typical 200 GB replica, the original migration solution takes 2 to 3 hours, whereas the solution of the embodiments of the present application can complete the migration within ten minutes, which greatly improves migration efficiency. The data copy also involves far fewer disk reads and writes, which reduces the impact on normal services. In addition, one of two deletion schemes for the original data replica is chosen according to service requirements: deleting the original replica according to the set policy keeps the data replayable for consumption, while deleting the original replica immediately makes the best use of space.
In addition, because in a system such as a Kafka cluster the information about producers, consumers, data queues, and so on is shared among multiple servers, the data migration method described above can be executed by either the first server or the second server. FIG. 15 is a schematic structural diagram of a server provided in an embodiment of the present application. As shown in FIG. 15, the server 1500 includes a transceiver 1501, a memory 1502, and a processor 1503. The transceiver 1501 is used to receive and send data. The memory 1502 stores a computer program. The processor 1503 executes the computer program stored in the memory 1502 so that the server 1500 implements the data migration method of the embodiments of the present application. The server is the first server or the second server mentioned in the data migration method above.
FIG. 16 is a schematic structural diagram of a network system provided in an embodiment of the present application. As shown in FIG. 16, the network system includes at least two servers, server A and server B. The original data queue is located in server A, and the migration data queue is located in server B. Either server A or server B can execute the data migration method of the embodiments of the present application.
It should be understood that each step of the method embodiments above can be completed by a hardware logic circuit in a processor or by instructions in the form of software. The processor may be a CPU, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or another programmable logic device, for example a discrete gate, a transistor logic device, or a discrete hardware component.
It can be understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general-purpose processor may be a microprocessor or any conventional processor.
The method steps in the embodiments of the present application can be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC.
The embodiments above may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
It can be understood that the various reference numbers used in the embodiments of the present application are only distinctions made for ease of description and are not intended to limit the scope of the embodiments of the present application.
Claims (22)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011219098.1A CN114442907B (en) | 2020-11-04 | 2020-11-04 | Data migration method and device, server and network system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011219098.1A CN114442907B (en) | 2020-11-04 | 2020-11-04 | Data migration method and device, server and network system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114442907A (en) | 2022-05-06 |
CN114442907B (en) | 2024-07-05 |
Family
ID=81361060
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011219098.1A Active CN114442907B (en) | 2020-11-04 | 2020-11-04 | Data migration method and device, server and network system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114442907B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103227747A (en) * | 2012-03-14 | 2013-07-31 | 微软公司 | High density hosting for messaging service |
CN103425529A (en) * | 2012-05-17 | 2013-12-04 | 国际商业机器公司 | System and method for migrating virtual machines between networked computing environments based on resource utilization |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7415470B2 (en) * | 2004-08-12 | 2008-08-19 | Oracle International Corporation | Capturing and re-creating the state of a queue when migrating a session |
CN109144972B (en) * | 2017-06-26 | 2022-07-12 | 华为技术有限公司 | Data migration method and data node |
CN109842636A (en) * | 2017-11-24 | 2019-06-04 | 阿里巴巴集团控股有限公司 | Cloud service moving method, device and electronic equipment |
CN109271098B (en) * | 2018-07-18 | 2021-03-23 | 成都华为技术有限公司 | Data migration method and device |
Also Published As
Publication number | Publication date |
---|---|
CN114442907A (en) | 2022-05-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10382380B1 (en) | Workload management service for first-in first-out queues for network-accessible queuing and messaging services | |
CN113553346B (en) | Large-scale real-time data stream integrated processing, forwarding and storing method and system | |
WO2021062981A1 (en) | Ssd data storage node management method and apparatus, and computer device | |
WO2021129477A1 (en) | Data synchronization method and related device | |
US9792231B1 (en) | Computer system for managing I/O metric information by identifying one or more outliers and comparing set of aggregated I/O metrics | |
CN110958300B (en) | A method, system, apparatus, electronic device and computer-readable medium for uploading data | |
CN111796767B (en) | Distributed file system and data management method | |
CN111124270A (en) | Method, apparatus and computer program product for cache management | |
CN112711564B (en) | Merge processing method and related equipment | |
CN114035750A (en) | File processing method, device, equipment, medium and product | |
WO2023216571A1 (en) | Resource scheduling method, apparatus and system for elastic-search cluster | |
US9893972B1 (en) | Managing I/O requests | |
CN108418859B (en) | Method and apparatus for writing data | |
CN120492491A (en) | Cache optimization method, device, computer equipment, readable storage medium and program product for power data system | |
CN113835613B (en) | A file reading method, device, electronic equipment and storage medium | |
CN114442907B (en) | Data migration method and device, server and network system | |
CN109947704A (en) | A lock type switching method, device and cluster file system | |
CN110413689B (en) | Multi-node data synchronization method and device for in-memory database | |
CN113051244A (en) | Data access method and device, and data acquisition method and device | |
WO2025025593A1 (en) | Method for managing storage pool, device, and storage medium | |
JP7073737B2 (en) | Communication log recording device, communication log recording method, and communication log recording program | |
CN111881085A (en) | Method and system for optimizing read-write bandwidth performance | |
CN111405313A (en) | Method and system for storing streaming media data | |
US12340104B1 (en) | Universal storage handler | |
CN112578996B (en) | Metadata sending method of storage system and storage system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||