
CN115344551A - Data migration method and data node - Google Patents

Info

Publication number
CN115344551A
Authority
CN
China
Prior art keywords
data
node
data node
target
operation request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210741489.2A
Other languages
Chinese (zh)
Inventor
涂屹
朱建峰
智雅楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202210741489.2A
Publication of CN115344551A

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/214: Database migration support
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083: Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this application disclose a data migration method that allows the load to be migrated without waiting for the target data to finish migrating from a first data node to a second data node, thereby reducing latency. The method includes: the second data node receives a data migration request for target data sent by the first data node, where the data migration request contains the memory address of the target data in the first data node; the second data node establishes an RDMA virtual memory space according to the data migration request, where the memory address of the RDMA virtual memory space is mapped to the memory address in the first data node, and the RDMA virtual memory space is used by the second data node to access the target data on the first data node according to a target operation request; the second data node sends a metadata modification instruction to a management node, where the instruction is used by the management node to modify the metadata so that target operation requests that access the target data are routed to the second data node; and the second data node receives and stores the target data sent by the first data node.

Description

A data migration method and data node

This application is a divisional application. The application number of the original application is 201710495228.6 and its filing date is June 26, 2017. The entire content of the original application is incorporated into this application by reference.

Technical field

This application relates to the computer field, and in particular to a data migration method, a first data node, and a second data node.

Background

As database technology continues to develop, users demand ever greater scalability and disaster recovery capability from databases, and distributed databases have become increasingly popular.

In existing load balancing, some data on a first data node (called target data or migration data) is migrated to a second data node. Before the migration, the server sends access traffic (such as read/write requests) to the first data node; during the migration, the server's accesses to the target data remain concentrated on the first data node; after the migration finishes, the server's accesses to the target data are switched to the second data node.

However, the load on the target data can be switched to the second data node only after the data has been fully copied to the second data node, so load migration is slow. In addition, the target data must first be copied to a kernel buffer and then sent over the network to the second data node. This consumes a large amount of CPU resources, degrades the first data node's performance in handling other loads, and makes the transfer of the target data slow with high latency.

Summary of the invention

Embodiments of this application provide a data migration method and data nodes that allow the load to be migrated without waiting for the target data to finish migrating from the first data node to the second data node, increasing the speed of load migration and reducing latency.

A first aspect of the embodiments of this application provides a data migration method, which may include the following. In a load balancing scheme, a second data node receives a data migration request for target data sent by a first data node, where the data migration request contains the memory address of the target data in the first data node. It should be understood that the target data may be hot data on the first data node, and hot data may be data that is accessed frequently or whose access count exceeds a specific threshold. The second data node establishes a remote direct memory access (RDMA) virtual memory space according to the data migration request; the memory address of the RDMA virtual memory space is mapped to the memory address in the first data node, and the RDMA virtual memory space is used by the second data node to access the target data on the first data node through RDMA according to a target operation request. The second data node sends a metadata modification instruction to a management node; the management node uses this instruction to modify the metadata so that target operation requests that access the target data are routed to the second data node. A target operation request here may be a load on, or an access to, the target data. Metadata, also called intermediary data or relay data, is data about data: it mainly describes data attributes (properties) and supports functions such as indicating storage location, recording history, resource lookup, and file records. The second data node receives and stores the target data sent by the first data node.
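The four steps of the first aspect can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions: no real RDMA library is used, the "RDMA virtual memory space" is stood in for by a reference to the first node's memory dictionary, and every class and method name (`ManagementNode`, `FirstDataNode`, `SecondDataNode`, `on_migration_request`, and so on) is hypothetical and not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementNode:
    """Holds routing metadata: data id -> name of the node that serves it."""
    routes: dict = field(default_factory=dict)

    def modify_metadata(self, data_id, node_name):
        self.routes[data_id] = node_name


@dataclass
class FirstDataNode:
    name: str
    memory: dict = field(default_factory=dict)  # memory address -> stored bytes


@dataclass
class SecondDataNode:
    name: str
    remote_view: dict = field(default_factory=dict)  # data id -> (source node, address)
    local_store: dict = field(default_factory=dict)  # data id -> bytes already copied

    def on_migration_request(self, source, data_id, address, manager):
        # Record where the data lives on the source node. This mapping is a
        # stand-in for establishing an RDMA virtual memory space (assumption).
        self.remote_view[data_id] = (source, address)
        # Ask the management node to route requests for this data here.
        manager.modify_metadata(data_id, self.name)

    def on_target_data(self, data_id, payload):
        # Receive and store the target data; the remote mapping is no longer needed.
        self.local_store[data_id] = payload
        self.remote_view.pop(data_id, None)

    def read(self, data_id):
        if data_id in self.local_store:
            return self.local_store[data_id]
        source, address = self.remote_view[data_id]
        return source.memory[address]  # simulated RDMA read of the source memory


manager = ManagementNode(routes={"hot-1": "node-a"})
node_a = FirstDataNode("node-a", memory={0x1000: b"value"})
node_b = SecondDataNode("node-b")

node_b.on_migration_request(node_a, "hot-1", 0x1000, manager)
routed_before_copy = manager.routes["hot-1"]   # load already switched
read_before_copy = node_b.read("hot-1")        # served through the mapping
node_b.on_target_data("hot-1", node_a.memory[0x1000])
read_after_copy = node_b.read("hot-1")         # served from local storage
```

The point of the sketch is the ordering: routing is switched to the second node before any data has been copied, and reads remain answerable in both phases.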

In the embodiments of this application, the management node may collect cluster information, which may include the capacity and load of each data node and of each store on a data node. The management node may trigger load balancing based on the collected cluster information: the first data node receives a load balancing trigger instruction from the management node and sends a data migration request for the target data to the second data node. Alternatively, the first data node sends the management node an indication that its load is greater than a first threshold, and the second data node sends the management node an indication that its load is less than a second threshold. The management node may then identify the first data node, whose load is greater than the first threshold, as the source data node and the second data node, whose load is less than the second threshold, as the target data node, and send a load balancing trigger instruction to the first data node, which sends a migration request for the target data to the second data node. The second data node establishes an RDMA virtual memory space according to the data migration request, and the memory address of the RDMA virtual memory space is mapped to the memory address in the first data node. The second data node can therefore access the target data on the first data node directly, and target operation requests can be migrated without waiting for the target data to be fully copied to the second data node. The second data node then sends the metadata modification instruction to the management node, so that target operation requests that access the target data are routed to the second data node; after receiving a target operation request, the second data node can access the target data on the first data node through RDMA. Further, the second data node receives the target data sent by the first data node. By using RDMA virtual memory mapping, the data to be migrated is mapped into the virtual memory space of the second data node, and ownership and load are switched entirely to the second data node, so the load can be migrated without waiting for data copying to complete, which speeds up load migration.
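The threshold-based trigger described above can be sketched as follows. The function name `pick_by_thresholds` and the representation of load as a number per node are assumptions made for illustration. The text does not say how to choose when several nodes cross a threshold, so picking the most loaded source and the least loaded target is also an assumption.

```python
def pick_by_thresholds(loads, first_threshold, second_threshold):
    """Return (source, target) node names, or None if no migration is needed.

    loads maps a node name to its reported load. A node above first_threshold
    is a candidate source; a node below second_threshold is a candidate target.
    """
    overloaded = [name for name, load in loads.items() if load > first_threshold]
    underloaded = [name for name, load in loads.items() if load < second_threshold]
    if not overloaded or not underloaded:
        return None
    # Tie-breaking rule (assumption): most loaded source, least loaded target.
    source = max(overloaded, key=loads.get)
    target = min(underloaded, key=loads.get)
    return source, target
```

For example, with loads `{"a": 0.9, "b": 0.5, "c": 0.1}` and thresholds 0.8 and 0.2, node `a` would be chosen as the source and node `c` as the target; with all loads between the two thresholds, no migration is triggered.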

With reference to the first aspect, in a first implementation of the first aspect, the second data node receiving and storing the target data sent by the first data node may include: the second data node receives, through RDMA, and stores the target data sent by the first data node. In the embodiments of this application, RDMA is used for real-time data migration in a distributed database cluster, which reduces the CPU resources that data migration consumes on the first data node, reduces the impact of the migration on other services running on the first data node, and increases the data transfer speed.

With reference to the first aspect or the first implementation of the first aspect, in a second implementation of the first aspect, the target data is one of M parts into which the first data node divides the hot data, where M is an integer greater than or equal to 2. In the embodiments of this application, the target data that the second data node receives from the first data node through RDMA may be one of M parts into which the first data node has divided the hot data. Transferring one part at a time reduces the size of each transfer from the first data node to the second data node, which shortens the time during which target data is unavailable, reduces the amount of data affected, and improves service quality.
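Dividing hot data into M parts can be sketched as below. The application does not specify how the division is done; splitting a list of keys into contiguous, nearly equal parts is an assumption, and the function name `split_into_parts` is hypothetical.

```python
def split_into_parts(hot_keys, m):
    """Split hot_keys into m contiguous parts whose sizes differ by at most one."""
    if m < 2:
        raise ValueError("M must be an integer greater than or equal to 2")
    keys = list(hot_keys)
    size, extra = divmod(len(keys), m)
    parts = []
    start = 0
    for i in range(m):
        # The first `extra` parts each take one additional key.
        end = start + size + (1 if i < extra else 0)
        parts.append(keys[start:end])
        start = end
    return parts
```

Each part can then be treated as the target data of one migration round, so only that part is in transit at any time.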

With reference to the first aspect, the first implementation of the first aspect, or the second implementation of the first aspect, in a third implementation of the first aspect, the method may further include the following. After the second data node sends the metadata modification instruction to the management node, target operation requests (such as load or accesses) that were previously directed at the target data are routed to the second data node. When the second data node receives a target operation request and the target data that the request accesses is not stored on the second data node, the second data node accesses the target data on the first data node through RDMA according to the request. It should be understood that, once the second data node has established the RDMA virtual memory space and can access the target data on the first data node through RDMA, the first data node may at the same time be sending the target data to the second data node, possibly through RDMA. If the target data that the request accesses is already stored on the second data node, the second data node performs the access on the stored target data according to the request. In the embodiments of this application, if the target data accessed by the target operation request is not stored on the second data node, the in-memory data on the first data node can be accessed through RDMA. If the target data is stored on the second data node, it can be accessed there directly without going through RDMA to the first data node; otherwise the target data would have to be sent between the second data node and the first data node again, which wastes resources and increases access latency.

With reference to the first aspect, the first implementation of the first aspect, or the second implementation of the first aspect, in a fourth implementation of the first aspect, the method may further include: if the target operation request is a write operation request that indicates adding new data, the second data node performs the write operation on the second data node according to the write operation request. Because such a request writes data, the second data node can perform the write locally regardless of whether it has already stored the target data sent by the first data node, which shortens the transfer path and reduces latency.

With reference to the first aspect, the first implementation of the first aspect, or the second implementation of the first aspect, in a fifth implementation of the first aspect, the method may further include: if the target operation request is a read operation request, or a write operation request that modifies the target data, and the target data it accesses is already stored on the second data node, the second data node performs the access on the second data node according to the request; if the target operation request is a read operation request, or a write operation request that modifies the target data, and the target data it accesses is not stored on the second data node, the second data node accesses the target data on the first data node through RDMA according to the request. In other words, for either a read or a write request, if the second data node has already stored the target data sent by the first data node, it can access that data directly; if not, it can access the target data on the first data node through RDMA. This provides a concrete implementation and increases the feasibility of the solution.
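The request handling in the fourth and fifth implementations can be sketched as a small dispatcher on the second data node. This is an illustration under assumptions: the class name `SecondNodeStore`, the operation names `"insert"`, `"read"`, and `"update"`, and the returned `(where, value)` tuple are all hypothetical, and the RDMA-mapped memory of the first data node is simulated by an ordinary dictionary.

```python
class SecondNodeStore:
    def __init__(self, remote_memory):
        self.local = {}              # target data already received and stored
        self.remote = remote_memory  # stand-in for RDMA-mapped memory on the first node

    def handle(self, op, key, value=None):
        """Serve one target operation request and report where it was served."""
        if op == "insert":
            # A write that adds new data is always performed locally.
            self.local[key] = value
            return "local", value
        if op not in ("read", "update"):
            raise ValueError(f"unsupported operation: {op}")
        # Reads and modifying writes go local if the data has arrived,
        # otherwise through the (simulated) RDMA mapping.
        if key in self.local:
            where, store = "local", self.local
        else:
            where, store = "rdma", self.remote
        if op == "update":
            store[key] = value
        return where, store[key]
```

With `remote = {"k1": 1, "k2": 2}` and `node = SecondNodeStore(remote)`, a read of `"k1"` is served through the simulated RDMA path until the key appears in `node.local`, after which the same read is served locally.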

A second aspect of the embodiments of this application provides a data migration method, which may include the following. In a load balancing scheme, a first data node sends a data migration request for target data to a second data node. The data migration request contains the memory address of the target data in the first data node and is used by the second data node to establish a remote direct memory access (RDMA) virtual memory space. The memory address of the RDMA virtual memory space is mapped to the memory address in the first data node, and the RDMA virtual memory space is used by the second data node to access the target data on the first data node through RDMA according to a target operation request, where the target operation request is an operation request for accessing the target data that the management node routes to the second data node. It should be understood that the target data may be hot data on the first data node, and hot data is data that is accessed frequently or whose access count exceeds a specific threshold. A target operation request here may be a load on, or an access to, the target data. Metadata, also called intermediary data or relay data, is data about data: it mainly describes data attributes (properties) and supports functions such as indicating storage location, recording history, resource lookup, and file records. The first data node sends the target data to the second data node.

In the embodiments of this application, the management node may collect cluster information, which may include the capacity and load of each data node and of each store on a data node. The management node may trigger load balancing based on the collected cluster information: the first data node receives a load balancing trigger instruction from the management node and sends a data migration request for the target data to the second data node. Alternatively, the first data node sends the management node an indication that its load is greater than a first threshold, and the second data node sends the management node an indication that its load is less than a second threshold. The management node may then identify the first data node, whose load is greater than the first threshold, as the source data node and the second data node, whose load is less than the second threshold, as the target data node, and send a load balancing trigger instruction to the first data node, which sends a data migration request for the target data to the second data node. The data migration request is used by the second data node to establish an RDMA virtual memory space whose memory address is mapped to the memory address in the first data node. The second data node can therefore access the target data on the first data node directly, and target operation requests can be migrated without waiting for the target data to be fully copied to the second data node. The first data node sends the target data to the second data node. By using RDMA virtual memory mapping, the data to be migrated is mapped into the virtual memory space of the second data node, and ownership and load are switched entirely to the second data node, so the load can be migrated without waiting for data copying to complete, which speeds up load migration.

With reference to the second aspect, in a first implementation of the second aspect, the first data node sending the target data to the second data node may include: the first data node sends the target data to the second data node through RDMA. Using RDMA for real-time data migration in a distributed database cluster reduces the CPU resources that data migration consumes on the first data node, reduces the impact of the migration on other services running on the first data node, and increases the data transfer speed.

With reference to the second aspect or the first implementation of the second aspect, in a second implementation of the second aspect, the first data node sending the target data to the second data node may include: the first data node determines hot data, where the hot data may be determined by the first data node itself, or the management node may send information about the hot data to the first data node, which then determines the hot data from that information; the first data node divides the hot data into M parts, where M is an integer greater than or equal to 2; the first data node selects the target data from the M parts; and the first data node sends the target data to the second data node, possibly through RDMA. In the embodiments of this application, the first data node may divide the hot data into M parts and transfer one part at a time. This reduces the size of each transfer from the first data node to the second data node, which shortens the time during which target data is unavailable, reduces the amount of data affected, and improves service quality.

With reference to the second implementation of the second aspect, in a third implementation of the second aspect, before the first data node determines the hot data, the method may further include: the first data node receives hot data information sent by the management node. The first data node determining the hot data may include: the first data node determines the hot data according to the hot data information. This provides one way of determining hot data and increases the feasibility of the solution.

A third aspect of the embodiments of this application provides a data migration method, which may include the following. A management node determines a first data node and a second data node for data migration. That is, the management node may collect cluster information, which may include the capacity and load of each data node and of each store on a data node. The management node may trigger load balancing based on the collected cluster information: the first data node receives a load balancing trigger instruction from the management node and sends a data migration request for the target data to the second data node. Alternatively, the first data node sends the management node an indication that its load is greater than a first threshold, and the second data node sends the management node an indication that its load is less than a second threshold. The management node may then identify the first data node, whose load is greater than the first threshold, as the source data node and the second data node, whose load is less than the second threshold, as the target data node, and send a load balancing trigger instruction to the first data node, which sends a migration request for the target data to the second data node. The management node receives a metadata modification instruction sent by the second data node, and modifies the metadata according to the instruction so that target operation requests that access the target data on the first data node are routed to the second data node.

In the embodiments of this application, the management node may determine the first data node and the second data node between which the target data is migrated. The management node receives the metadata modification instruction sent by the second data node and modifies the metadata accordingly, so that target operation requests that access the target data on the first data node are routed to the second data node. The second data node can then access the target data on the first data node through RDMA, and target operation requests can be switched to the second data node without waiting for the target data to be copied from the first data node to the second data node.

With reference to the third aspect, in a first possible implementation of the third aspect, the management node determining the first data node and the second data node for data migration may include: if the management node detects that a new data node has joined the database cluster, the management node selects the data node with the highest load in the database cluster as the first data node and uses the new data node as the second data node. This provides one way of determining the first data node and the second data node for data migration and increases the feasibility of the solution.

结合本申请实施例的第三方面,在本申请实施例第三方面的第二种可能的实现方式中,该管理节点确定数据迁移的第一数据节点和第二数据节点,可以包括:若该管理节点未检测到有新数据节点加入数据库集群,则该管理节点从该数据库集群中选择一个负载最大的数据节点作为该第一数据节点,选择一个负载最小的数据节点作为该第二数据节点。提供了怎么确定数据迁移的第一数据节点和第二数据节点的另一种实现方式,增加了方案的可行性。With reference to the third aspect of the embodiment of the present application, in the second possible implementation manner of the third aspect of the embodiment of the present application, the management node determining the first data node and the second data node for data migration may include: if the If the management node does not detect that there is a new data node joining the database cluster, the management node selects a data node with the largest load from the database cluster as the first data node, and selects a data node with the least load as the second data node. Another implementation manner of how to determine the first data node and the second data node for data migration is provided, which increases the feasibility of the solution.

A fourth aspect of the embodiments of this application provides a data node that can migrate load without waiting for the target data to finish migrating from the first data node to the second data node, which speeds up load migration and reduces latency. This function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.

A fifth aspect of the embodiments of this application provides a data node that can migrate load without waiting for the target data to finish migrating from the first data node to the second data node, which speeds up load migration and reduces latency. This function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.

A sixth aspect of the embodiments of this application provides a management node that can migrate load without waiting for the target data to finish migrating from the first data node to the second data node, which speeds up load migration and reduces latency. This function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.

A seventh aspect of the embodiments of this application provides a data node, which may include:

a transceiver, a processor, a memory, and a bus, where the transceiver, the processor, and the memory are connected by the bus;

the memory is configured to store operation instructions;

the transceiver is configured to receive a data migration request about the target data sent by the first data node, where the data migration request includes the memory address of the target data in the first data node; to send a metadata modification instruction to the management node, where the instruction is used by the management node to modify the metadata so that a target operation request for accessing the target data is routed to the second data node; and to store the target data sent by the first data node;

the processor is configured to establish a remote direct memory access (RDMA) virtual memory space according to the data migration request, where the memory address of the RDMA virtual memory space is mapped to the memory address in the first data node, and the RDMA virtual memory space is used by the second data node to access the target data on the first data node through RDMA according to the target operation request.

An eighth aspect of the embodiments of this application provides a data node, which may include:

a transceiver, a memory, and a bus, where the transceiver and the memory are connected by the bus;

the memory is configured to store operation instructions;

the transceiver is configured to send a data migration request about the target data to the second data node, where the data migration request includes the memory address of the target data in the first data node and is used by the second data node to establish a remote direct memory access (RDMA) virtual memory space; the memory address of the RDMA virtual memory space is mapped to the memory address in the first data node; the RDMA virtual memory space is used by the second data node to access the target data on the first data node through RDMA according to a target operation request; and the target operation request is an operation request for accessing the target data that the management node routes to the second data node. The transceiver is further configured to send the target data to the second data node.

A ninth aspect of the embodiments of this application provides a management node, which may include:

a transceiver, a processor, a memory, and a bus, where the transceiver, the processor, and the memory are connected by the bus;

the memory is configured to store operation instructions;

the processor is configured to determine the first data node and the second data node for data migration, and to modify the metadata according to the metadata modification instruction, so that a target operation request for accessing the target data on the first data node is routed to the second data node;

the transceiver is configured to receive the metadata modification instruction sent by the second data node.

A tenth aspect of the embodiments of this application provides a distributed database system. The distributed database system includes a first data node and a second data node. The first data node is the first data node described in the first aspect of this application or in any optional implementation of the first aspect; the second data node is the second data node described in the second aspect of this application or in any optional implementation of the second aspect.

An eleventh aspect of the embodiments of the present invention provides a storage medium. It should be noted that the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and is used to store the computer software instructions used by the above devices, including the programs designed for the data node or the management node to execute the first aspect, the second aspect, and the third aspect.

The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

A twelfth aspect of the embodiments of the present invention provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method described in the first aspect of this application or any optional implementation of the first aspect, or in the second aspect or any optional implementation of the second aspect, or in the third aspect or any optional implementation of the third aspect.

It can be seen from the above technical solutions that the embodiments of this application have the following advantages:

In the embodiments of this application, the second data node establishes an RDMA virtual memory space according to the data migration request about the target data sent by the first data node, and sends a metadata modification instruction to the management node. The management node modifies the metadata according to that instruction, so that target operation requests for accessing the target data are routed to the second data node. The target operation requests therefore do not have to wait until the target data has been fully copied to the second data node, which reduces latency. The first data node then sends the target data to the second data node. Using RDMA virtual memory technology, the target data is mapped into the RDMA virtual memory space of the second data node, and the usage rights and the load are switched entirely to the second data node, so the load can be migrated without waiting for the copy of the target data to complete, which speeds up load migration. Further, applying RDMA to real-time data migration in a distributed database cluster reduces the CPU resources that migration consumes on the first data node, reduces the impact of the migration on other services running on the first data node, and increases the data transmission speed. Migrating the target data in small pieces shortens the time for which target data is unavailable and reduces the amount of data that is unavailable at any one time, which improves service quality. Overall, this saves time and reduces latency.

Description of drawings

To describe the technical solutions of the embodiments of this application more clearly, the drawings needed for describing the embodiments and the prior art are briefly introduced below. The drawings described below are only some embodiments of this application, and other drawings can also be obtained from these drawings.

FIG. 1 is a schematic diagram of a scenario to which the prior art is applied;

FIG. 2 is a schematic diagram of load migration in the prior art;

FIG. 3 is a system architecture diagram to which the prior art is applied;

FIG. 4 is a schematic flowchart of data migration in the prior art;

FIG. 5 is an overall architecture diagram of a distributed cluster to which the embodiments of this application are applied;

FIG. 6 is a schematic diagram of an embodiment of the data migration method in the embodiments of this application;

FIG. 7 is a schematic flowchart of triggering load balancing in the embodiments of this application;

FIG. 8 is a schematic diagram of load migration in the embodiments of this application;

FIG. 9 is a schematic diagram of fragmented transmission of a data block in the embodiments of this application;

FIG. 10 is a schematic flowchart of the fragmented transmission algorithm in the embodiments of this application;

FIG. 11 is a schematic diagram of an embodiment of a data node in the embodiments of this application;

FIG. 12A is a schematic diagram of another embodiment of a data node in the embodiments of this application;

FIG. 12B is a schematic diagram of another embodiment of a data node in the embodiments of this application;

FIG. 13 is a schematic diagram of another embodiment of a data node in the embodiments of this application.

Detailed description

The embodiments of this application provide a data migration method and a data node that allow load to be migrated without waiting for the target data to finish migrating from the first data node to the second data node, which speeds up load migration and reduces latency.

To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described below with reference to the drawings. The described embodiments are only some of the embodiments of this application, not all of them. Embodiments based on the embodiments in this application shall all fall within the protection scope of this application.

As database technology develops, users demand ever greater scalability and disaster recovery capability from databases, and distributed databases have become increasingly popular. An important advantage of a distributed database over a traditional database is that data on a high-load data node can be migrated to a low-load data node, or data on an old data node can be migrated to a newly added data node, which enables backup and load balancing. This inevitably causes considerable network overhead and performance loss, and real-time data migration between servers has become one of the biggest bottlenecks of distributed databases.

Distributed databases have therefore adopted remote direct memory access (RDMA), which lets the network card read application-layer data directly and bypasses the traditional Transmission Control Protocol (TCP)/Internet Protocol (IP) processing path. This greatly increases network transmission speed and reduces the central processing unit (CPU) resources consumed by data migration, which offers a new approach to the data migration bottleneck.

RDMA means that one computer can access the memory of a remote computer, reading and writing the remote memory in the same way as local memory. It is implemented mainly through RDMA zero-copy networking, which lets the network interface card (NIC) transfer data directly to and from application memory. This removes the need to copy data between application memory and kernel memory, which greatly saves CPU resources and increases the data transmission speed.

FIG. 1 is a schematic diagram of a scenario to which the prior art is applied. A client of a mail system, referred to as the mail client, is installed on a terminal (such as a mobile phone, a personal computer (PC), or a tablet). The mail client on the terminal connects to the mail system server over the network, and the server sends read/write requests to the underlying data nodes. To achieve load balancing, the target data on the first data node is migrated to the second data node. Before the migration, the server has some access flows (such as read/write requests) to the first data node; during the migration, the server's access to the target data is still concentrated on the first data node; after the migration ends, the server's access to the target data is switched to the second data node.

FIG. 2 is a schematic diagram of load migration in the prior art. When data migration starts, the position at which the migration begins is first marked in the operation log (Log) of the first data node. The data items to be migrated, referred to as the target data or migration data, are then retrieved on the first data node. After retrieval, all of the retrieved data items are copied to the second data node over the network. After the copy completes, all operation requests on the target data recorded in the operation log from the marked position onward are applied to the second data node. Finally, all access to the target data and its load are switched to the second data node, and the data migration process ends.
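The prior-art sequence of FIG. 2 can be summarised in a short simulation. This is only an illustrative sketch: the function name `migrate_prior_art`, the use of a dictionary as a data store, and the `(key, value)` log entries are assumptions made for the example, not structures defined in the source.

```python
from typing import Dict, List, Set, Tuple


def migrate_prior_art(
    source: Dict[str, int],
    oplog: List[Tuple[str, int]],
    keys: Set[str],
    writes_during_copy: List[Tuple[str, int]],
) -> Tuple[Dict[str, int], str]:
    """Simulate the copy-then-switch migration; return (target store, serving node)."""
    mark = len(oplog)                          # 1. mark the migration start in the log
    target = {k: source[k] for k in keys}      # 2-3. retrieve and copy the target data
    for key, value in writes_during_copy:      # requests still reach the first node
        source[key] = value
        oplog.append((key, value))
    for key, value in oplog[mark:]:            # 4. replay logged operations on target data
        if key in keys:
            target[key] = value
    return target, "second"                    # 5. only now switch access and load


src = {"a": 1, "b": 2, "z": 9}
log = [("a", 0)]
tgt, serving = migrate_prior_art(src, log, {"a", "b"}, [("a", 5), ("z", 7)])
```

The switch in step 5 happens only after the copy and the log replay, which is the delay that the following paragraph criticises.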

However, the load of the target data can be switched to the second data node only after the data has been fully copied to the second data node, so load migration is slow. In addition, the target data must first be copied to the kernel buffer and then sent to the network, which consumes a large amount of CPU on the first data node, degrades its performance in handling other load, and results in slow data transmission and high latency.

FIG. 3 is a system architecture diagram to which the prior art is applied. The functions of its components are as follows:

Virtual machine (VM): a virtual machine instance;

Hypervisor (virtual machine monitor): abstracts hardware-layer devices into virtual devices and provides them to virtual machine instances;

Host migration agent: monitors whether memory has been modified, adds modified memory blocks to the RDMA queue, and then asynchronously selects a suitable memory block from the RDMA queue and initiates the RDMA transfer of the selected block;

RDMA communication manager (RDMA CM): registers the memory blocks to be transferred with the RDMA network adapter;

RDMA network interface card (RNIC): a network card with RDMA capability.

That is, the host migration agent on the first data node monitors memory in real time and, as soon as it finds that a memory block has been modified, adds that block to a queue. The host migration agent also asynchronously selects a memory block from the queue, registers it with the RDMA network card, and transfers it to the second data node through RDMA. Selection from the queue follows two rules: first, the combined size of the selected memory block and the memory blocks already in transfer must not exceed a threshold; second, among the memory blocks that satisfy the first rule, the earliest-modified, the latest-modified, or the most frequently modified block may be selected.
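The two selection rules can be sketched as follows. The `Block` record, the `pick_block` function, and the byte-based threshold are hypothetical names chosen for illustration, and the sketch implements only the "earliest modified" variant of the second rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    block_id: int
    size: int            # block size in bytes
    modified_at: float   # time of the most recent modification


def pick_block(queue: List[Block], in_flight_bytes: int, threshold: int) -> Optional[Block]:
    """Pick the next block to transfer, or None if no queued block fits."""
    # Rule 1: the candidate plus the blocks already in transfer must stay within the threshold.
    eligible = [b for b in queue if b.size + in_flight_bytes <= threshold]
    if not eligible:
        return None
    # Rule 2 (one permitted policy): among eligible blocks, take the earliest-modified one.
    return min(eligible, key=lambda b: b.modified_at)


queue = [Block(1, 400, 3.0), Block(2, 100, 2.0), Block(3, 50, 1.0), Block(4, 900, 0.5)]
chosen = pick_block(queue, in_flight_bytes=600, threshold=1000)
```

Swapping `min` for `max`, or keying on a modification counter, would give the "latest modified" and "most frequently modified" policies that the text also allows.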

FIG. 4 is a schematic flowchart of data migration in the prior art. Each modified memory block passes through the following states. On the first data node, a memory block is first enqueued after it is modified; once selected, it is locked; it is then registered with the RDMA network card; it is then transferred to the second data node through RDMA; after the transfer completes, it is deregistered from the RDMA network card; finally, the lock on the memory block is released. Because data is migrated in units of whole data blocks, a large amount of data on the selected memory block may remain locked and unavailable for a long time during migration, which affects service quality.
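The state sequence of FIG. 4 can be written as a small state machine. The enum names and the `is_locked` helper are assumptions for this sketch; in particular, treating every state from "locked" up to "deregistered" as unavailable is one reading of the description above.

```python
from enum import Enum


class BlockState(Enum):
    ENQUEUED = 1
    LOCKED = 2
    REGISTERED = 3
    TRANSFERRED = 4
    DEREGISTERED = 5
    UNLOCKED = 6


# Each state has exactly one successor; UNLOCKED is terminal.
NEXT_STATE = {
    BlockState.ENQUEUED: BlockState.LOCKED,
    BlockState.LOCKED: BlockState.REGISTERED,
    BlockState.REGISTERED: BlockState.TRANSFERRED,
    BlockState.TRANSFERRED: BlockState.DEREGISTERED,
    BlockState.DEREGISTERED: BlockState.UNLOCKED,
}


def advance(state: BlockState) -> BlockState:
    """Move a memory block to its next state; fail if it is already unlocked."""
    if state not in NEXT_STATE:
        raise ValueError(f"{state.name} is a terminal state")
    return NEXT_STATE[state]


def is_locked(state: BlockState) -> bool:
    """Assumed reading: the block is unavailable from LOCKED until it is unlocked."""
    return state in (BlockState.LOCKED, BlockState.REGISTERED,
                     BlockState.TRANSFERRED, BlockState.DEREGISTERED)


history = [BlockState.ENQUEUED]
while history[-1] is not BlockState.UNLOCKED:
    history.append(advance(history[-1]))
locked_steps = sum(is_locked(s) for s in history)
```

Four of the six states hold the lock, which illustrates why migrating whole blocks keeps data unavailable for most of the transfer.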

FIG. 5 is an overall architecture diagram of a distributed cluster to which the embodiments of this application are applied. In the embodiments of this application, RDMA virtual memory mapping is used to map the migration data of the first data node into the RDMA virtual memory space of the second data node, and the usage rights and the load are switched entirely to the second data node. The load can therefore be migrated without waiting for data replication to complete, which speeds up load migration and reduces latency. Using RDMA for real-time data migration in a distributed database cluster also reduces the CPU resources that data transmission consumes on the first data node, reduces the impact of the migration on other running services, and increases the data transmission speed. Further, the first data node divides each data block into smaller pieces for migration, which shortens the time for which target data is unavailable, reduces the amount of unavailable data, and improves service quality.

The upper part of FIG. 5 shows the physical structure of the database cluster, and the lower part shows the logical structure of the data. As the upper part shows, the cluster has 5 data nodes (5 machines are shown as an example; the number is not limited to 5), and each data node has 3 stores (again, not limited to 3). As the lower part shows, each store holds multiple ranges. A range is one segment of the data stored in the database: all of the data is divided into segments, and each segment is called a range. For example, by the first ASCII character, all of the data can be divided into ranges such as 0-9, a-c, and c-e. Each store holds several ranges; for example, store1 of data node node1 holds the 6 ranges 0-9, a-c, e-g, g-i, i-k, and k-m.
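As a rough illustration of partitioning by first character, the sketch below maps a key to a range name. The source does not say how the shared boundary characters (for example, "c" in both a-c and c-e) are resolved, so the sketch assumes each range includes its start character and excludes its end character, with the digit range covering 0 through 9. The boundary list, the function name `range_for`, and the restriction to non-empty keys that start with a digit or a lowercase letter before "m" are all assumptions.

```python
import bisect

# Assumed range start characters and the matching range names, in sorted order.
RANGE_STARTS = ["0", "a", "c", "e", "g", "i", "k"]
RANGE_NAMES = ["0-9", "a-c", "c-e", "e-g", "g-i", "i-k", "k-m"]
LAST_END = "m"  # assumed exclusive upper bound of the final range


def range_for(key: str) -> str:
    """Return the range name for a non-empty key, based on its first character."""
    first = key[0].lower()
    if not (first.isdigit() or "a" <= first < LAST_END):
        raise ValueError(f"no range defined for key {key!r}")
    # Find the last range whose start character is <= the key's first character.
    idx = bisect.bisect_right(RANGE_STARTS, first) - 1
    return RANGE_NAMES[idx]


examples = {k: range_for(k) for k in ["7up", "apple", "cat", "lemon"]}
```

A binary search over sorted start keys keeps the lookup logarithmic in the number of ranges, which matters once the data is split into many ranges.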

For disaster recovery, each range can be copied into several replicas placed on different stores of different data nodes, so that when one data node fails, other data nodes still hold replicas of the ranges on that data node. As FIG. 5 shows, each store holds multiple ranges; different patterns represent different ranges, and identical patterns represent the same range. For example, in the figure, store1 of node1 and store1 of node2 both hold a replica of range a-c.

While the database is running, some ranges are inevitably accessed more often than others. As a result, some data nodes are accessed more and some less, that is, the load is unbalanced. A heavily accessed range is called a hotspot range, and a heavily accessed data node is called a hotspot data node. When the access rate of a hotspot data node reaches a certain limit, the performance of that data node drops, the latency of accessing ranges on it increases, and the node may even be interrupted or crash.

To avoid the serious consequences of unbalanced load, the database cluster needs to perform load migration, typically by migrating some of the ranges on a hotspot data node to other data nodes. However, load migration usually has a large overhead. Copying range data consumes a lot of network bandwidth. Moreover, traditional network transmission involves the CPU: the data to be transmitted must be copied from user space to kernel space and processed by the TCP/IP protocol stack before it can be sent, and the receiving side is equally complex. Load migration therefore places an even greater burden on a hotspot data node that is already overloaded, and the data node may hang as a result.

The embodiments of this application use RDMA virtual memory mapping to map the range to be migrated directly onto the second data node, so the load can be migrated quickly, without waiting for data replication to complete, and latency is reduced. RDMA is also used for the data replication step of load migration, which greatly increases the data transmission speed. More importantly, the RDMA transfer does not involve the CPU of the first data node at any point, so compared with traditional network transmission the burden placed on the first data node in particular is greatly reduced.

The data migration method in the technical solution of this application is further described below by way of embodiments. FIG. 6 is a schematic diagram of an embodiment of the data migration method in the embodiments of this application, which includes the following steps:

601. The management node determines the first data node and the second data node for data migration.

In the embodiments of this application, in one possible implementation, the management node periodically collects information about each data node in the cluster. That is, at regular intervals the management node collects cluster information, which may include the capacity and load information of each data node and of each store on each data node. The first data node may also be called the source data node, and the second data node may also be called the target data node.

In another possible implementation, the first data node may send the management node an indication that its load exceeds a first threshold, which may indicate that the first data node is the source data node; the second data node sends the management node an indication that its load is below a second threshold, which may indicate that the second data node is the target data node. Alternatively, the management node receives the indication from the first data node that its load exceeds the first threshold and determines that the first data node is the source data node, and receives the indication from the second data node that its load is below the second threshold and determines that the second data node is the target data node.

The management node determines the first data node and the second data node for data migration according to the cluster information in ways that include, but are not limited to, the following:

(1) If the management node detects that a new data node has joined the database cluster, the management node selects the data node with the largest load in the database cluster as the first data node, and uses the new data node as the second data node.

(2) If the management node does not detect a new data node joining the database cluster, the management node selects the data node with the largest load in the database cluster as the first data node, and selects the data node with the smallest load as the second data node.

As an example, FIG. 7 is a schematic flowchart of triggering load balancing. If the management node finds that a new data node has joined the database cluster, it immediately selects the store with the largest load in the database cluster as the source store, picks the hotspot range with the largest load on that store as the migration data, uses any store on the newly added data node as the target store, and triggers load migration. If no new data node has joined, the management node checks whether the load of any store exceeds the threshold. If so, it selects the store with the largest load among those exceeding the threshold as the source store, picks the hotspot range with the largest load on that store as the migration data, finds the store with the smallest load in the database cluster as the target store, and triggers load balancing. If no new data node has joined and no store's load exceeds the threshold, load balancing is not triggered.
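The decision flow of FIG. 7 can be sketched as a single function. The function name `plan_migration`, the store identifiers, and the numeric load values are assumptions for illustration; `store_load` is assumed to cover only stores that were already in the cluster, and to be non-empty.

```python
from typing import Dict, List, Optional, Tuple


def plan_migration(
    store_load: Dict[str, float],
    range_load: Dict[str, Dict[str, float]],
    new_node_stores: List[str],
    threshold: float,
) -> Optional[Tuple[str, str, str]]:
    """Return (source store, range to migrate, target store), or None if nothing triggers."""
    if new_node_stores:
        # A new data node joined: the busiest existing store is the source,
        # and any store on the new node may serve as the target.
        source = max(store_load, key=store_load.get)
        target = new_node_stores[0]
    else:
        overloaded = {s: v for s, v in store_load.items() if v > threshold}
        if not overloaded:
            return None  # no new node and no store above the threshold
        source = max(overloaded, key=overloaded.get)
        target = min(store_load, key=store_load.get)
    # Migrate the hotspot range with the largest load on the source store.
    hot_range = max(range_load[source], key=range_load[source].get)
    return source, hot_range, target


stores = {"node1/store1": 0.9, "node2/store1": 0.4, "node3/store1": 0.1}
ranges = {
    "node1/store1": {"a-c": 0.6, "e-g": 0.3},
    "node2/store1": {"0-9": 0.4},
    "node3/store1": {"g-i": 0.1},
}
with_new_node = plan_migration(stores, ranges, ["node4/store1"], threshold=0.8)
without_new_node = plan_migration(stores, ranges, [], threshold=0.8)
```

Checking for a new node first mirrors the order of the flowchart, where a newly joined node triggers migration regardless of the threshold.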

It should be noted that the management node may be an independent server, or a module or unit integrated on a server, such as a management module or a routing module; this is not specifically limited.

602. The first data node sends a data migration request for the target data to the second data node.

In this embodiment of the application, the data migration request may include the memory address of the target data in the first data node. The data migration request is used by the second data node to establish an RDMA virtual memory space whose memory address is mapped to the memory address in the first data node. The second data node uses the RDMA virtual memory space to access the target data on the first data node through RDMA according to a target operation request, where the target operation request is an operation request for accessing the target data that the management node routes to the second data node. Usually the target data here is hotspot data, which can also be understood as the migration data on the first data node. It should be understood that hotspot data is data that is accessed frequently or whose number of accesses exceeds a specific threshold.

In one possible implementation, before the first data node sends the data migration request for the target data to the second data node, the first data node and the second data node may each receive an instruction to trigger load balancing from the management node. According to that instruction, the first data node may send the data migration request for the target data to the second data node, and the second data node prepares to receive the target data.

In another possible implementation, before the first data node sends the data migration request for the target data to the second data node, the first data node sends the management node an instruction indicating that its load exceeds a first threshold, and the second data node sends the management node an instruction indicating that its load is below a second threshold. The management node may then determine that the first data node is the source data node and the second data node is the target data node, and may send an instruction to trigger load balancing to the first data node. According to the instruction to trigger load balancing, the first data node may send the data migration request for the target data to the second data node, and the second data node prepares to receive the target data.

Optionally, the management node may also determine the target data to be migrated from the first data node to the second data node, that is, the hotspot data, and then send information about the hotspot data to the first data node. The first data node may determine the hotspot data according to that information.

603. The second data node establishes an RDMA virtual memory space according to the data migration request.

In this embodiment of the application, the second data node establishes the RDMA virtual memory space according to the data migration request. The memory address of the RDMA virtual memory space is mapped to the memory address in the first data node, and the second data node uses the RDMA virtual memory space to access the target data on the first data node through RDMA according to the target operation request.

604. The second data node sends an instruction to modify the metadata to the management node.

In this embodiment of the application, after the virtual memory space of the second data node is established, the second data node may send an instruction to modify the metadata to the management node, and the management node receives that instruction. It should be noted that metadata, also called intermediary data or relay data, is data about data. It mainly describes the properties of data and supports functions such as indicating storage location, historical data, resource lookup, and file records.

605. The management node modifies the metadata according to the instruction to modify the metadata, so that target operation requests for accessing the target data on the first data node are routed to the second data node.

In this embodiment of the application, after receiving the instruction to modify the metadata from the second data node, the management node may modify the metadata accordingly, so that all target operation requests for the target data that were originally routed to the first data node are routed to the second data node.

For example, FIG. 8 is a schematic diagram of load migration in this embodiment of the application. With RDMA memory mapping, the migration data block P on the first data node S1 can be mapped directly onto the data block P1 on the second data node S2, so that S2 can access the data in P as if it were accessing its own memory. By modifying the metadata of the database cluster, the management node can then route to P1 on S2 all operation requests for P that would otherwise have been routed to S1.

In this way, the second data node can access the data block P on S1 in the same way that it accesses the data block P1 on S2, and S2 handles all of the load on P (the data block P on S1 is still what is actually accessed, but the CPU used is that of S2). The mapping from P to P1 and the network transmission are transparent to the user, so the load on P is migrated almost instantly. Moreover, because network transmission over RDMA does not require CPU participation, the load migration does not consume the CPU resources of S1 and therefore does not affect other services running on S1. In addition, through memory mapping, the migration of the data block P from S1 to S2 is converted into a data copy inside S2 (P1→P2) that is managed uniformly by S2, which can improve migration efficiency. It should be noted that, in this embodiment of the application, the target operation request can be understood as the load.
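As a non-limiting illustration, the metadata modification of step 605 may be sketched as an update to a routing table kept by the management node. The `RoutingMetadata` class and its method names are hypothetical; the application does not specify the format of the metadata, and the RDMA mapping itself is not modelled here.

```python
from typing import Dict


class RoutingMetadata:
    """Hypothetical routing table: data block id -> data node that serves requests for it."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def assign(self, block_id: str, node: str) -> None:
        self._owner[block_id] = node

    def reroute(self, block_id: str, from_node: str, to_node: str) -> None:
        """Switch requests for block_id from from_node to to_node (step 605)."""
        if self._owner.get(block_id) != from_node:
            raise ValueError(f"{block_id} is not currently routed to {from_node}")
        self._owner[block_id] = to_node

    def route(self, block_id: str) -> str:
        """Return the node to which an operation request for block_id is routed."""
        return self._owner[block_id]


metadata = RoutingMetadata()
metadata.assign("P", "S1")
metadata.assign("Q", "S1")
# S2 has established its RDMA virtual memory space and asks for the switch.
metadata.reroute("P", from_node="S1", to_node="S2")
```

After the call to `reroute`, requests for block `P` go to `S2` while requests for the unrelated block `Q` still go to `S1`, which mirrors the fact that only the load on the migration data is moved.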

606. The second data node performs the access indicated by the received target operation request.

In this embodiment of the application, when the second data node receives a target operation request, if the target data that the request accesses is not yet stored on the second data node, the second data node accesses the target data on the first data node through RDMA according to the request. If the target data that the request accesses is already stored on the second data node, the second data node performs the access on the stored target data according to the request.

Several other implementations are as follows:

(1) If the target operation request is a write operation request that indicates adding new data, the second data node performs the write operation on the second data node according to the write operation request.

(2) If the target operation request is a read operation request and the target data that it accesses is already stored on the second data node, the second data node performs the access on the second data node according to the target operation request.

(3) If the target operation request is a read operation request and the target data that it accesses is not stored on the second data node, the second data node accesses the target data on the first data node through RDMA according to the target operation request.

(4) If the target operation request is a read operation request, or a write operation request that modifies the target data, and the target data that it accesses is already stored on the second data node, the second data node performs the access on the second data node according to the target operation request.

(5) If the target operation request is a read operation request, or a write operation request that modifies the target data, and the target data that it accesses is not stored on the second data node, the second data node accesses the target data on the first data node through RDMA according to the target operation request.
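As a non-limiting illustration, the dispatch rules (1) to (5) above may be sketched as follows. The `SecondDataNode` class and the operation names `"insert"`, `"read"`, and `"update"` are hypothetical, and a plain dictionary stands in for the RDMA-mapped memory of the first data node; no real RDMA library is used.

```python
from typing import Any, Dict, Optional, Tuple


class SecondDataNode:
    """Hypothetical model of how the second data node serves target operation requests."""

    def __init__(self, remote_memory: Dict[str, Any]) -> None:
        self.local: Dict[str, Any] = {}   # target data already copied or newly written here
        self.remote = remote_memory       # stand-in for data on the first node reached via RDMA

    def handle(self, op: str, key: str, value: Optional[Any] = None) -> Tuple[str, Any]:
        """Return (where the access happened, resulting value)."""
        if op == "insert":
            # Rule (1): a write that adds new data is performed locally.
            self.local[key] = value
            return "local", value
        if op == "read":
            # Rules (2) and (3): read locally if stored here, otherwise via RDMA.
            if key in self.local:
                return "local", self.local[key]
            return "rdma", self.remote[key]
        if op == "update":
            # Rules (4) and (5): a modifying write follows the same locality test.
            if key in self.local:
                self.local[key] = value
                return "local", value
            self.remote[key] = value
            return "rdma", value
        raise ValueError(f"unknown operation: {op}")
```

The return value only reports where the access took place, so that the branch taken for each request is visible; an actual data node would return the result of the operation to the requester instead.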

607. The first data node sends the target data to the second data node.

In this embodiment of the application, once the second data node has taken over the load of the migration data on the first data node, copying of the migration data (generally a hotspot range) from the first data node to the second data node can begin. When the copy of the migration data is complete, the load migration ends. The second data node receives and stores the target data sent by the first data node. It should be noted that hotspot data can be understood as data whose number of accesses or access rate exceeds a threshold. Optionally, the first data node may send the target data to the second data node through RDMA, in which case the second data node receives the target data through RDMA.

Sending the target data from the first data node to the second data node may include: the first data node determines the hotspot data; the first data node divides the hotspot data into M pieces of data, where M is an integer greater than or equal to 2; the first data node selects the target data from the M pieces of data; and the first data node sends the target data to the second data node, optionally through RDMA.

The first data node may determine the hotspot data according to its own capacity, load information, access rate, and similar information. Alternatively, the management node may determine information about the hotspot data of the first data node and send it to the first data node, and the first data node then determines the hotspot data according to that information.
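As a non-limiting illustration, dividing the hotspot data into M pieces and selecting one piece as the target data may be sketched as follows. The function name `split_into_pieces` and the even split by byte count are assumptions of this sketch; the application only requires that M be an integer greater than or equal to 2.

```python
from typing import List


def split_into_pieces(data: bytes, m: int) -> List[bytes]:
    """Divide data into m contiguous pieces whose sizes differ by at most one byte."""
    if m < 2:
        raise ValueError("M must be an integer greater than or equal to 2")
    if len(data) < m:
        raise ValueError("data is too small to divide into M non-empty pieces")
    base, extra = divmod(len(data), m)
    pieces = []
    start = 0
    for i in range(m):
        size = base + (1 if i < extra else 0)  # the first `extra` pieces get one more byte
        pieces.append(data[start:start + size])
        start += size
    return pieces


hot_data = b"abcdefghij"
pieces = split_into_pieces(hot_data, 3)
target_data = pieces[0]  # the piece selected for the current transfer
```

Each piece can then be sent to the second data node in turn as the target data of one transfer.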

When the first data node sends the target data to the second data node through RDMA, it may transmit the data in shards. For example, FIG. 9 is a schematic diagram of sharded transmission of a data block in this embodiment of the application. Because data might otherwise be operated on by the first data node and the second data node at the same time during migration, the data is locked and unavailable while it is being migrated. If the locked data is too large, the migration takes correspondingly longer, and having a large amount of data unavailable for a long time inevitably degrades service quality or even interrupts service. Therefore, in this embodiment of the application, a large data block is divided into multiple small blocks (generally by page) for migration, which reduces the amount of unavailable data and shortens the time for which data is unavailable, thereby improving service quality and avoiding service interruption.

In the schematic diagram shown in FIG. 9:

1) The migration data block P1 is divided into multiple small data blocks (for example, one page per small data block), and the small data blocks are migrated to P2 one by one;

2) The small data block that is being migrated is locked (it can be neither read nor written);

3) During the copy, all write operation requests for the migration data are routed to P2 for execution;

4) For read or update operation requests, a migration state table can be kept that records the migration state of each small data block (copied, copying, or not copied). For a data block whose copy is complete, the read or update operation is executed directly on P2; for a data block that has not been copied or is being copied, the read or update operation is executed on P1.

It should be noted that the data block P1 is in fact a mirror of the migration data block P of the first data node, mapped onto the target server. If the block-by-block migration algorithm above were used directly without RDMA, the metadata of each small block would have to be modified every time a small block of data is copied to the second data node, and the migration speed would inevitably be greatly reduced.

FIG. 10 is a schematic flowchart of the sharded transmission algorithm. After copying of the data block P on the first data node starts, the migration data is divided into shards (generally by page) and copied shard by shard, and the small data block that is being copied is locked so that it can be neither read nor written. When an operation request needs to access the migration data, it is first checked whether the request is a write operation request. If it is, the request is routed directly to the second data node. If it is not a write operation request (for example, it is a read operation request), it is determined whether the copy of the small data block is complete. If the copy is complete, the read operation request is routed to the second data node for execution; if not, the read operation request is routed to the first data node for execution.
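As a non-limiting illustration, the migration state table and the routing decision of FIG. 10 may be sketched as follows. The `ShardState` and `ShardedMigration` names and the `"first"`/`"second"` return values are hypothetical. The sketch follows item 4) of FIG. 9 in routing reads and updates for a shard that is still being copied to the first data node, and does not model the lock on that shard or the actual data transfer.

```python
from enum import Enum
from typing import List


class ShardState(Enum):
    NOT_COPIED = "not copied"
    COPYING = "copying"
    COPIED = "copied"


class ShardedMigration:
    """Hypothetical migration state table with one entry per small data block."""

    def __init__(self, num_shards: int) -> None:
        self.state: List[ShardState] = [ShardState.NOT_COPIED] * num_shards

    def begin_copy(self, shard: int) -> None:
        self.state[shard] = ShardState.COPYING

    def finish_copy(self, shard: int) -> None:
        self.state[shard] = ShardState.COPIED

    def done(self) -> bool:
        """True once every shard has been copied, i.e. the load migration has ended."""
        return all(s is ShardState.COPIED for s in self.state)

    def route(self, op: str, shard: int) -> str:
        """Return which data node executes the request: "first" or "second"."""
        if op == "write":
            # Write operation requests always go to the second data node.
            return "second"
        # Reads and updates go to the second data node only after the shard is copied.
        if self.state[shard] is ShardState.COPIED:
            return "second"
        return "first"


migration = ShardedMigration(num_shards=3)
migration.begin_copy(0)
migration.finish_copy(0)
migration.begin_copy(1)
```

At this point shard 0 is copied, shard 1 is being copied, and shard 2 has not been touched, so a read of shard 0 is served by the second data node while reads of shards 1 and 2 are still served by the first data node.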

In this embodiment of the application, the management node determines the first data node and the second data node for data migration. The first data node sends a data migration request for the target data to the second data node. The second data node receives the data migration request, which includes the memory address of the target data in the first data node. The second data node establishes an RDMA virtual memory space according to the data migration request; the memory address of the RDMA virtual memory space is mapped to the memory address in the first data node, and the second data node uses the RDMA virtual memory space to access the target data on the first data node through RDMA according to the target operation request. The second data node sends an instruction to modify the metadata to the management node, and the management node receives that instruction. The management node modifies the metadata according to the instruction, so that target operation requests for accessing the target data on the first data node are routed to the second data node. The first data node sends the target data to the second data node through RDMA, and the second data node receives and stores the target data through RDMA.

That is, the second data node establishes the RDMA virtual memory space according to the data migration request for the target data sent by the first data node and sends an instruction to modify the metadata to the management node. The management node may modify the metadata according to that instruction so that target operation requests for accessing the target data are routed to the second data node, which reduces latency: there is no need to wait until the target data has been fully copied to the second data node before migrating the target operation requests. The first data node sends the target data to the second data node through RDMA. By using RDMA virtual memory technology, the target data is mapped into the RDMA virtual memory space of the second data node, and the right of use and the load are switched entirely to the second data node, so the load can be migrated without waiting for the copy of the target data to complete, which increases the speed of load migration. In other words, applying RDMA to real-time data migration in a distributed database cluster reduces the CPU resources that data migration consumes on the first data node, reduces the impact of the migration on other services running on the first data node, and increases the data transmission speed. Further, the target data is migrated in small blocks, which shortens the time for which the target data is unavailable, reduces the amount of unavailable data, and improves service quality.

The data migration method in the embodiments of this application has been described above. The data node in the embodiments of this application is described below. FIG. 11 is a schematic diagram of an embodiment of the data node in the embodiments of this application, which includes:

a receiving module 1101, configured to receive a data migration request for the target data sent by the first data node, where the data migration request includes the memory address of the target data in the first data node, and to receive and store the target data sent by the first data node;

a processing module 1102, configured to establish a remote direct memory access (RDMA) virtual memory space according to the data migration request, where the memory address of the RDMA virtual memory space is mapped to the memory address in the first data node, and the RDMA virtual memory space is used by the second data node to access the target data on the first data node through RDMA according to the target operation request;

a sending module 1103, configured to send an instruction to modify the metadata to the management node, where the instruction is used by the management node to modify the metadata so that target operation requests for accessing the target data are routed to the second data node.

Optionally, in some embodiments of the present application, the receiving module 1101 is specifically configured to receive and store, through RDMA, the target data sent by the first data node.

Optionally, in some embodiments of the present application, the target data is one of the M pieces of data into which the hotspot data is divided in the first data node, where M is an integer greater than or equal to 2.

Optionally, in some embodiments of the present application, the processing module 1102 is further configured to perform a write operation on the second data node according to the write operation request if the target operation request is a write operation request that indicates adding new data.

Optionally, in some embodiments of the present application, the processing module 1102 is further configured to: if the target operation request is a read operation request, or a write operation request that modifies the target data, and the target data that it accesses is already stored on the second data node, perform the access on the second data node according to the target operation request; and if the target operation request is a read operation request, or a write operation request that modifies the target data, and the target data that it accesses is not stored on the second data node, access the target data on the first data node through RDMA according to the target operation request.

FIG. 12A is a schematic diagram of another embodiment of the data node in the embodiments of this application, which includes:

a sending module 1201, configured to send a data migration request for the target data to the second data node, and to send the target data to the second data node. The data migration request includes the memory address of the target data in the first data node and is used by the second data node to establish a remote direct memory access (RDMA) virtual memory space. The memory address of the RDMA virtual memory space is mapped to the memory address in the first data node, and the RDMA virtual memory space is used by the second data node to access the target data on the first data node through RDMA according to the target operation request, where the target operation request is an operation request for accessing the target data that the management node routes to the second data node.

Optionally, in some embodiments of the present application, the sending module 1201 is specifically configured to send the target data to the second data node through RDMA.

Optionally, in some embodiments of the present application, the sending module 1201 is specifically configured to determine the hotspot data; divide the hotspot data into M pieces of data, where M is an integer greater than or equal to 2; select the target data from the M pieces of data; and send the target data to the second data node through RDMA.

Optionally, in some embodiments of the present application, FIG. 12B, which builds on FIG. 12A, is a schematic diagram of another embodiment of the data node, in which the data node further includes:

a receiving module 1202, configured to receive hotspot data information sent by the management node;

the sending module 1201 is specifically configured to determine the hotspot data according to the hotspot data information.

FIG. 13 is a schematic diagram of another embodiment of the data node in the embodiments of this application.

The data node may vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 1322 (for example, one or more processors), a memory 1332, and one or more storage media 1330 (for example, one or more mass storage devices) that store application programs 1342 or data 1344. The memory 1332 and the storage medium 1330 may provide transient or persistent storage. A program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the data node. Further, the central processing unit 1322 may be configured to communicate with the storage medium 1330 and to execute, on the data node, the series of instruction operations in the storage medium 1330.

The data node may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1358, and/or one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.

The steps performed by the first data node and the second data node in the foregoing embodiments may all be based on the data node structure shown in FIG. 13, and details are not repeated here.

The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Optionally, in some embodiments of the present application, a computer-readable storage medium is provided that includes instructions which, when run on a computer, cause the computer to perform the method described for the first data node or the second data node in FIG. 6 above.

Optionally, in some embodiments of the present application, a computer program product containing instructions is provided which, when run on a computer, causes the computer to perform the method described for the first data node or the second data node in FIG. 6 above.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments, and details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

In conclusion, the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (22)

1. A data migration method, comprising:
a second data node receiving, from a first data node, a data migration request concerning target data, wherein the data migration request contains the memory address of the target data in the first data node;
the second data node establishing a remote direct memory access (RDMA) virtual memory space according to the data migration request, wherein the memory address of the RDMA virtual memory space corresponds to the memory address of the first data node, and the RDMA virtual memory space is used by the second data node to access data on the first data node;
the second data node sending, to a management node, an instruction to modify metadata, wherein the instruction is used by the management node to modify the metadata so that an operation request accessing the target data is routed to the second data node; and
the second data node receiving the target data from the first data node.

2. The method according to claim 1, wherein the second data node receiving and storing the target data sent by the first data node comprises:
the second data node receiving and storing, through RDMA, the target data sent by the first data node.

3. The method according to claim 1 or 2, wherein the target data is a part of the hotspot data of the first data node.

4. The method according to any one of claims 1-3, further comprising:
if the operation request is a write operation request and the write operation request indicates that new data is to be added, the second data node performing the write operation on the second data node according to the write operation request.

5. The method according to any one of claims 1-3, further comprising:
if the operation request is a read operation request, or a write operation request that modifies the target data, and the target data accessed by the target operation request is already stored on the second data node, the second data node performing the access on the second data node according to the operation request; and
if the operation request is a read operation request, or a write operation request that modifies the target data, and the target data accessed by the operation request is not stored on the second data node, the second data node accessing the target data on the first data node through RDMA according to the operation request.

6. A data migration method, comprising:
a first data node sending, to a second data node, a data migration request concerning target data, wherein the data migration request contains the memory address of the target data in the first data node, the data migration request is used by the second data node to establish a remote direct memory access (RDMA) virtual memory space, the memory address of the RDMA virtual memory space corresponds to the memory address of the first data node, and the RDMA virtual memory space is used by the second data node to access data on the first data node; and
the first data node sending the target data to the second data node.

7. The method according to claim 6, wherein the first data node sending the target data to the second data node comprises:
the first data node sending the target data to the second data node through RDMA.

8. The method according to claim 6 or 7, wherein the first data node sending the target data to the second data node comprises:
the first data node determining hotspot data;
the first data node dividing the hotspot data into M parts, M being an integer greater than or equal to 2;
the first data node selecting the target data from the M parts; and
the first data node sending the target data to the second data node.

9. The method according to claim 8, wherein before the first data node determines the hotspot data, the method further comprises:
the first data node receiving hotspot data information sent by a management node;
and wherein the first data node determining the hotspot data comprises:
the first data node determining the hotspot data according to the hotspot data information.

10. A data node, comprising:
a receiving module, configured to receive, from a first data node, a data migration request concerning target data, wherein the data migration request contains the memory address of the target data in the first data node, and to receive and store the target data sent by the first data node;
a processing module, configured to establish a remote direct memory access (RDMA) virtual memory space according to the data migration request, wherein the memory address of the RDMA virtual memory space corresponds to the memory address of the first data node, and the RDMA virtual memory space is used by the second data node to access data on the first data node; and
a sending module, configured to send, to a management node, an instruction to modify metadata, wherein the instruction is used by the management node to modify the metadata so that an operation request accessing the target data is routed to the second data node.

11. The data node according to claim 10, wherein
the receiving module is specifically configured to receive and store, through RDMA, the target data sent by the first data node.

12. The data node according to claim 10 or 11, wherein the target data is hotspot data of the first data node.

13. The data node according to any one of claims 10-12, wherein
the processing module is further configured to: if the operation request is a write operation request and the write operation request indicates that new data is to be added, perform the write operation on the second data node according to the write operation request.

14. The data node according to any one of claims 10-12, wherein
the processing module is further configured to: if the operation request is a read operation request, or a write operation request that modifies the target data, and the target data accessed by the operation request is already stored on the second data node, perform the access on the second data node according to the operation request; and
the processing module is further configured to: if the operation request is a read operation request, or a write operation request that modifies the data, and the target data accessed by the operation request is not stored on the second data node, access the target data on the first data node through RDMA according to the operation request.

15. A data node, comprising:
a sending module, configured to send, to a second data node, a data migration request concerning target data, wherein the data migration request contains the memory address of the target data in the first data node, the data migration request is used by the second data node to establish a remote direct memory access (RDMA) virtual memory space, the memory address of the RDMA virtual memory space corresponds to the memory address of the first data node, and the RDMA virtual memory space is used by the second data node to access data on the first data node.

16. The data node according to claim 15, wherein
the sending module is specifically configured to send the target data to the second data node through RDMA.

17. The data node according to claim 15 or 16, wherein
the sending module is specifically configured to: determine hotspot data; divide the hotspot data into M parts, M being an integer greater than or equal to 2; select the target data from the M parts; and send the target data to the second data node.

18. The data node according to claim 15, further comprising:
a receiving module, configured to receive hotspot data information sent by a management node;
wherein the sending module is specifically configured to determine the hotspot data according to the hotspot data information.

19. A data node, comprising:
a transceiver, a processor, a memory, and a bus, wherein the transceiver, the processor, and the memory are connected through the bus;
the memory is configured to store operation instructions;
the transceiver is configured to: receive, from a first data node, a data migration request concerning target data, wherein the data migration request contains the memory address of the target data in the first data node; send, to a management node, an instruction to modify metadata, wherein the instruction is used by the management node to modify the metadata so that a target operation request accessing the target data is routed to the second data node; and receive and store the target data sent by the first data node; and
the processor is configured to establish a remote direct memory access (RDMA) virtual memory space according to the data migration request, wherein the memory address of the RDMA virtual memory space corresponds to the memory address of the first data node, and the RDMA virtual memory space is used by the second data node to access the target data on the first data node.

20. A data node, comprising:
a transceiver, a memory, and a bus, wherein the transceiver and the memory are connected through the bus;
the memory is configured to store operation instructions; and
the transceiver is configured to send, to a second data node, a data migration request concerning target data, wherein the data migration request contains the memory address of the target data in the first data node, the data migration request is used by the second data node to establish a remote direct memory access (RDMA) virtual memory space, the memory address of the RDMA virtual memory space corresponds to the memory address of the first data node, and the RDMA virtual memory space is used by the second data node to access data on the first data node.

21. A computer-readable storage medium, comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1-9.

22. A computer program product containing instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1-9.
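
The request handling recited in claims 4 and 5 can be illustrated with a minimal Python sketch. This is not the applicant's implementation. All names (`SourceNode`, `TargetNode`, `receive`, `handle`, and the operation labels `"insert"`, `"read"`, `"update"`) are hypothetical, and a plain in-process dictionary stands in for the RDMA virtual memory space, so no real RDMA call is made.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SourceNode:
    """Stand-in for the first data node; a dict plays the role of its memory."""
    store: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class TargetNode:
    """Stand-in for the second data node while a migration is in progress."""
    source: SourceNode  # hypothetical handle replacing the RDMA virtual memory space
    local: Dict[str, bytes] = field(default_factory=dict)

    def receive(self, key: str) -> None:
        """Copy one migrated item from the source into local storage."""
        self.local[key] = self.source.store[key]

    def handle(self, op: str, key: str, value: Optional[bytes] = None) -> Optional[bytes]:
        """Serve a request that the management node has routed to this node."""
        if op not in ("insert", "read", "update"):
            raise ValueError(f"unknown operation: {op}")
        if op == "insert":
            # Claim 4: a write that adds new data is performed on the second data node.
            self.local[key] = value
            return None
        # Claim 5: reads and modifying writes are served locally if the item has
        # already arrived, otherwise on the first data node (via RDMA in the patent).
        where = self.local if key in self.local else self.source.store
        if key not in where:
            raise KeyError(key)
        if op == "read":
            return where[key]
        where[key] = value  # op == "update"
        return None
```

In this sketch, a call such as `TargetNode(SourceNode({"a": b"1"})).handle("read", "a")` is served from the source dictionary until `receive("a")` has copied the item, after which the same call is served locally. The design point being illustrated is that routing can switch to the second data node before the copy completes, because not-yet-migrated items remain reachable through the remote mapping.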
CN202210741489.2A 2017-06-26 2017-06-26 Data migration method and data node Pending CN115344551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210741489.2A CN115344551A (en) 2017-06-26 2017-06-26 Data migration method and data node

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210741489.2A CN115344551A (en) 2017-06-26 2017-06-26 Data migration method and data node
CN201710495228.6A CN109144972B (en) 2017-06-26 2017-06-26 Data migration method and data node

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201710495228.6A Division CN109144972B (en) 2017-06-26 2017-06-26 Data migration method and data node

Publications (1)

Publication Number Publication Date
CN115344551A true CN115344551A (en) 2022-11-15

Family

ID=64804790

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710495228.6A Active CN109144972B (en) 2017-06-26 2017-06-26 Data migration method and data node
CN202210741489.2A Pending CN115344551A (en) 2017-06-26 2017-06-26 Data migration method and data node

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201710495228.6A Active CN109144972B (en) 2017-06-26 2017-06-26 Data migration method and data node

Country Status (1)

Country Link
CN (2) CN109144972B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427270B (en) * 2019-08-09 2022-11-01 华东师范大学 Dynamic load balancing method for distributed connection operator in RDMA (remote direct memory Access) network
CN110716985B (en) * 2019-10-16 2022-09-09 北京小米移动软件有限公司 A node information processing method, device and medium
CN111274176B (en) * 2020-01-15 2022-04-22 联想(北京)有限公司 Information processing method, electronic equipment, system and storage medium
CN113742050B (en) * 2020-05-27 2023-03-03 华为技术有限公司 Method, apparatus, computing device and storage medium for manipulating data objects
CN114442907B (en) * 2020-11-04 2024-07-05 华为技术有限公司 Data migration method and device, server and network system
CN116346581A (en) * 2021-12-24 2023-06-27 华为技术有限公司 Communication method, device and system
CN114706714A (en) * 2022-04-19 2022-07-05 纳贤信息科技(深圳)有限公司 Method for synchronizing computer memory division snapshots
CN115277858B (en) * 2022-09-23 2022-12-20 太极计算机股份有限公司 A data processing method and system for big data

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120331243A1 (en) * 2011-06-24 2012-12-27 International Business Machines Corporation Remote Direct Memory Access ('RDMA') In A Parallel Computer
US20130083690A1 (en) * 2011-10-04 2013-04-04 International Business Machines Corporation Network Adapter Hardware State Migration Discovery in a Stateful Environment
US9354933B2 (en) * 2011-10-31 2016-05-31 Intel Corporation Remote direct memory access adapter state migration in a virtual environment
US9311122B2 (en) * 2012-03-26 2016-04-12 Oracle International Corporation System and method for providing a scalable signaling mechanism for virtual machine migration in a middleware machine environment
CN103763173B (en) * 2013-12-31 2017-08-25 华为技术有限公司 Data transmission method and calculate node
CN104270416B (en) * 2014-09-12 2018-03-13 杭州华为数字技术有限公司 Control method for equalizing load and management node
CN105518611B (en) * 2014-12-27 2019-10-25 华为技术有限公司 Remote direct data access method, device and system
US9904627B2 (en) * 2015-03-13 2018-02-27 International Business Machines Corporation Controller and method for migrating RDMA memory mappings of a virtual machine
CN106372013B (en) * 2015-07-24 2019-11-12 华为技术有限公司 Remote memory access method, device and system
CN106777225B (en) * 2016-12-26 2021-04-06 腾讯科技(深圳)有限公司 Data migration method and system

Also Published As

Publication number Publication date
CN109144972A (en) 2019-01-04
CN109144972B (en) 2022-07-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination