CN111221857B - Method and apparatus for reading data records from a distributed system - Google Patents
- Publication number: CN111221857B (application CN201811323197.7A)
- Authority: CN (China)
- Prior art keywords: partition, parent, server, range, read request
- Legal status: Active (Google Patents notes that this status is an assumption, not a legal conclusion, and that it has not performed a legal analysis)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application provides a method and apparatus for reading data records from a distributed system. The distributed system includes a partition server and a client. The partition server manages a parent partition of a data table and stores a first partition range checker whose partition range is the partition range of the parent partition. The method includes: while splitting the parent partition, the partition server receives a first read request from the client, which requests to read a first data record from the parent partition; the partition server reads the first data record from the parent partition according to the first read request, where reading the first data record completes later than the split of the parent partition; and the partition server uses the first partition range checker to check the partition range of the partition to which the row key value of the first data record belongs. This helps improve the accuracy of reading data records.
Description
Technical field
The present application relates to the field of information technology, and more particularly, to a method and apparatus for reading data records from a distributed system.
Background
In a distributed system, data records are usually stored in data tables, for example, HBase tables in a Hadoop distributed system. To improve query efficiency, a data table can be divided into multiple partitions (regions), which are assigned to multiple partition servers, each managing its own partitions. Some partition servers process access requests (read or write requests) much more frequently than others, which leads to load imbalance among the partition servers.
The industry usually balances load across partition servers by splitting partitions. For example, suppose a heavily loaded partition server in the distributed system is the target partition server, and one of its partitions, the target partition, is accessed frequently. The target partition can then be chosen as the partition to be split (that is, the parent partition). The parent partition is split into multiple child partitions, and some of them are moved to less-loaded partition servers in the distributed system to reduce the load on the target partition server.
During a partition split, the partition metadata of the parent partition and the parent partition's partition range checker must be modified to apply to the new child partitions. However, suppose that during the split a first read request is received asking to read a first data record from the parent partition, and reading the first data record completes after the split of the parent partition completes. By then, the parent partition's partition range checker has already been modified. If the modified checker is used to check the partition range of the first data record, the record is treated as coming from a newly generated child partition rather than from the parent partition, which reduces the accuracy of the data returned for the first read request.
Summary
The present application provides a method and apparatus for reading data records from a distributed system, so as to improve the accuracy of reading data records.
According to a first aspect, a method for reading data records from a distributed system is provided. The distributed system includes a partition server and a client. The partition server manages a parent partition of a data table and stores a first partition range checker whose partition range is the partition range of the parent partition. The method includes: while splitting the parent partition, the partition server receives a first read request sent by the client, where the first read request requests to read a first data record from the parent partition; the partition server reads the first data record from the parent partition according to the first read request, where reading the first data record completes later than the split of the parent partition; and the partition server uses the first partition range checker to check the partition range of the partition to which the row key value of the first data record belongs.
The step in which the partition server uses the first partition range checker to check the partition range of the partition to which the row key value of the first data record belongs can be understood as follows: if the first read request asks to read the first data record of the parent partition from a first file, the partition server uses the first partition range checker to check the row key values of the data records read from the first file, so as to obtain the records in the first file that belong to the parent partition, and then reads the first data record from those records.
In the embodiments of this application, the first data record requested by the first read request is checked with the first partition range checker. This avoids a problem of the prior art: once the parent partition has been split, the first partition range checker has already been modified into a child partition's range checker, so the range used to check the records read is no longer the range of the partition the first read request intended to read. This helps improve the accuracy of reading data records.
In a possible implementation, the partition server stores a second partition range checker whose partition range is the partition range of a first target child partition, the first target child partition being a child partition obtained by splitting the parent partition. The method further includes: after the split of the parent partition is completed, the partition server receives a second read request, which requests to read a second data record from the first target child partition; the partition server reads the second data record from the first target child partition; and the partition server uses the second partition range checker to check the partition range of the partition to which the row key value of the second data record belongs.
In the embodiments of this application, the data record requested by the second read request is checked with the second partition range checker, which helps improve the accuracy of reading data records.
In a possible implementation, while the parent partition is being split, the timestamp corresponding to the parent partition is set to the maximum value, and the method further includes: the partition server obtains the timestamp carried in the first read request; and if that timestamp is smaller than the timestamp corresponding to the parent partition, the partition server selects the first partition range checker to check the partition range of the partition to which the row key value of the first data record belongs.
In the embodiments of this application, the timestamp corresponding to the parent partition is compared with the timestamp carried in the read request to select the appropriate partition range checker, which helps improve the accuracy of that selection.
In a possible implementation, after the split of the parent partition is completed, the timestamp corresponding to the parent partition is set to the split completion time, and the partition server stores a third partition range checker whose partition range is the partition range of a second target child partition, the second target child partition being a child partition obtained by splitting the parent partition. The method further includes: the partition server receives a third read request, which requests to read a third data record from the second target child partition; the partition server obtains the timestamp carried in the third read request; and if that timestamp is greater than the timestamp corresponding to the parent partition, the partition server uses the third partition range checker to check the partition range of the partition to which the row key value of the third data record belongs.
In the embodiments of this application, the timestamp corresponding to the parent partition is compared with the timestamp carried in the read request to select the appropriate partition range checker, which helps improve the accuracy of that selection.
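The timestamp rule in the two implementations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: row keys are integers with inclusive ranges, `sys.maxsize` stands in for the "maximum value" timestamp, the class and method names are hypothetical, and a request stamped exactly at the split completion time is routed to a child checker (the text only specifies the strictly smaller and strictly greater cases).

```python
import sys
from dataclasses import dataclass, field
from typing import Dict, Optional

# Stand-in for the "maximum value" timestamp held by the parent partition during a split.
MAX_TS = sys.maxsize


@dataclass(frozen=True)
class RangeChecker:
    low: int   # inclusive lower bound of the row key range
    high: int  # inclusive upper bound of the row key range

    def contains(self, row_key: int) -> bool:
        return self.low <= row_key <= self.high


@dataclass
class PartitionServer:  # hypothetical model of one partition server
    parent_checker: RangeChecker
    child_checkers: Dict[str, RangeChecker] = field(default_factory=dict)
    parent_ts: int = MAX_TS  # timestamp corresponding to the parent partition

    def begin_split(self) -> None:
        # While the split is in progress, the parent's timestamp is the maximum value,
        # so every request timestamp compares as smaller.
        self.parent_ts = MAX_TS

    def finish_split(self, completion_ts: int, children: Dict[str, RangeChecker]) -> None:
        # After the split, the parent's timestamp becomes the split completion time.
        self.child_checkers = dict(children)
        self.parent_ts = completion_ts

    def select_checker(self, request_ts: int, target_child: Optional[str] = None) -> RangeChecker:
        if request_ts < self.parent_ts:
            # Request was issued against the parent partition: use the first (parent) checker.
            return self.parent_checker
        # Assumption: requests stamped at or after the split completion target a child partition.
        return self.child_checkers[target_child]


server = PartitionServer(RangeChecker(1, 100))
server.begin_split()
print(server.select_checker(1000))             # RangeChecker(low=1, high=100)
server.finish_split(2000, {"child-1": RangeChecker(1, 50), "child-2": RangeChecker(51, 100)})
print(server.select_checker(1000))             # still the parent checker
print(server.select_checker(2500, "child-2"))  # RangeChecker(low=51, high=100)
```

Keeping the parent checker selectable by timestamp, rather than overwriting it, is what lets a read that started during the split still be checked against the parent's range after the split finishes.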
In a possible implementation, the partition server stores multiple replicas of the partition metadata record of the parent partition, and the method further includes: after the parent partition is split into multiple child partitions, the partition server modifies the replicas into the partition metadata records of the child partitions.
In the embodiments of this application, the partition metadata of the child partitions is obtained by modifying replicas of the parent partition's metadata, which simplifies generating the child partitions' metadata.
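The replica-based derivation of child metadata described above can be sketched as follows. The sketch assumes a two-way split at a hypothetical integer `split_key` and a simplified metadata record (partition ID, inclusive range, managing server); the field and function names are illustrative only.

```python
import copy
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionMetadata:  # simplified, hypothetical metadata record
    partition_id: str
    low: int     # inclusive lower bound of the partition range
    high: int    # inclusive upper bound of the partition range
    server: str  # partition server that manages the partition


def split_metadata(parent: PartitionMetadata, split_key: int,
                   child_ids: List[str]) -> List[PartitionMetadata]:
    """Copy the parent's record once per child, then edit each copy in place."""
    if len(child_ids) != 2 or not parent.low < split_key <= parent.high:
        raise ValueError("this sketch only handles a two-way split inside the parent range")
    replicas = [copy.copy(parent) for _ in child_ids]
    ranges = [(parent.low, split_key - 1), (split_key, parent.high)]
    for replica, child_id, (low, high) in zip(replicas, child_ids, ranges):
        replica.partition_id = child_id
        replica.low, replica.high = low, high
        # replica.server is inherited; moving a child to another server is a separate step.
    return replicas


parent = PartitionMetadata("P", 1, 100, "server-1")
first, second = split_metadata(parent, 51, ["C1", "C2"])
print(first)   # PartitionMetadata(partition_id='C1', low=1, high=50, server='server-1')
print(second)  # PartitionMetadata(partition_id='C2', low=51, high=100, server='server-1')
```

Only the fields that differ between parent and child are touched; everything else is carried over from the replica, which is where the simplification comes from.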
According to a second aspect, a partition server for reading data records from a distributed system is provided. The partition server includes modules for performing the method in the first aspect or in any possible implementation of the first aspect.
The functions of these modules can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to these functions.
According to a third aspect, a partition server cluster is provided. The cluster includes at least one partition server, and each partition server includes a processor and a memory. The memory stores a computer program, and the processor calls and runs the computer program from the memory, so that the partition server cluster performs the method in the first aspect.
According to a fourth aspect, a computer program product is provided. The computer program product includes computer program code which, when run on a computer, causes the computer to perform the methods in the foregoing aspects.
It should be noted that all or part of the computer program code may be stored on a first storage medium, which may be packaged together with the processor or separately from it; this is not specifically limited in the embodiments of this application.
According to a fifth aspect, a computer-readable medium is provided. The computer-readable medium stores program code which, when run on a computer, causes the computer to perform the methods in the foregoing aspects.
Brief description of drawings
FIG. 1 is a schematic diagram of the architecture of a distributed system 100 to which embodiments of this application apply.
FIG. 2 is a schematic diagram of a data table 200 in a distributed system.
FIG. 3 is a flowchart of a partition splitting method according to an embodiment of this application.
FIG. 4 is a flowchart of a method for reading data records from a distributed system according to another embodiment of this application.
FIG. 5 is a flowchart of a method for reading data records from a distributed system according to another embodiment of this application.
FIG. 6 is a schematic structural diagram of a partition server for writing data records according to an embodiment of this application.
FIG. 7 is a schematic block diagram of a partition server cluster according to another embodiment of this application.
Detailed description of embodiments
The technical solutions of this application are described below with reference to the accompanying drawings.
For ease of understanding, the system architecture of a distributed system to which embodiments of this application apply is first introduced with reference to FIG. 1. The distributed storage system 100 shown in FIG. 1 includes a client 110, a control server 120, a partition server 130, and a storage server 140.
The storage server 140 provides shared storage space for data records and is responsible for their final persistence. All partition servers can share the data records in the distributed storage system.
The client 110 provides an interface for accessing the distributed system. The client can cache information about the distributed system, for example, the partition location information of data tables, to speed up access to data records.
The control server 120 manages the partition servers, monitors their status in real time, and balances the load among them. The control server usually also stores the metadata of data tables, the partition metadata of each partition in a data table, and the routing table.
The partition metadata of a partition includes the partition range of the partition it describes and the partition server that manages that partition.
The metadata of a data table describes the range of row key values in the data table and the identifiers of the partitions the table contains.
The partition server (region server) 130 maintains its assigned partitions and processes read requests for data records in those partitions.
In a distributed system, data records can be stored in files. A data record usually consists of one or more key-value pairs (see Table 1): each record contains a row key, denoted "_id", a key, denoted "key", and a value, denoted "value". Logically, these data records can be presented as a data table. To improve the access performance of data records in a distributed system, the data table can be split horizontally into multiple partitions. For example, the data table 200 in FIG. 2 can be divided into partition 1, ..., partition m, and partition m+1. Each partition contains a contiguous run of row keys from the data table, and the values of those row keys define the partition range of the partition.
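To make the layout concrete, here is a minimal sketch of locating the partition that owns a row key when each partition covers a contiguous range. It assumes integer row keys and inclusive ranges, matching the numeric examples in this description; real HBase regions use byte-array row keys with half-open [start key, end key) ranges, so the types and names here are illustrative.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Partition:
    name: str
    start: int  # inclusive lower bound of the row key range
    end: int    # inclusive upper bound of the row key range


def find_partition(partitions: List[Partition], row_key: int) -> Optional[Partition]:
    """Return the partition whose [start, end] range contains row_key, or None.

    `partitions` must be sorted by `start` and must not overlap.
    """
    starts = [p.start for p in partitions]
    i = bisect_right(starts, row_key) - 1  # last partition starting at or before row_key
    if i >= 0 and partitions[i].start <= row_key <= partitions[i].end:
        return partitions[i]
    return None


table = [Partition("partition 1", 1, 100),
         Partition("partition 2", 101, 200),
         Partition("partition 3", 201, 300)]
record = {"_id": 150, "key": "color", "value": "blue"}
print(find_partition(table, record["_id"]).name)  # partition 2
```

Because ranges are contiguous and sorted, a binary search over start keys is enough to route a row key to its partition.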
Table 1
In a distributed system, each partition has its own partition range. The data records of each partition are organized and stored by the storage engine of the distributed system. Records of the same partition can be scattered across different data files; correspondingly, one data file may contain records from different partitions of the data table. When a partition is split, its partition range changes. To ensure that the data records returned for a read request fall within the requested partition's range, a partition range checker can be configured on the partition server to check which partition range the row key value of each record read belongs to.
For example, a partition server manages partitions A and B of a data table. The partition range of partition A is [1, 100] and that of partition B is [101, 200]. The data records of partition A are scattered across data files F1, F2, and F3, and so are the records of partition B. If a read request asks for data records in partition A, the storage engine reads records from F1, F2, and F3 in turn. Because these files also store records belonging to partition B, the storage engine reads those records out as well. The partition range checker of partition A can then be used to exclude the records belonging to partition B.
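The filtering in this example can be sketched as follows. The file contents are invented sample records, row keys are integers with inclusive ranges, and `PartitionRangeChecker` and `read_partition` are hypothetical names; the sketch only shows partition A's checker discarding partition B's records during a scan of the shared files.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PartitionRangeChecker:
    low: int   # inclusive lower bound of the partition range
    high: int  # inclusive upper bound of the partition range

    def contains(self, row_key: int) -> bool:
        return self.low <= row_key <= self.high


# Invented sample data: every file mixes records of partition A [1, 100] and partition B [101, 200].
files: Dict[str, List[dict]] = {
    "F1": [{"_id": 5, "value": "a5"}, {"_id": 150, "value": "b150"}],
    "F2": [{"_id": 101, "value": "b101"}, {"_id": 42, "value": "a42"}],
    "F3": [{"_id": 100, "value": "a100"}, {"_id": 200, "value": "b200"}],
}


def read_partition(files: Dict[str, List[dict]], checker: PartitionRangeChecker) -> List[dict]:
    """Scan F1, F2, F3 in order and keep only the records inside the checker's range."""
    result = []
    for name in sorted(files):
        for record in files[name]:
            if checker.contains(record["_id"]):
                result.append(record)
    return result


checker_a = PartitionRangeChecker(1, 100)
print([r["_id"] for r in read_partition(files, checker_a)])  # [5, 42, 100]
```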
Considering the load and performance of the distributed system, existing partitions may need to be split further to generate more partitions. For example, partition splitting can improve read/write concurrency. As another example, splitting can balance the load across partition servers in the distributed system: the parent partition on an overloaded target partition server is split into multiple child partitions, and some of the child partitions are handed over to other partition servers, reducing the load on the target partition server.
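The choice of a parent partition on an overloaded server can be pictured with a simple heuristic. Assumption: a server's load is approximated by the sum of its partitions' access counts, and the most frequently accessed partition on the busiest server is chosen. The patent does not define the load metric, so both the metric and the function name are hypothetical.

```python
from typing import Dict, Tuple


def choose_parent_partition(servers: Dict[str, Dict[str, int]]) -> Tuple[str, str]:
    """servers maps a server name to {partition name: access count}.

    Returns (target server, parent partition): the hottest partition on the
    server with the highest total access count.
    """
    target_server = max(servers, key=lambda s: sum(servers[s].values()))
    partitions = servers[target_server]
    parent = max(partitions, key=partitions.get)
    return target_server, parent


stats = {
    "rs1": {"p1": 900, "p2": 50},
    "rs2": {"p3": 120, "p4": 80},
}
print(choose_parent_partition(stats))  # ('rs1', 'p1')
```

A production balancer would typically also weigh request rates, data size, and the capacity of candidate destination servers; this sketch only captures the "busiest server, hottest partition" idea.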
The partition splitting method is described below with reference to FIG. 3, based on the distributed system architecture shown in FIG. 1. FIG. 3 is a flowchart of a partition splitting method according to an embodiment of this application, and includes steps 310 to 380.
310. The control server selects the partition to be split as the parent partition according to the load of each partition server and the access frequency of the data records in each partition managed by that server.
320. The control server requests, from the distributed system, storage space for storing the child partitions.
330. The control server sends a region split message to the target partition server where the parent partition is located. The message carries information such as the identifier (ID) of the parent partition, the storage space requested for the child partitions, and the partition metadata of the parent partition.
340. The target partition server closes the partition service of the parent partition, that is, marks the parent partition as offline. If the target partition server now receives a read request for data records in the parent partition, it returns an exception to the client indicating that the parent partition is not being served; the client can then retry later as compensation.
350. The target partition server splits the parent partition into two child partitions according to the parent partition's metadata and a preset partition splitting policy. It stores the partition metadata of the first child partition in the storage space described above, while the partition metadata of the second child partition can reuse the storage space of the parent partition's metadata.
360. The target partition server configures partition range checkers for the two child partitions.
Specifically, after splitting the parent partition, the target partition server changes the partition range of the parent partition's range checker to the partition range of the first child partition, so that it serves as the first child partition's range checker. It then creates a new partition range checker for the second child partition.
370. The target partition server sends a partition split success message to the control server.
380. The control server updates the partition information in the routing table, the partition metadata, the data table metadata, and other information to record the child partitions.
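Step 360 can be sketched as follows, assuming integer row keys, inclusive ranges, and a split at a hypothetical `split_key` (the preset splitting policy is not specified). The point of the sketch is that the parent's checker object is narrowed in place to become the first child's checker, which is the behavior behind the problem discussed later for reads that are still in flight when the split completes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RangeChecker:  # mutable on purpose: step 360 edits the parent's checker in place
    low: int   # inclusive lower bound
    high: int  # inclusive upper bound

    def contains(self, row_key: int) -> bool:
        return self.low <= row_key <= self.high


def configure_child_checkers(parent_checker: RangeChecker,
                             split_key: int) -> Tuple[RangeChecker, RangeChecker]:
    """Reuse the parent's checker for the first child and create a new one for the second."""
    second = RangeChecker(split_key, parent_checker.high)  # new checker for the second child
    parent_checker.high = split_key - 1                     # parent checker now covers child 1 only
    return parent_checker, second


parent = RangeChecker(1, 100)
first, second = configure_child_checkers(parent, 51)
print(first is parent, first, second)
# True RangeChecker(low=1, high=50) RangeChecker(low=51, high=100)
print(parent.contains(75))  # False: the parent's original range is gone
```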
As the splitting process in FIG. 3 shows, the partition server takes the parent partition offline before splitting it; that is, it stops processing read requests for data records in the parent partition. This hurts the overall performance of the distributed system and lowers the success rate of client reads.
To improve the overall performance of the distributed system and the success rate of client reads, this application further provides a method for reading data records, which is described in detail below based on the distributed system shown in FIG. 1 and with reference to FIG. 4.
FIG. 4 is a flowchart of a method for reading data records from a distributed system according to another embodiment of this application. The method shown in FIG. 4 includes step 410 and step 420.
410. While splitting the parent partition, the target partition server receives a first read request sent by the client, where the first read request requests to read a first data record from the parent partition.
In other words, the parent partition stays online both while the split is being prepared and while the split is in progress.
The process of splitting the parent partition by the target partition server may include both preparing to split the parent partition and actually splitting it, that is, the period from when the target partition server receives the split request for the parent partition until the split of the parent partition is completed.
420. The target partition server reads the first data record from the parent partition according to the first read request.
In the embodiments of this application, the partition server can still read the data records requested by the first read request from the parent partition while the parent partition is being split. In the conventional splitting process, by contrast, the parent partition must be taken offline before the split, so a first read request for data records in the parent partition cannot be processed. Avoiding this helps improve the overall performance of the distributed system and the success rate of client reads.
There is one more case to handle when the target partition server processes read requests: the time at which it finishes reading the first data record may be later than the time at which the split of the parent partition completes. For example, if the first data record requested by the first read request is large, the parent partition may already have been split into multiple child partitions by the time the target partition server has read all of it. Because the parent partition's range checker has by then been modified into a child partition's range checker, using that child checker to check which partition the row keys of the first data record belong to would prevent the part of the first data record outside the child's range from being read out, reducing the accuracy of reading data records from the distributed system.
For example, the partition range of the parent partition, and of its partition range checker, is [1, 100]; the partition range of the first child partition, and of its checker, is [1, 50]; the partition range of the second child partition, and of its checker, is [51, 100]; and the row key range of the first data record requested by the first read request is [2, 90]. If, while the target partition server is reading the first data record, the parent partition's range checker is modified into the first child partition's range checker, only the first child's checker is available to check the first data record. Records in the first data record whose row key values fall outside [1, 50] are then treated as erroneous and are not returned to the client, reducing the accuracy of the query.
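The arithmetic of this example can be checked with a short sketch that assumes, for illustration only, that every integer row key from 2 to 90 holds exactly one record.

```python
from typing import List


def visible_keys(req_low: int, req_high: int, chk_low: int, chk_high: int) -> List[int]:
    """Row keys in the requested range [req_low, req_high] that pass a range check."""
    return [k for k in range(req_low, req_high + 1) if chk_low <= k <= chk_high]


# Requested row key range of the first data record: [2, 90].
child_view = visible_keys(2, 90, 1, 50)    # checker already narrowed to the first child
parent_view = visible_keys(2, 90, 1, 100)  # parent checker retained
dropped = [k for k in parent_view if k not in child_view]
print(len(child_view), len(parent_view), dropped[0], dropped[-1], len(dropped))
# 49 89 51 90 40
```

Under this assumption, the narrowed checker silently drops the 40 records with row keys 51 to 90, all of which belong to the parent partition the request was issued against.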
To avoid this, the parent partition's range checker can be retained on the target partition server and used to check whether the row key values of the first data record belong to the parent partition.
That is, the partition server finishes processing the first read request later than the split completion time of the parent partition, the target partition server stores a first partition range checker whose partition range is that of the parent partition, and the method further includes:
430. After the split of the parent partition is completed, the target partition server uses the first partition range checker to check the partition range of the partition to which the row key value of the first data record belongs.
Checking the partition range with the first partition range checker can be understood as follows: if the first read request asks to read the first data record from the parent partition in a first file, the partition server uses the first partition range checker to check the row key values of the data records it reads from the first file. This identifies the records in the first file that belong to the parent partition, from which the partition server then reads the first data record.
Note that the first partition range checker (the parent partition's range checker) may be kept on the target partition server indefinitely. To save partition server resources, however, it may be deleted once the target partition server has finished processing the first read request.
In other words, in this embodiment a read request can be associated with a partition range checker. Read requests of the same type as the first read request, that is, requests received while the parent partition is being split and processed past the split completion time, can be associated with the first partition range checker.
Continuing the example above (parent partition and its checker [1,100], first child partition and its checker [1,50], second child partition and its checker [51,100], requested row keys [2,90]): the first partition range checker is kept on the target partition server until the server has finished reading the first data record, so the entire first data record can be checked against it.
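One way to realize this association is to bind each read request to the checker that is current when the request is admitted, and to keep a retired checker only while some request is still bound to it. The sketch below assumes this binding scheme; `CheckerRegistry`, `admit`, `replace_current`, and `finish` are hypothetical names, not an interface described in the patent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RangeChecker:
    """Hypothetical checker over integer row keys; the range is inclusive."""
    start: int
    end: int

    def contains(self, key: int) -> bool:
        return self.start <= key <= self.end


@dataclass
class CheckerRegistry:
    """Binds each read request to the checker current at admission time."""
    current: RangeChecker
    in_flight: dict = field(default_factory=dict)  # request id -> bound checker

    def admit(self, request_id: str) -> RangeChecker:
        self.in_flight[request_id] = self.current
        return self.current

    def replace_current(self, checker: RangeChecker) -> None:
        # Called when the split completes; already-bound requests keep theirs.
        self.current = checker

    def finish(self, request_id: str) -> None:
        # Once no request is bound to the parent checker, the registry drops it.
        del self.in_flight[request_id]

    def live_checkers(self) -> set:
        return {self.current, *self.in_flight.values()}


registry = CheckerRegistry(RangeChecker(1, 100))
bound = registry.admit("read-1")                # received during the split
registry.replace_current(RangeChecker(1, 50))   # split completes mid-read
rows = [k for k in range(2, 91) if bound.contains(k)]
print(len(rows))                  # 89: the whole [2, 90] range is still checked
registry.finish("read-1")
print(registry.live_checkers())   # {RangeChecker(start=1, end=50)}
```

Deleting the parent checker only after the last bound request finishes corresponds to the resource-saving option in the text; keeping it indefinitely would simply mean never calling `finish`.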
Correspondingly, read requests that the target partition server receives after the split of the parent partition is completed can be associated with the range checker of a child partition.
That is, the target partition server stores a second partition range checker whose partition range is that of a first target child partition, the first target child partition being a child partition obtained by splitting the parent partition. The method further includes: after the split of the parent partition is completed, the target partition server receives a second read request, which requests reading a second data record from the first target child partition; the target partition server reads the second data record from the first target child partition; and the target partition server uses the second partition range checker to check the partition range of the partition to which the row key value of the second data record belongs.
The first target child partition may be any one of the child partitions obtained by splitting the parent partition. It may also be the child partition, among them, that the target partition server is to manage; in that case, for load-balancing reasons, the other child partitions can be assigned to other partition servers.
To simplify generating the child partition's range checker (the second partition range checker), the first partition range checker can first be snapshotted to obtain a copy, and the copy is then modified to obtain the second partition range checker. Alternatively, the second partition range checker can be obtained by modifying the first partition range checker itself.
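The snapshot-and-modify step can be sketched as follows. The helper `derive_child_checkers`, the mutable `RangeChecker` class, and the split at row key 51 are hypothetical; the sketch edits two copies and leaves the original parent checker untouched, so it stays available for read requests still in flight.

```python
import copy
from dataclasses import dataclass


@dataclass
class RangeChecker:
    """Hypothetical mutable checker over integer row keys; inclusive range."""
    start: int
    end: int


def derive_child_checkers(parent: RangeChecker, split_key: int):
    """Snapshot the parent checker twice and narrow each copy to one child."""
    if not parent.start < split_key <= parent.end:
        raise ValueError("split key must leave both children non-empty")
    first = copy.copy(parent)    # snapshot, then keep [start, split_key - 1]
    first.end = split_key - 1
    second = copy.copy(parent)   # snapshot, then keep [split_key, end]
    second.start = split_key
    return first, second         # the parent object itself is unchanged


parent = RangeChecker(1, 100)
first, second = derive_child_checkers(parent, 51)
print(first)    # RangeChecker(start=1, end=50)
print(second)   # RangeChecker(start=51, end=100)
print(parent)   # RangeChecker(start=1, end=100)
```

The text also allows modifying the first checker directly. That saves one copy, but the parent's range would then have to be retained some other way for reads admitted during the split.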
Note that if, after the split of the parent partition is completed, the target partition server receives a read request that asks to read data records from the parent partition, it can directly return a partition version error to the client, prompting the client to update its partition version.
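The partition version error can be sketched as a version comparison. The sketch assumes, hypothetically, that the target partition server bumps the partition version exactly once when the split completes, so a client that still addresses the parent partition carries a stale version number; `PartitionVersion` and `PartitionVersionError` are illustrative names only.

```python
from dataclasses import dataclass


class PartitionVersionError(Exception):
    """Tells the client that its cached partition version is stale."""


@dataclass
class PartitionVersion:
    """Hypothetical per-partition version kept by the target partition server."""
    number: int = 1

    def complete_split(self) -> None:
        # Assumption: the version is bumped once, when the split completes.
        self.number += 1

    def admit(self, request_version: int) -> None:
        # A client still addressing the parent after the split carries the old
        # version, so it is told to refresh instead of being served.
        if request_version != self.number:
            raise PartitionVersionError(
                f"client has version {request_version}, server has {self.number}"
            )


version = PartitionVersion()
version.admit(1)              # during the split: accepted
version.complete_split()
try:
    version.admit(1)          # stale client still addressing the parent
except PartitionVersionError as err:
    print("refresh partition version:", err)
version.admit(2)              # refreshed client: accepted
```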
The target partition server can determine how the receipt time of a read request relates to the split completion time of the parent partition in several ways. One is a preset time period that starts when the split of the parent partition starts: read requests received within the period can be associated with the first partition range checker, and read requests received after it can be associated with the second partition range checker. Another is to compare timestamps, as described below.
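The preset-time-period approach reduces to a single comparison. In this sketch `SplitWindow` and its fields are hypothetical, times are plain floats in seconds, and the period is assumed to be long enough to cover the whole split; the text does not say how the period is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitWindow:
    """Hypothetical preset period that begins when the parent split begins."""
    split_started_at: float  # seconds on the server's clock
    length: float            # preset duration, assumed to cover the whole split

    def select_checker(self, received_at: float) -> str:
        # Requests received before the period ends (including before the split
        # started, when no children exist yet) use the parent's checker.
        if received_at < self.split_started_at + self.length:
            return "parent"
        # Requests received after the period use the child partition's checker.
        return "child"


window = SplitWindow(split_started_at=100.0, length=5.0)
print(window.select_checker(102.0))  # parent
print(window.select_checker(106.0))  # child
```

If the split could outlast the preset period, a request received late in the split would be routed to a child checker too early, which is one reason the timestamp approach below may be preferable.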
While the parent partition is being split, the timestamp that the target partition server keeps for the parent partition is set to the maximum value, and the method further includes: based on the timestamp carried by the first read request and the timestamp of the parent partition, the target partition server determines that the former is smaller than the latter, and then uses the first partition range checker to check the partition range of the partition to which the row key value of the first data record belongs.
Note that the target partition server can maintain one timestamp for the parent partition (the timestamp corresponding to the parent partition). The timestamp of every read request that reads data records from the parent partition is compared with this timestamp to determine which partition range checker the request is associated with.
After the split of the parent partition is completed, the timestamp of the parent partition is set to the split completion time, and the target partition server stores a third partition range checker whose partition range is that of a second target child partition, the second target child partition being a child partition obtained by splitting the parent partition. The method further includes: the target partition server receives a third read request, which requests reading a third data record from the second target child partition; the target partition server determines that the timestamp carried by the third read request is greater than the timestamp of the parent partition; and the target partition server uses the third partition range checker to check the partition range of the partition to which the row key value of the third data record belongs.
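The timestamp rule can be sketched as follows. The sketch assumes that request timestamps and the partition timestamp come from the same clock and uses `sys.maxsize` as a stand-in for "the maximum value". Because the text only covers the strictly smaller and strictly greater cases, the sketch treats an equal timestamp as belonging to the child partition; that tie rule is an assumption. `PartitionTimestamp` and `select_checker` are hypothetical names.

```python
import sys
from dataclasses import dataclass

MAX_TS = sys.maxsize  # stand-in for "the maximum value"


@dataclass(frozen=True)
class RangeChecker:
    start: int
    end: int


@dataclass
class PartitionTimestamp:
    """Hypothetical timestamp the target partition server keeps for the parent."""
    value: int = 0

    def begin_split(self) -> None:
        self.value = MAX_TS          # every request seen during the split is older

    def complete_split(self, completed_at: int) -> None:
        self.value = completed_at    # later requests compare against this instant


def select_checker(request_ts, parent_ts, parent_checker, child_checker):
    # Older than the parent timestamp: keep the parent's (first) checker.
    if request_ts < parent_ts.value:
        return parent_checker
    # Newer (ties included, by assumption): use the child partition's checker.
    return child_checker


parent = RangeChecker(1, 100)
second_child = RangeChecker(51, 100)   # the second target child partition
ts = PartitionTimestamp()
ts.begin_split()
first_read_ts = 1_000                  # first read request, received mid-split
print(select_checker(first_read_ts, ts, parent, second_child))  # parent
ts.complete_split(1_005)
third_read_ts = 1_010                  # third read request, received after the split
print(select_checker(first_read_ts, ts, parent, second_child))  # still parent
print(select_checker(third_read_ts, ts, parent, second_child))  # second child
```

The first read request keeps the parent's checker even after the split completes, because its timestamp stays below the split completion time.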
During a partition split, the target partition server must generate partition metadata for the child partitions. To simplify this, the partition metadata of the parent partition can first be snapshotted to produce copies, and the child partitions' metadata is then generated by modifying these copies. That is, the target partition server stores multiple copies of the parent partition's partition metadata record, and the method further includes: after the parent partition is split into the multiple child partitions, the target partition server modifies the multiple copies into the partition metadata records of the multiple child partitions.
The multiple copies include the parent partition's original partition metadata, and their number may equal the number of child partitions, so that one metadata record is produced for each child partition.
For example, suppose the parent partition is split into a first child partition, whose range runs from the parent's start row key (start rowkey) to the row key at the split point, and a second child partition, whose range runs from the row key at the split point to the parent's end row key (end rowkey). A snapshot produces one additional copy of the parent's partition metadata. Deleting from the original metadata the part that describes the second child partition (the range from the split-point row key to the end row key) yields the first child partition's metadata, and deleting from the copy the part that describes the first child partition (the range from the start row key to the split-point row key) yields the second child partition's metadata.
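The snapshot-and-trim procedure generalizes to any number of child partitions: with k split points there are k copies plus the original, one record per child. This sketch models a metadata record as a plain dictionary with hypothetical `start_rowkey` and `end_rowkey` fields (the table name `orders` is also made up), compares row keys as strings, and treats each range as half-open `[start, end)`; the record layout and the bound convention are assumptions, not taken from the patent.

```python
import copy


def split_partition_metadata(parent_meta: dict, split_rowkeys: list) -> list:
    """Turn the parent record and its snapshots into one record per child."""
    bounds = [parent_meta["start_rowkey"], *split_rowkeys, parent_meta["end_rowkey"]]
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("split row keys must be sorted and inside the parent range")
    # Take the snapshots before touching the original record.
    copies = [copy.deepcopy(parent_meta) for _ in split_rowkeys]
    records = [parent_meta, *copies]
    # Trim each record to its child's range; all other fields carry over.
    for record, lo, hi in zip(records, bounds, bounds[1:]):
        record["start_rowkey"], record["end_rowkey"] = lo, hi
    return records


parent = {"table": "orders", "start_rowkey": "a", "end_rowkey": "z"}
first, second = split_partition_metadata(parent, ["m"])
print(first)    # {'table': 'orders', 'start_rowkey': 'a', 'end_rowkey': 'm'}
print(second)   # {'table': 'orders', 'start_rowkey': 'm', 'end_rowkey': 'z'}
```

As in the text, the original record becomes the first child's record, so `parent` and `first` are the same object afterwards.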
Note that the snapshot of the parent partition's metadata and the snapshot of the parent partition's range checker described earlier may or may not be taken at the same time; this embodiment does not limit this.
To aid understanding, the method of this application is described below with a concrete example. The example only illustrates the method of the embodiments and does not limit their scope.
FIG. 5 is a flowchart of a method for reading data records from a distributed system according to another embodiment of this application. The method shown in FIG. 5 includes the steps below, beginning at step 510. For the preparation for a partition split, see the description of FIG. 3; the following focuses on the split performed by the partition server and on how the partition server handles read requests during the split.
Assume that the parent partition to be split is partition A and that the partition server managing partition A is the target partition server.
510. The target partition server prepares for the split: it sets the timestamp of the parent partition to the maximum value and snapshots partition A's partition metadata and partition range checker, obtaining a copy of each.
520. The target partition server receives a first read request, which requests reading data records from partition A.
530. The target partition server compares the timestamp carried by the first read request with the timestamp of the parent partition.
Because the parent partition's timestamp is at the maximum value during the split, the timestamp carried by a first read request received during the split is smaller than the parent partition's timestamp.
540. The target partition server associates partition A's range checker with the first read request, that is, it uses partition A's range checker to check the partition to which the row keys of the data records requested by the first read request belong.
550. The target partition server splits partition A into child partitions A' and B.
Specifically, the target partition server modifies partition A's partition metadata into the metadata of child partition A' and modifies the copy of partition A's metadata into the metadata of child partition B. Likewise, it modifies partition A's range checker into the range checker of child partition A' and modifies the copy of partition A's range checker into the range checker of child partition B.
In addition, if the target partition server detects that the data requested by the first read request has not yet been fully processed, it also retains partition A's range checker, which is used to check the partition to which the row keys of the data records read for the first read request belong.
560. The target partition server sets the timestamp of partition A to the split completion time of partition A.
570. The target partition server receives a second read request.
Note that if the partition version number carried by the second read request differs from the partition version number that the target partition server maintains for partition A, the partition version cached by the client that sent the request is out of date. In that case the target partition server returns a partition version error to the client, prompting it to update its partition version. If the two version numbers match and the second read request asks to read data records from child partition A', step 580 is performed.
580. The target partition server compares the timestamp carried by the second read request with the timestamp of partition A.
If the timestamp carried by the second read request is greater than that of partition A, the second read request was sent after the split of partition A completed. If, in addition, the data records it requests belong to child partition A', step 590 is performed.
590. The target partition server processes the second read request.
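The FIG. 5 flow can be traced end to end with a single-partition model. This sketch is illustrative only: `TargetPartitionServer` and its methods are hypothetical, row keys are integers with inclusive ranges, partition A is assumed to cover [1,100] and to be split at row key 51, and `sys.maxsize` stands in for the maximum timestamp. Metadata handling and the version check of step 570 are omitted.

```python
import sys
from dataclasses import dataclass, replace

MAX_TS = sys.maxsize  # stand-in for "the maximum value"


@dataclass(frozen=True)
class RangeChecker:
    start: int
    end: int

    def contains(self, key: int) -> bool:
        return self.start <= key <= self.end


class TargetPartitionServer:
    """Hypothetical model of the target partition server managing partition A."""

    def __init__(self, checker_a: RangeChecker):
        self.checker_a = checker_a       # range checker of partition A
        self.ts_a = 0                    # timestamp kept for partition A
        self.child_checkers: list = []   # filled in by step 550

    def prepare_split(self) -> None:
        # Step 510: set A's timestamp to the maximum value.
        self.ts_a = MAX_TS

    def split(self, split_key: int, completed_at: int) -> None:
        # Step 550: derive the checkers of A' and B; A's checker is retained
        # for read requests that were received during the split.
        self.child_checkers = [
            replace(self.checker_a, end=split_key - 1),  # A'
            replace(self.checker_a, start=split_key),    # B
        ]
        # Step 560: A's timestamp becomes the split completion time.
        self.ts_a = completed_at

    def checker_for(self, request_ts: int, row_key: int) -> RangeChecker:
        # Steps 530/540 and 580: requests older than A's timestamp keep A's checker.
        if request_ts < self.ts_a:
            return self.checker_a
        for checker in self.child_checkers:
            if checker.contains(row_key):
                return checker
        raise LookupError(f"no child partition covers row key {row_key}")


server = TargetPartitionServer(RangeChecker(1, 100))
server.prepare_split()                          # 510
first_read_ts = 1_000                           # 520: received during the split
server.split(split_key=51, completed_at=1_005)  # 550 and 560
second_read_ts = 1_010                          # 570: received after the split
print(server.checker_for(first_read_ts, 75))    # checker of A, [1, 100]
print(server.checker_for(second_read_ts, 20))   # checker of A', [1, 50]
```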
The method of the embodiments of this application has been described in detail above with reference to FIG. 1 to FIG. 5. The apparatus for reading data records from a distributed system according to the embodiments is described in detail below with reference to FIG. 6 and FIG. 7. The apparatuses shown in FIG. 6 and FIG. 7 can implement each step of the method above; for brevity, the details are not repeated here.
FIG. 6 is a schematic structural diagram of a partition server for reading data records according to an embodiment of this application. The partition server 600 shown in FIG. 6 includes a receiving module 610 and a processing module 620. The partition server is configured to manage a parent partition in a data table and stores a first partition range checker whose partition range is that of the parent partition.
The modules of the partition server are as follows:
the receiving module 610 is configured to receive, while the parent partition is being split, a first read request sent by the client, the first read request requesting to read a first data record from the parent partition;
the processing module 620 is configured to read the first data record from the parent partition according to the first read request, where reading the first data record completes later than the split of the parent partition; and
the processing module 620 is further configured to use the first partition range checker to check the partition range of the partition to which the row key value of the first data record belongs.
Optionally, in an embodiment, the partition server stores a second partition range checker whose partition range is that of a first target child partition, the first target child partition being a child partition obtained by splitting the parent partition. The receiving module 610 is further configured to receive, after the split of the parent partition is completed, a second read request that requests reading a second data record from the first target child partition; the processing module 620 is further configured to read the second data record from the first target child partition and to use the second partition range checker to check the partition range of the partition to which the row key value of the second data record belongs.
Optionally, in an embodiment, the timestamp of the parent partition is set to the maximum value while the parent partition is being split, and the processing module 620 is further configured to: obtain the timestamp carried by the first read request; and, if that timestamp is smaller than the timestamp of the parent partition, select the first partition range checker to check the partition range of the partition to which the row key value of the first data record belongs.
Optionally, in an embodiment, after the split of the parent partition is completed, the timestamp of the parent partition is set to the split completion time, and the partition server stores a third partition range checker whose partition range is that of a second target child partition, the second target child partition being a child partition obtained by splitting the parent partition. The receiving module 610 is further configured to receive a third read request that requests reading a third data record from the second target child partition; the processing module 620 is further configured to obtain the timestamp carried by the third read request and, if that timestamp is greater than the timestamp of the parent partition, to use the third partition range checker to check the partition range of the partition to which the row key value of the third data record belongs.
Optionally, in an embodiment, the partition server stores multiple copies of the parent partition's partition metadata record, and the processing module 620 is further configured to modify the multiple copies into the partition metadata records of the multiple child partitions after the parent partition is split into those child partitions.
In an optional embodiment, the receiving module 610 may be the input/output interface 730, the processing module 620 may be the processor 720, and the partition server may further include the memory 710, as shown in FIG. 7.
FIG. 7 is a schematic block diagram of a partition server cluster according to another embodiment of this application. The partition server cluster 700 shown in FIG. 7 may include at least one partition server, each of which includes a memory 710, a processor 720, and an input/output interface 730. The memory 710, the processor 720, and the input/output interface 730 are connected through an internal connection path. The memory 710 stores instructions, and the processor 720 executes the instructions stored in the memory 710 to control the input/output interface 730 to receive input data and information and to output data such as operation results.
Note that the partition server cluster may include one partition server or multiple partition servers. When it includes multiple partition servers, they cooperate to implement the functions performed by the partition server in the methods shown in FIG. 1 to FIG. 5, and the cluster may accordingly include multiple memories, multiple processors, and multiple input/output interfaces (see FIG. 7). When it includes one partition server, its structure is that of a single partition server in FIG. 7, that is, the cluster includes one memory, one processor, and one input/output interface.
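In implementation, each step of the method above may be completed by an integrated hardware logic circuit in the processor 720 or by instructions in the form of software. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 710, and the processor 720 reads the information in the memory 710 and completes the steps of the method in combination with its hardware. To avoid repetition, details are not described here again.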
It should be understood that in the embodiments of this application the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
It should also be understood that in the embodiments of this application the memory may include read-only memory and random access memory and provides instructions and data to the processor. Part of the memory may also include non-volatile random access memory; for example, the memory may also store device type information.
It should be understood that in the embodiments of this application, "B corresponding to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information.
It should be understood that the term "and/or" herein only describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
It should be understood that, in the various embodiments of this application, the sequence numbers of the processes above do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic and should not limit the implementation of the embodiments in any way.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected as needed to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit.
All or part of the embodiments above may be implemented by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented wholly or partly in the form of a computer program product, which includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
The foregoing descriptions are merely specific implementations of this application, and the protection scope of this application is not limited thereto. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (13)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811323197.7A CN111221857B (en) | 2018-11-08 | 2018-11-08 | Method and apparatus for reading data records from a distributed system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811323197.7A CN111221857B (en) | 2018-11-08 | 2018-11-08 | Method and apparatus for reading data records from a distributed system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111221857A | 2020-06-02 |
CN111221857B | 2023-04-18 |
Family ID: 70830168
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201811323197.7A | Active | 2018-11-08 | 2018-11-08 | Method and apparatus for reading data records from a distributed system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111221857B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112115113B (en) * | 2020-09-25 | 2022-03-25 | 北京百度网讯科技有限公司 | Data storage system, method, device, equipment and storage medium |
CN114547019A (en) * | 2020-11-24 | 2022-05-27 | 网联清算有限公司 | Database reading and writing method, device, server and medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7734615B2 (en) * | 2005-05-26 | 2010-06-08 | International Business Machines Corporation | Performance data for query optimization of database partitions |
US9996572B2 (en) * | 2008-10-24 | 2018-06-12 | Microsoft Technology Licensing, Llc | Partition management in a partitioned, scalable, and available structured storage |
GB2521197A (en) * | 2013-12-13 | 2015-06-17 | Ibm | Incremental and collocated redistribution for expansion of an online shared nothing database |
CN106326241A (en) * | 2015-06-15 | 2017-01-11 | 阿里巴巴集团控股有限公司 | Method and apparatus for reading/writing data table in data table splitting process |
CN105353988A (en) * | 2015-11-13 | 2016-02-24 | 曙光信息产业(北京)有限公司 | Metadata reading and writing method and device |
US10353895B2 (en) * | 2015-11-24 | 2019-07-16 | Sap Se | Atomic visibility switch for transactional cache invalidation |
US10726009B2 (en) * | 2016-09-26 | 2020-07-28 | Splunk Inc. | Query processing using query-resource usage and node utilization data |
2018
- 2018-11-08: CN201811323197.7A (CN), published as CN111221857B; status: Active
Also Published As
Publication number | Publication date |
---|---|
CN111221857A (en) | 2020-06-02 |
Similar Documents
Publication | Title |
---|---|
JP6033805B2 (en) | Balanced consistent hash for distributed resource management |
US9830101B2 (en) | Managing data storage in a set of storage systems using usage counters |
US11726984B2 (en) | Data redistribution method and apparatus, and database cluster |
CN110071978B (en) | Cluster management method and device |
WO2019245764A1 (en) | Hierarchical namespace with strong consistency and horizontal scalability |
US11586641B2 (en) | Method and mechanism for efficient re-distribution of in-memory columnar units in a clustered RDBMs on topology change |
CN110908589B (en) | Data file processing method, device, system and storage medium |
JP2023541298A (en) | Transaction processing methods, systems, devices, equipment, and programs |
CN111221857B (en) | Method and apparatus for reading data records from a distributed system |
US12367296B2 (en) | Native multi-tenant row table encryption |
WO2022083267A1 (en) | Data processing method, apparatus, computing node, and computer readable storage medium |
US11656957B1 (en) | Managing nodes of a DBMS |
CN112506606A (en) | Migration method, device, equipment and medium for containers in cluster |
CN117742598A (en) | Method, device, equipment and medium for managing cache data |
US11606277B2 (en) | Reducing the impact of network latency during a restore operation |
CN116594551A (en) | Data storage method and device |
CN114647697A (en) | Method, device, computing equipment and storage medium for accessing database |
CN114519049A (en) | Data processing method and device |
CN112000431A (en) | Object storage and read-write method and device of distributed storage system |
CN114546580A (en) | Cache deployment system, cache deployment method, electronic device and storage medium |
CN119690357B (en) | Data management method, device, equipment, medium and product in storage system |
US11943316B1 (en) | Database connection multiplexing for prepared statements |
CN113918644B (en) | Method and related device for managing data of application program |
CN112578996B (en) | Metadata sending method of storage system and storage system |
CN116186165A (en) | Data copying method, device, system and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2022-02-08. Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province. Applicant after: Huawei Cloud Computing Technologies Co.,Ltd. Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Shenzhen, Guangdong. Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd. |
GR01 | Patent grant | |