CN117891815B

CN117891815B - Database parallel scanning method and device, electronic equipment and storage medium

Info

Publication number: CN117891815B
Application number: CN202311811158.2A
Authority: CN
Inventors: 侯宗田
Original assignee: Primitive Data Beijing Information Technology Co ltd
Current assignee: Primitive Data Beijing Information Technology Co ltd
Priority date: 2023-12-26
Filing date: 2023-12-26
Publication date: 2024-09-10
Anticipated expiration: 2043-12-26
Also published as: CN117891815A

Abstract

The present application discloses a database parallel scanning method, device, electronic device and storage medium, and relates to the field of database technology. In the method, a data index tree of the database is first obtained and multiple scanning threads are created. The data index tree includes multiple index pages, which are stored in order of page number, and the first scanning thread serves as the initial thread. The initial thread acquires a synchronization lock, selects an initial index page from the index pages, determines a preset number, and performs data scanning on the initial index page; when performing the data scanning, the initial thread releases the synchronization lock. Then, based on the page number of the initial index page, the preset number of index pages are selected in order as target index pages, and the target page numbers of the target index pages are written into a preset shared parameter in sequence. After each write, a scanning thread is woken by means of the synchronization lock and performs data scanning on the target index page, which can effectively improve the efficiency of database query scanning.

Description

Database parallel scanning method, device, electronic device and storage medium

Technical Field

The present application relates to the field of database technology, and in particular to a database parallel scanning method, device, electronic device and storage medium.

Background Art

In everyday database use, scenarios with large amounts of data usually rely on parallel scanning to speed up reading. In the related art, parallel scanning of a database typically groups the data pages of the database and has multiple threads perform the scanning task simultaneously, with each thread responsible for scanning one group of data pages. Although parallel execution improves the efficiency of database scan queries, it does not make full use of the database index, so the efficiency gain is limited.

Summary of the Invention

The present application aims to solve at least one of the technical problems existing in the prior art. To this end, embodiments of the present application provide a database parallel scanning method, device, electronic device and storage medium, which can effectively improve the efficiency of the database.

In a first aspect, an embodiment of the present application provides a database parallel scanning method, comprising:

obtaining a data index tree of a database and creating multiple scanning threads, wherein the data index tree includes multiple index pages, the index pages are stored in order of page number, and the first scanning thread serves as the initial thread;

using the initial thread to acquire a synchronization lock, selecting an initial index page from the index pages, determining a preset number, and performing data scanning on the initial index page, wherein the initial thread releases the synchronization lock when performing the data scanning;

selecting, in order and based on the initial index page and the page numbers, the preset number of index pages as target index pages;

writing the target page numbers of the target index pages into a preset shared parameter in sequence, and, after each write, waking a scanning thread by means of the synchronization lock and using that scanning thread to perform data scanning on the target index page.

In some embodiments of the present application, the database further includes at least one data page, the data page includes at least one data tuple, the index page includes at least one index tuple, and the index tuple includes a data page number indicating a first position, namely that of the data page, and a data sequence number indexing a second position, namely that of the data tuple within the data page. The data scanning process includes the following steps:

starting the scan from the first index tuple of the index page, and using the data page number to find the data page at the first position as the target data page;

using the data sequence number to find the data tuple at the second position in the target data page;

scanning all the index tuples of the index page in sequence to obtain the data tuples corresponding to the index tuples.
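The data scanning steps above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `IndexTuple` and `scan_index_page`, the dictionary of data pages, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexTuple:
    data_page_no: int  # first position: which data page holds the tuple
    data_seq_no: int   # second position: slot of the tuple inside that page


def scan_index_page(index_page, data_pages):
    """Resolve every index tuple of one index page to its data tuple."""
    results = []
    for entry in index_page:                             # start at the first index tuple
        target_page = data_pages[entry.data_page_no]     # find the target data page
        results.append(target_page[entry.data_seq_no])   # find the data tuple in it
    return results


# Hypothetical data: two data pages and one index page with three index tuples.
data_pages = {7: ["alice", "bob"], 9: ["carol"]}
index_page = [IndexTuple(7, 1), IndexTuple(9, 0), IndexTuple(7, 0)]
rows = scan_index_page(index_page, data_pages)
```

The data tuples come back in index order rather than in data-page order, which is what makes the index page a natural unit of work for one scanning thread.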

In some embodiments of the present application, selecting, in order and based on the initial index page and the page numbers, the preset number of index pages as target index pages includes:

obtaining the initial page number of the initial index page;

incrementing the initial page number step by step to select the preset number of target page numbers, and taking the index pages with those target page numbers as the target index pages.

In some embodiments of the present application, waking a scanning thread by means of the synchronization lock and using that scanning thread to perform data scanning on the target index page further includes:

selecting the scanning threads in order as the target thread, and using the target thread to acquire the synchronization lock;

obtaining the target page number from the shared parameter, and selecting the target index page based on the target page number;

using the target thread to perform data scanning on the target index page while releasing the synchronization lock, and pointing the shared parameter to the next target page number.
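A minimal sketch of what one target thread does under these steps, assuming Python's `threading.Lock` stands in for the synchronization lock and a dictionary entry stands in for the shared parameter. The names `shared`, `claimed` and `target_thread` are hypothetical.

```python
import threading

lock = threading.Lock()        # stand-in for the synchronization lock
shared = {"next_page": 16}     # hypothetical shared parameter
claimed = []                   # (thread name, page number) pairs, for inspection


def target_thread():
    with lock:                                # acquire the synchronization lock
        page_no = shared["next_page"]         # read the target page number
        shared["next_page"] = page_no + 1     # point at the next target page number
    # The lock is released here; the scan itself runs without holding it.
    claimed.append((threading.current_thread().name, page_no))


threads = [threading.Thread(target=target_thread, name=f"scan-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

claimed_pages = sorted(page for _, page in claimed)
```

Because the read and the increment happen inside the same critical section, no two threads can claim the same page number, while the comparatively slow page scan stays outside the lock.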

In some embodiments of the present application, selecting an initial index page from the index pages and determining a preset number includes:

filtering the index pages according to an initial filter value to obtain the initial index page;

determining an end index page according to an end filter value, and obtaining the preset number from the end index page and the initial index page.
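One way to picture this step is shown below. The text does not say how a filter value is mapped to an index page, so the sketch assumes each index page is described by the smallest key it holds and uses a binary search; it also assumes the preset number counts the pages after the initial one. `select_range` and `page_low_keys` are hypothetical names.

```python
from bisect import bisect_right


def select_range(page_low_keys, first_page_no, filter_start, filter_end):
    """Map a filter range onto index pages.

    page_low_keys[i] is assumed to be the smallest key stored on the index
    page numbered first_page_no + i, in ascending order.
    """
    start_idx = max(bisect_right(page_low_keys, filter_start) - 1, 0)
    end_idx = max(bisect_right(page_low_keys, filter_end) - 1, 0)
    initial_page_no = first_page_no + start_idx
    end_page_no = first_page_no + end_idx
    # Assumption: the preset number is the count of pages after the initial page.
    preset_number = end_page_no - initial_page_no
    return initial_page_no, end_page_no, preset_number


low_keys = [0, 100, 200, 300, 400]   # hypothetical pages 1..5
initial_page, end_page, preset = select_range(low_keys, 1, 150, 399)
```

With these sample keys, a filter of 150 to 399 starts on page 2 and ends on page 4, leaving two target pages to hand out after the initial one.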

In some embodiments of the present application, the method further includes:

generating a scan end status when a scanning thread accesses the end index page;

broadcasting the scan end status to every scanning thread, and terminating each scanning thread according to the scan end status after it has finished scanning its corresponding index page.
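The end-of-scan broadcast can be sketched with a `threading.Event`, which every thread can observe once it is set. This is an illustration under assumptions: the source does not name the broadcast mechanism, and here each thread loops to claim further pages, which is a simplification chosen to keep the example small. `scan_finished`, `END_PAGE` and `scanner` are hypothetical names.

```python
import threading

lock = threading.Lock()
scan_finished = threading.Event()   # stand-in for the broadcast scan end status
shared = {"next_page": 16}          # hypothetical shared parameter
END_PAGE = 20                       # hypothetical end index page number
scanned = []


def scanner():
    while not scan_finished.is_set():
        with lock:
            if scan_finished.is_set():    # re-check while holding the lock
                break
            page_no = shared["next_page"]
            shared["next_page"] = page_no + 1
            if page_no == END_PAGE:
                scan_finished.set()       # now visible to every scanning thread
        scanned.append(page_no)           # stand-in for scanning the claimed page


threads = [threading.Thread(target=scanner) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

scanned_pages = sorted(scanned)
```

A thread that has already claimed a page still finishes it after the status is set, matching the requirement that threads terminate only after scanning their corresponding index page.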

In some embodiments of the present application, the data index tree includes a plurality of leaf nodes, and the method further includes:

for each index page, obtaining the page numbers of its adjacent index pages as adjacent page numbers;

taking the index page together with its adjacent page numbers as a node element, and storing the node elements into the leaf nodes in order of page number to obtain the data index tree.
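As a rough sketch of the node elements described above, the following builds one element per index page, each carrying the page numbers of its neighbours. `NodeElement` and `build_leaf_elements` are hypothetical names, and using `None` for a missing neighbour is an assumption of the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeElement:
    page_no: int
    prev_page_no: Optional[int]   # adjacent page on the left, if any
    next_page_no: Optional[int]   # adjacent page on the right, if any


def build_leaf_elements(page_numbers):
    """Create node elements in page-number order, linked to their neighbours."""
    ordered = sorted(page_numbers)
    elements = []
    for i, page_no in enumerate(ordered):
        prev_no = ordered[i - 1] if i > 0 else None
        next_no = ordered[i + 1] if i + 1 < len(ordered) else None
        elements.append(NodeElement(page_no, prev_no, next_no))
    return elements


leaves = build_leaf_elements([3, 1, 2])
links = [(e.page_no, e.prev_page_no, e.next_page_no) for e in leaves]
```

Storing the adjacent page number with each page lets a scanning thread derive the next page to publish without descending the tree again.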

In a second aspect, an embodiment of the present application further provides a database parallel scanning device, which applies the database parallel scanning method described in the embodiment of the first aspect of the present application, and includes:

an acquisition module, configured to obtain a data index tree of a database and create multiple scanning threads, wherein the data index tree includes multiple index pages, the index pages are stored in order of page number, and the first scanning thread serves as the initial thread;

a first scanning module, configured to use the initial thread to acquire a synchronization lock, select an initial index page from the index pages, determine a preset number, and perform data scanning on the initial index page, wherein the initial thread releases the synchronization lock when performing the data scanning;

a selection module, configured to select, in order and based on the initial index page and the page numbers, the preset number of index pages as target index pages;

a second scanning module, configured to write the target page numbers of the target index pages into a preset shared parameter in sequence and, after each write, wake a scanning thread by means of the synchronization lock and use that scanning thread to perform data scanning on the target index page.

In a third aspect, an embodiment of the present application further provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the database parallel scanning method described in the embodiment of the first aspect of the present application.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, wherein the storage medium stores a program, and the program, when executed by a processor, implements the database parallel scanning method described in the embodiment of the first aspect of the present application.

The embodiments of the present application include at least the following beneficial effects:

The embodiments of the present application provide a database parallel scanning method, device, electronic device and storage medium. In the method, a data index tree of the database is first obtained and multiple scanning threads are created. The data index tree includes multiple index pages, which are stored in order of page number, and the first scanning thread serves as the initial thread. The initial thread acquires a synchronization lock, selects an initial index page from the index pages, determines a preset number, and performs data scanning on the initial index page; when performing the data scanning, the initial thread releases the synchronization lock. Then, based on the page number of the initial index page, the preset number of index pages are selected in order as target index pages, and the target page numbers of the target index pages are written into a preset shared parameter in sequence. After each write, a scanning thread is woken by means of the synchronization lock and performs data scanning on the target index page. By setting the synchronization lock and the shared parameter in this way, different scanning threads read the index pages in order and perform data scanning, which combines parallel scanning with index scanning and effectively improves the efficiency of database scan queries.

Additional aspects and advantages of the present application will be given in part in the description below, and in part will become apparent from the description below or be learned through practice of the present application.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present application will become apparent and easily understood from the description of the embodiments in conjunction with the following drawings, in which:

FIG. 1 is a schematic flow chart of a database parallel scanning method provided by an embodiment of the present application;

FIG. 2 is a schematic flow chart of a database parallel scanning method provided by another embodiment of the present application;

FIG. 3 is a schematic flow chart of step S103 in FIG. 1;

FIG. 4 is a schematic flow chart of step S104 in FIG. 1;

FIG. 5 is a schematic flow chart of step S102 in FIG. 1;

FIG. 6 is a schematic flow chart of a database parallel scanning method provided by another embodiment of the present application;

FIG. 7 is a schematic flow chart of a database parallel scanning method provided by yet another embodiment of the present application;

FIG. 8 is a flow chart of a database parallel scanning method provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of the modules of a database parallel scanning device provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Reference numerals: acquisition module 100, first scanning module 200, selection module 300, second scanning module 400, electronic device 1000, processor 1001, memory 1002.

Detailed Description

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.

The embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary, are only used to explain the present application, and cannot be understood as limiting the present application.

In the description of the present application, it should be understood that orientation descriptions, such as orientations or positional relationships indicated by up, down, front, back, left and right, are based on the orientations or positional relationships shown in the accompanying drawings. They are used only to facilitate and simplify the description of the present application, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be understood as limiting the present application.

In the description of the present application, "several" means one or more, and "multiple" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. Where "first" and "second" are used, they serve only to distinguish technical features, and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.

In the description of the present application, unless otherwise clearly defined, terms such as "set", "install" and "connect" should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of these terms in the present application in light of the specific content of the technical solution.

In everyday database use, scenarios with large amounts of data usually rely on parallel scanning to speed up reading. In the related art, parallel scanning of a database typically groups the data pages of the database and has multiple threads perform the scanning task simultaneously; each thread is responsible for scanning one group of data pages, and the data are finally aggregated to form the final result. The scan usually includes some filter conditions. Common ways to speed up scanning are: 1. Create a partitioned table and prune its partitions according to the filter conditions, reducing the data to be scanned and improving query performance. 2. Use parallel scanning, with each thread scanning and filtering part of the data. 3. Use index scanning to quickly find the pages containing the specified data. 4. Parallelize across the partitions that remain after partition pruning, with each thread scanning a single partition and using that partition's index to perform an index scan.

However, the scenario coverage of the related art is not comprehensive enough. Parallel index scanning across partitions requires the target table to be a partitioned table with an index created in each partition. Alternatively, the database is divided into directories and each thread creates an index for one directory; that is, there are multiple indexes, but they are independent of one another. This is equivalent to sharding the database and creating a separate index for each shard. Hashing is then used to rebuild these indexes, distributing the data among them according to hash value. In the scanning stage, if the scan condition matches a hash value, the scan is hashed to the corresponding index and performed there; a single thread or process scans that one index, which is a serial process. Only when the hash condition is not met is a parallel index scan performed, and the parallelism here means that each thread scans one index on its own while the threads run in parallel with one another; any single index is still scanned by only one thread. This is similar to partition-table indexes in the related art, where an index is built separately for each partition's data and each thread scans a different partition index to achieve parallelism: in essence, the scan is parallel across indexes and serial within a single index. Parallel scanning and index scanning are therefore not combined, the database index is not fully used, and the performance gain is not maximized, so the improvement in database query scanning efficiency is limited.

Based on this, the embodiments of the present application provide a database parallel scanning method, device, electronic device and storage medium. By setting a synchronization lock and a shared parameter, different scanning threads read the index pages of a single data index tree in order and perform data scanning. This combines parallel scanning with index scanning, so that even a scan of a single index on a single table can be executed in parallel, effectively improving the efficiency of database scan queries.

The database parallel scanning method, device, electronic device and storage medium provided by the embodiments of the present application are described through the following embodiments, beginning with the database parallel scanning method.

The database parallel scanning method provided in the embodiments of the present application relates to the field of database technology, and in particular to the field of database scanning technology. The method can be applied in a terminal or on a server side, and can also be a computer program running in a terminal or on a server side. For example, the computer program can be a native program or a software module in an operating system; it can be a local application, that is, a program that must be installed in the operating system to run, such as a client that supports database parallel scanning; or it can be a program that only needs to be downloaded into a browser environment to run. In short, the computer program can be an application, module or plug-in of any form. The terminal communicates with the server over a network. The database parallel scanning method can be executed by the terminal or the server, or by the terminal and the server in cooperation.

In some embodiments, the terminal can be a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart watch, or the like. The server can be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms. It can also be a service node in a blockchain system, in which the service nodes form a peer-to-peer (Peer To Peer, P2P) network; the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). The server side of a database parallel scanning system can be installed on the server and can interact with the terminal through it; for example, corresponding software is installed on the server side, and the software can be an application that implements the database parallel scanning method, but is not limited to this form. The terminal and the server can be connected through a communication connection such as Bluetooth, Universal Serial Bus (USB) or a network, which is not limited in this embodiment.

The present application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types. The present invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.

The database parallel scanning method in the embodiments of the present invention is described below.

FIG. 1 is an optional flow chart of the database parallel scanning method provided by an embodiment of the present invention. The method in FIG. 1 may include, but is not limited to, steps S101 to S104. It can also be understood that this embodiment does not specifically limit the order of steps S101 to S104 in FIG. 1; the order of the steps may be adjusted, or steps may be removed or added, according to actual needs.

Step S101: obtain a data index tree of a database and create multiple scanning threads.

In some embodiments, the data index tree of the database is a B-tree, a self-balancing tree that keeps data ordered and supports efficient retrieval. Specifically, the data index tree includes multiple index pages, which are stored in order of page number. For example, if the data index tree has three index pages with page numbers 1, 2 and 3, the three index pages can be stored in ascending order of page number, which is not limited in this embodiment.

In some embodiments, multiple scanning threads are created, and the first scanning thread serves as the initial thread. Specifically, if the database is scanned in full, the number of scanning threads created equals the number of index pages; if the database is scanned in part according to a filter condition, the number of scanning threads created equals the number of index pages that meet the filter condition, which is not limited in this embodiment.

Step S102: use the initial thread to acquire a synchronization lock, select an initial index page from the index pages, determine a preset number, and perform data scanning on the initial index page; when performing the data scanning, the initial thread releases the synchronization lock.

In some embodiments, in order to avoid the conflicts and inconsistencies caused by multiple scanning threads accessing the same index page of the database at the same time, the present application uses a synchronization mechanism, such as a synchronization lock. The synchronization lock ensures that only one thread can perform the preset operation at any given moment. Specifically, when the initial thread starts executing, it first acquires the synchronization lock. This means that, until the synchronization lock is released, any other scanning thread that attempts to acquire it is blocked, ensuring that the initial thread can perform the preset operation safely.

In some embodiments, the initial thread selects an initial index page from the index pages and determines a preset number. Specifically, if the database is scanned in full, the initial index page is the first index page; if the database is scanned in part according to a filter condition, the initial index page is the first index page that meets the filter condition, which is not limited in this embodiment.

The initial thread releases the synchronization lock at the moment it starts scanning the data of the initial index page. This allows the other scanning threads, which have been waiting, to acquire the synchronization lock and carry out their own operations, so that concurrency is maximized while data consistency is preserved.

Step S103: select, in order and based on the initial index page and the page numbers, the preset number of index pages as target index pages.

In some embodiments, the preset number of index pages are selected in order as target index pages based on the initial index page and the page numbers. For example, suppose the data index tree includes 100 index pages stored in order of page numbers [1,100], the initial index page obtained from the filter condition is the 15th index page, and the preset number is 10; the target index pages are then the 10 index pages with page numbers [16,25], which is not limited in this embodiment.

步骤S104，依次将目标索引页面的目标页面编号写入预设的共享参数中，在每次写入后，利用同步锁唤醒扫描线程，利用扫描线程对目标索引页面进行数据扫描。Step S104, writing the target page numbers of the target index pages into the preset shared parameters in sequence, and after each writing, waking up the scanning thread by using the synchronization lock, and using the scanning thread to scan the data of the target index page.

在一些实施例中，依次将目标索引页面的目标页面编号写入预设的共享参数中，在每次写入后，利用同步锁唤醒一个扫描线程，从而利用扫描线程对目标索引页面进行数据扫描。具体的，在对初始扫描线程获取初始索引页面的同时，将目标页面编号16写入共享参数中，然后释放同步锁。利用同步锁唤醒下一个扫描线程，以根据目标页面编号获取目标索引页面，然后将下一个目标页面编号17写入共享参数中，然后释放同步锁。以此类推，由此通过设置同步锁和共享参数，可以利用不同的扫描线程并行有序地对不同的索引页面进行数据扫描，有效提升了数据库查询扫描的效率，本实施例对此不做限制。In some embodiments, the target page number of the target index page is sequentially written into the preset shared parameters, and after each write, a scanning thread is awakened by the synchronization lock, so that the target index page is scanned by the scanning thread. Specifically, while the initial scanning thread obtains the initial index page, the target page number 16 is written into the shared parameters, and then the synchronization lock is released. The next scanning thread is awakened by the synchronization lock to obtain the target index page according to the target page number, and then the next target page number 17 is written into the shared parameters, and then the synchronization lock is released. And so on, by setting the synchronization lock and shared parameters, different scanning threads can be used to perform data scans on different index pages in parallel and orderly, which effectively improves the efficiency of database query scanning, and this embodiment does not limit this.
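The hand-off described in step S104 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: a Python `threading.Condition` stands in for the synchronization lock, and `SharedParams`, `publish_targets`, `scanner` and `run` are hypothetical names. For simplicity the sketch wakes all waiting threads after each write (`notify_all`) instead of exactly one.

```python
import threading


class SharedParams:
    """Hypothetical shared parameters: one pending target page number at a time."""

    def __init__(self):
        self.cond = threading.Condition()  # stands in for the synchronization lock
        self.page_no = None                # target page number not yet claimed
        self.done = False                  # set once every target has been handed out


def publish_targets(shared, initial_page_no, preset_number):
    """Write each target page number in order, waking waiters after every write."""
    first = initial_page_no + 1
    for page_no in range(first, first + preset_number):
        with shared.cond:
            # Wait until the previous number has been claimed before overwriting it.
            shared.cond.wait_for(lambda: shared.page_no is None)
            shared.page_no = page_no
            shared.cond.notify_all()
    with shared.cond:
        shared.cond.wait_for(lambda: shared.page_no is None)
        shared.done = True
        shared.cond.notify_all()


def scanner(shared, scanned):
    """Claim a page number under the lock, then scan with the lock released."""
    while True:
        with shared.cond:
            shared.cond.wait_for(lambda: shared.page_no is not None or shared.done)
            if shared.page_no is None:
                return  # nothing left to claim
            page_no, shared.page_no = shared.page_no, None
            shared.cond.notify_all()
        scanned.append(page_no)  # placeholder for the real page scan


def run(initial_page_no=15, preset_number=10, n_threads=4):
    shared, scanned = SharedParams(), []
    threads = [threading.Thread(target=scanner, args=(shared, scanned))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    publish_targets(shared, initial_page_no, preset_number)
    for t in threads:
        t.join()
    return sorted(scanned)
```

With the values from the example above (initial index page 15, preset number 10), `run()` hands out page numbers 16 through 25, each to exactly one scanning thread.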

Referring to FIG. 2, in some embodiments of the present application, the data scanning process in the above steps may include, but is not limited to, the following steps S201 to S203.

Step S201: starting from the first index tuple of the index page, use the data page number to locate the data page at the first position as the target data page.

In some embodiments, the database further includes at least one data page, each data page includes at least one data tuple, and each data tuple stores the corresponding data. An index page includes at least one index tuple, and each index tuple includes a data page number that indicates the first position, i.e. the position of a data page.

When a scanning thread scans the data of an index page, it starts from the first index tuple of the index page and uses the data page number in the index tuple to locate the data page at the first position as the target data page. For example, if the database includes 10,000 data pages and the data page number in the index tuple is 3000, the first position corresponds to the 3000th data page, which is the target data page. This embodiment imposes no limitation in this regard.

Step S202: use the data sequence number to locate the data tuple at the second position in the target data page.

In some embodiments, the index tuple further includes a data sequence number that indicates the second position, i.e. the position of a data tuple within its data page. The scanning thread uses the data sequence number to locate the data tuple at the second position in the target data page. For example, if the target data page includes 500 data tuples and the data sequence number in the index tuple is 100, the second position corresponds to the 100th data tuple in the target data page. This embodiment imposes no limitation in this regard.

Step S203: scan all index tuples of the index page in sequence to obtain the data tuples corresponding to the index tuples.

In some embodiments, the index tuple may further include a key value. The key value is used to find the corresponding index tuple, and the data page number and data sequence number in that index tuple are then used to find the data tuple in the corresponding data page. Specifically, the index tuples in an index page are stored in order of key value. For example, the key value in index tuple A = (10, (3000, 100)) is 10 and the key value in index tuple B = (11, (1000, 200)) is 11, so index tuple A is stored before index tuple B. This embodiment imposes no limitation in this regard.

The scanning thread scans all index tuples of the index page in sequence and obtains the data tuple corresponding to each index tuple, which completes the data scan of that index page. Because multiple scanning threads scan different index pages in parallel, indexes and data are scanned in parallel at the same time and the index is fully used during the scan, which effectively improves the efficiency and performance of database query scanning.
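Steps S201 to S203 amount to a two-level lookup per index tuple. The sketch below illustrates this under stated assumptions: `IndexTuple` and `scan_index_page` are hypothetical names, data pages are modeled as a dictionary keyed by data page number, and the data sequence number is treated as 1-based to match the "100th data tuple" example.

```python
from typing import NamedTuple


class IndexTuple(NamedTuple):
    key: int      # key value; index tuples are stored in key order
    page_no: int  # data page number (first position)
    seq_no: int   # data sequence number (second position), assumed 1-based


def scan_index_page(index_page, data_pages):
    """Visit every index tuple in order and fetch the data tuple it points to."""
    rows = []
    for index_tuple in index_page:
        target_page = data_pages[index_tuple.page_no]      # step S201
        rows.append(target_page[index_tuple.seq_no - 1])   # step S202
    return rows                                            # step S203: all tuples visited


# Index tuples A and B from the example, with two illustrative data pages
# of 500 data tuples each.
index_page = [IndexTuple(10, 3000, 100), IndexTuple(11, 1000, 200)]
data_pages = {
    3000: [f"page3000-tuple{i}" for i in range(1, 501)],
    1000: [f"page1000-tuple{i}" for i in range(1, 501)],
}
rows = scan_index_page(index_page, data_pages)
```

Because the index tuples are visited in key order, `rows` comes out in key order as well, even though the data tuples live on different data pages.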

Referring to FIG. 3, in some embodiments of the present application, selecting a preset number of index pages in order as target index pages based on the initial index page and the page numbers in step S103 may include, but is not limited to, the following steps S301 to S302.

Step S301: obtain the initial page number of the initial index page.

In some embodiments, each index page corresponds to one page number, and the index pages are stored in order of page number. The initial page number of the initial index page is obtained accordingly: when the initial index page is the 1st index page, the initial page number is 1; when the initial index page is the 15th index page, the initial page number is 15. Which case applies depends on whether a filter condition exists. This embodiment imposes no limitation in this regard.

Step S302: increase the initial page number step by step, select a preset number of target page numbers, and take the index pages with those target page numbers as the target index pages.

In some embodiments, the initial page number is increased step by step, a preset number of target page numbers are selected, and the index pages with those target page numbers are taken as the target index pages. For example, when the preset number is 10 and the initial page number is 15, the page number is first increased to 16; one target page number has now been selected, and the 16th index page becomes a target index page. The page number is then increased to 17; two target page numbers have now been selected, and the 17th index page becomes a target index page. This continues until the page number reaches 25, at which point 10 target page numbers have been selected and 10 target index pages are obtained. This embodiment imposes no limitation in this regard.
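The selection in steps S301 to S302 can be sketched as a simple counting loop. The function name `select_target_pages` and the `last_page_no` bound are assumptions added for illustration; the text itself does not state what happens when fewer pages remain than the preset number.

```python
def select_target_pages(initial_page_no, preset_number, last_page_no):
    """Increase the page number step by step until enough targets are selected."""
    targets = []
    page_no = initial_page_no
    # Stop early if the index runs out of pages (an assumption of this sketch).
    while len(targets) < preset_number and page_no < last_page_no:
        page_no += 1
        targets.append(page_no)
    return targets


targets = select_target_pages(initial_page_no=15, preset_number=10, last_page_no=100)
```

For the example above, `targets` holds the page numbers 16 through 25.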

Referring to FIG. 4, in some embodiments of the present application, waking up a scanning thread by means of the synchronization lock and using the scanning thread to scan the data of the target index page in step S104 may include, but is not limited to, the following steps S401 to S403.

Step S401: select a scanning thread in order as the target thread, and have the target thread acquire the synchronization lock.

In some embodiments, scanning threads are selected in order as the target thread. For example, if there are 10 scanning threads, each scanning thread in turn becomes the target thread and acquires the synchronization lock. This embodiment imposes no limitation in this regard.

Step S402: obtain the target page number from the shared parameters, and select the target index page based on the target page number.

In some embodiments, the target thread obtains the target page number from the shared parameters and selects the target index page based on it. For example, if the target page number in the shared parameters is 17, the target thread selects the 17th index page as the target index page. This embodiment imposes no limitation in this regard.

Step S403: use the target thread to scan the data of the target index page, release the synchronization lock at the same time, and point the shared parameters to the next target page number.

In some embodiments, the target thread scans the data of the target index page, releases the synchronization lock at the same time, and points the shared parameters to the next target page number. It can be understood that, because the index pages are stored in order of page number, the next target page number can be obtained while the target thread scans the target index page. The shared parameters are updated to point to the next target page number before the synchronization lock is released. This embodiment imposes no limitation in this regard.
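The claim-then-scan pattern of steps S401 to S403 can be sketched as below. This is an illustration under assumptions, not the patented code: `parallel_scan` and `scan_page` are hypothetical names, the shared parameters are modeled as a dictionary, and the next target page number is assumed to be the current one plus one.

```python
import threading


def parallel_scan(first_target, last_target, n_threads, scan_page):
    """Each thread claims a page number under the lock, then scans without it."""
    lock = threading.Lock()                 # the synchronization lock
    shared = {"next_page": first_target}    # hypothetical shared parameters
    results = {}

    def worker():
        while True:
            with lock:                                   # step S401
                page_no = shared["next_page"]            # step S402
                if page_no > last_target:
                    return
                # Step S403: advance the shared parameters before releasing the lock.
                shared["next_page"] = page_no + 1
            # The scan runs after the lock is released, so pages overlap in time.
            # Each thread writes a distinct key, so no extra lock is taken here.
            results[page_no] = scan_page(page_no)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Only the short read-and-advance step is serialized; the potentially slow page scan happens outside the critical section, which is what allows several index pages to be scanned at once.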

Referring to FIG. 5, in some embodiments of the present application, selecting an initial index page from the index pages and determining a preset number in step S102 may include, but is not limited to, the following steps S501 to S502.

Step S501: filter the index pages according to the filtering initial value to obtain the initial index page.

In some embodiments, the filter condition includes a filtering initial value, and the index pages are filtered according to the filtering initial value to obtain the initial index page. For example, in the database query statement "select * from t1 where c1>10000", the text after where is the filter condition and c1>10000 is the filtering initial value. Based on this filtering initial value, the initial thread can select from database table t1 the index pages with c1>10000. This embodiment imposes no limitation in this regard.

Step S502: determine the end index page according to the filtering end value, and obtain the preset number from the end index page and the initial index page.

In some embodiments, the filter condition may further include a filtering end value. For example, in the database query statement "select * from t1 where c1>10000 and c1<20000", the text after where is the filter condition, c1>10000 is the filtering initial value, and c1<20000 is the filtering end value. Based on this filter condition, the initial thread can find in database table t1 the initial index page containing the data 10000. Because the index pages are ordered, it then only needs to search forward from the initial index page to find the end index page containing the data 20000. This embodiment imposes no limitation in this regard.

The initial index page is determined according to the filtering initial value, the end index page is determined according to the filtering end value, and the preset number is obtained from the end index page and the initial index page. For example, if the page number of the initial index page is 15 and the page number of the end index page is 25, the preset number is 25-15=10. This embodiment imposes no limitation in this regard.
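One way to locate the initial and end index pages in steps S501 to S502 is a binary search over the largest key on each index page. The sketch below is an assumption-laden illustration: `locate_range` and `page_max_keys` are hypothetical, page numbers are taken to be 1-based, and the filter is assumed to match at least one page.

```python
from bisect import bisect_left, bisect_right


def locate_range(page_max_keys, start_value, end_value):
    """Return (initial page number, end page number, preset number).

    page_max_keys[i] is the largest key stored on index page i + 1.
    """
    last = len(page_max_keys) - 1
    # Initial page: first page that can hold a key greater than start_value.
    first_idx = min(bisect_right(page_max_keys, start_value), last)
    # End page: first page whose largest key reaches end_value.
    end_idx = min(bisect_left(page_max_keys, end_value), last)
    initial_page, end_page = first_idx + 1, end_idx + 1
    return initial_page, end_page, end_page - initial_page


# Illustrative layout: 100 index pages, page i holding keys (1000*(i-1), 1000*i].
page_max_keys = [1000 * i for i in range(1, 101)]
initial_page, end_page, preset_number = locate_range(page_max_keys, 10000, 20000)
```

With this illustrative layout the filter `c1>10000 and c1<20000` gives initial page 11, end page 20 and a preset number of 20-11=9, computed the same way as the 25-15=10 example above.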

Referring to FIG. 6, in some embodiments of the present application, the database parallel scanning method may further include, but is not limited to, the following steps S601 to S602.

Step S601: when a scanning thread accesses the end index page, generate a scan end status.

In some embodiments, a scan end status is generated when a scanning thread accesses the end index page. The scan end status may be stored in the shared parameters. This embodiment imposes no limitation in this regard.

Step S602: broadcast the scan end status to each scanning thread, and after a scanning thread finishes scanning its index page, terminate the scanning thread according to the scan end status.

In some embodiments, the scan end status is broadcast to each scanning thread. After each scanning thread has finished scanning its index page, that is, after it has completed the data scan of every index tuple in that index page, the scanning thread is terminated according to the scan end status, and the entire scan ends. This embodiment imposes no limitation in this regard.
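The termination logic of steps S601 to S602 can be sketched with a `threading.Event` playing the role of the broadcast scan end status. The names `scan_until_end` and `scan_finished` are hypothetical, and the sketch assumes the next page number is always the current one plus one.

```python
import threading


def scan_until_end(initial_page_no, end_page_no, n_threads=4):
    """Scan pages in parallel; stop handing out pages once the end page is claimed."""
    lock = threading.Lock()
    scan_finished = threading.Event()        # hypothetical scan end status
    shared = {"next_page": initial_page_no}
    scanned = []

    def worker():
        while True:
            with lock:
                if scan_finished.is_set():   # step S602: status is visible to all
                    return
                page_no = shared["next_page"]
                shared["next_page"] = page_no + 1
                if page_no >= end_page_no:   # step S601: end index page reached
                    scan_finished.set()
            # A thread always finishes the page it has claimed before it exits.
            scanned.append(page_no)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(scanned)
```

Setting the event under the lock guarantees that no thread claims a page beyond the end index page, while threads that already hold a page still complete it.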

Referring to FIG. 7, in some embodiments of the present application, the database parallel scanning method may further include, but is not limited to, the following steps S701 to S702.

Step S701: for each index page, obtain the page numbers of its adjacent index pages to obtain the adjacent page numbers.

In some embodiments, for each index page, the page numbers of its adjacent index pages are obtained as the adjacent page numbers. For example, 100 index pages are stored in order under page numbers [1,100]. The 1st index page has only one adjacent index page, the 2nd, so its adjacent page number is 2. The 50th index page is adjacent to the 49th and 51st index pages, so its adjacent page numbers are 49 and 51. This embodiment imposes no limitation in this regard.

Step S702: take each index page together with its adjacent page numbers as a node element, and store the node elements into leaf nodes in order of page number to obtain the data index tree.

In some embodiments, the data index tree includes multiple leaf nodes. Each index page and its adjacent page numbers form a node element, and the node elements are stored into the leaf nodes in order of page number to obtain the data index tree. When a scanning thread reads the index page of a leaf node, it can therefore also obtain the adjacent page numbers of that index page and store an adjacent page number in the shared parameters as the target page number. It then releases the synchronization lock, or wakes up the next one or more scanning threads through another condition parameter, so that they read the target page number from the shared parameters and scan the data of the corresponding target index page. This embodiment imposes no limitation in this regard.
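The leaf-node layout of steps S701 to S702 can be sketched as follows. `LeafNode` and `build_leaf_nodes` are hypothetical names, and the index page contents are omitted so that only the page number and its adjacent page numbers are shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeafNode:
    page_no: int
    prev_page_no: Optional[int]  # adjacent page on the left, if any
    next_page_no: Optional[int]  # adjacent page on the right, if any


def build_leaf_nodes(page_numbers):
    """Store node elements into leaf nodes in order of page number."""
    ordered = sorted(page_numbers)
    leaves = []
    for i, page_no in enumerate(ordered):
        prev_no = ordered[i - 1] if i > 0 else None                 # step S701
        next_no = ordered[i + 1] if i < len(ordered) - 1 else None
        leaves.append(LeafNode(page_no, prev_no, next_no))          # step S702
    return leaves


leaves = build_leaf_nodes(range(1, 101))
```

A scanning thread that has read `leaves[49]` (page 50) can take `next_page_no`, here 51, as the next target page number without consulting any other structure.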

In some embodiments, the data scan within any single scanning thread still preserves the order of the index data it outputs. The upper-level query plan can use this order to perform operations such as sort-based aggregation and merge join directly, without an additional sort. This embodiment imposes no limitation in this regard.
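As one possible illustration of how an upper-level operator might exploit per-thread ordering, the already sorted outputs of the scanning threads can be combined with a k-way merge instead of a full sort. This is an assumption of the sketch and not a mechanism described in the text; `merge_thread_outputs` is a hypothetical name, and each row is modeled as a `(key, value)` pair.

```python
import heapq


def merge_thread_outputs(per_thread_rows):
    """Merge per-thread outputs that are each already sorted by key."""
    return list(heapq.merge(*per_thread_rows, key=lambda row: row[0]))


# Two scanning threads, each producing rows in key order.
thread_a = [(1, "a"), (4, "d")]
thread_b = [(2, "b"), (3, "c")]
merged = merge_thread_outputs([thread_a, thread_b])
```

`heapq.merge` consumes its inputs lazily and relies on each input already being sorted, so no additional sort of the individual thread outputs is needed.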

The present application is described below with a complete embodiment:

The present application adds inter-thread synchronization to the index scan in the executor and to the B-Tree index access in the storage engine, so that multiple scanning threads cooperate to access the B-Tree index in parallel and thereby access the data pages pointed to by the index in parallel. Referring to the database parallel scanning flowchart shown in FIG. 8, the data index tree (B-Tree) includes multiple index pages (leaf nodes). The index pages are stored in order of page number and point to their adjacent index pages through adjacent page numbers. A scanning thread can be created for each index page that satisfies the filter condition. The initial scanning thread acquires the synchronization lock, accesses the initial index page according to the filter condition, obtains the adjacent page number of the initial index page as the target page number, and stores it in the shared parameters. It then releases the synchronization lock and scans the data of the initial index page. When the synchronization lock is released, the next scanning thread is woken up; it reads the target page number from the shared parameters and accesses the corresponding target index page to obtain the next target page number.

In this way, different scanning threads are woken up to access their corresponding index pages and, according to the information in the index tuples of each index page, locate the data pages and read the corresponding data tuples. While the previous scanning thread is scanning its index page and data pages, the synchronization lock has already been released, so the other woken scanning threads run the same scanning procedure and read index pages and data pages at the same time. All scanning threads therefore scan indexes and data concurrently, and the use of the index during the scan greatly improves scanning efficiency.

The database parallel scanning method in the embodiments of the present application mainly parallelizes the index scanning stage for the B-Tree index type. Multiple scanning threads scan a single index tree in parallel, which is page-level parallelism and does not require partitioning the data and creating a separate index for each partition. The related art merely partitions the data and creates multiple indexes; each partition index scan is still essentially serial, and the multiple partition indexes are simply scanned at the same time. In the embodiments of the present application the data need not be partitioned: a single index tree, which may itself be a single partition index, is scanned by multiple threads in parallel. This is a lower-level form of parallelism, and parallel index scanning performance can be improved by more than 40%, which effectively improves the efficiency and performance of database query scanning.

An embodiment of the present invention further provides a database parallel scanning device that can implement the above database parallel scanning method. Referring to FIG. 9, in some embodiments of the present application, the device includes:

an acquisition module 100, configured to obtain a data index tree of a database and create multiple scanning threads, wherein the data index tree includes multiple index pages, the index pages are stored in order of page number, and the first scanning thread serves as the initial thread;

a first scanning module 200, configured to use the initial thread to acquire a synchronization lock, select an initial index page from the index pages, determine a preset number, and scan the data of the initial index page, wherein the initial thread releases the synchronization lock when performing the data scan;

a selection module 300, configured to select a preset number of index pages in order as target index pages based on the initial index page and the page numbers;

a second scanning module 400, configured to write the target page numbers of the target index pages into the preset shared parameters in sequence and, after each write, wake up a scanning thread by means of the synchronization lock and use the scanning thread to scan the data of the target index page.

The specific implementation of the database parallel scanning device of this embodiment is substantially the same as that of the database parallel scanning method described above and is not repeated here.

FIG. 10 shows an electronic device 1000 provided in an embodiment of the present application. The electronic device 1000 includes a processor 1001, a memory 1002, and a computer program that is stored in the memory 1002 and executable on the processor 1001. When run, the computer program executes the database parallel scanning method described above.

The processor 1001 and the memory 1002 may be connected via a bus or by other means.

The memory 1002, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs, such as the database parallel scanning method described in the embodiments of the present application. The processor 1001 implements the above database parallel scanning method by running the non-transitory software programs and instructions stored in the memory 1002.

The memory 1002 may include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function; the data storage area may store data used in executing the above database parallel scanning method. In addition, the memory 1002 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory 1002 may optionally include memory located remotely from the processor 1001, and such remote memory may be connected to the electronic device 1000 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The non-transitory software programs and instructions required to implement the above database parallel scanning method are stored in the memory 1002. When executed by one or more processors 1001, they perform the above database parallel scanning method, for example, method steps S101 to S104 in FIG. 1, method steps S201 to S203 in FIG. 2, method steps S301 to S302 in FIG. 3, method steps S401 to S403 in FIG. 4, method steps S501 to S502 in FIG. 5, method steps S601 to S602 in FIG. 6, and method steps S701 to S702 in FIG. 7.

An embodiment of the present application further provides a storage medium. The storage medium is a computer-readable storage medium that stores a computer program, and the computer program implements the above database parallel scanning method when executed by a processor. The memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory may optionally include memory located remotely from the processor, and such remote memory may be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

In the database parallel scanning method, device, electronic device and storage medium provided by the embodiments of the present application, the data index tree of the database is first obtained and multiple scanning threads are created. The data index tree includes multiple index pages, the index pages are stored in order of page number, and the first scanning thread serves as the initial thread. The initial thread acquires a synchronization lock, selects an initial index page from the index pages, determines a preset number, and scans the data of the initial index page; the initial thread releases the synchronization lock when performing the data scan. Then, based on the page number of the initial index page, a preset number of index pages are selected in order as target index pages, and the target page numbers of the target index pages are written into the preset shared parameters in sequence. After each write, a scanning thread is woken up by means of the synchronization lock and scans the data of the target index page. By setting the synchronization lock and the shared parameters in this way, different scanning threads read the index pages of a single data index tree in order and scan their data, which combines parallel scanning with index scanning and effectively improves the efficiency of database scan queries.

The embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

It should also be understood that the various implementations provided in the embodiments of the present application may be combined arbitrarily to achieve different technical effects. The preferred implementations of the present application have been described in detail above, but the present application is not limited to these implementations; those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present application.

Claims

1. A database parallel scanning method, characterized by comprising:

Obtaining a data index tree of the database and creating multiple scanning threads, wherein the data index tree includes multiple index pages, the index pages are stored in order of page number, and the first scanning thread serves as the initial thread;

Using the initial thread to acquire a synchronization lock, selecting an initial index page from the index pages and determining a preset number, and performing data scanning on the initial index page, wherein the initial thread releases the synchronization lock when performing the data scanning;

Based on the initial index page and the page numbers, sequentially selecting the preset number of index pages as target index pages;

Writing the target page numbers of the target index pages into a preset shared parameter in sequence, and after each write, waking up a scanning thread by using the synchronization lock and performing data scanning on the target index page by using that scanning thread;

The database further includes at least one data page, the data page includes at least one data tuple, and the index page includes at least one index tuple, wherein the index tuple includes a data page number for indicating a first position of the data page and a data sequence number for indexing a second position of the data tuple; the data scanning process includes the following steps:

Starting the scan from the first index tuple of the index page, and using the data page number to find the data page at the first position as the target data page;

Searching the target data page for the data tuple at the second position using the data sequence number;

Scanning all the index tuples of the index page in sequence to obtain the data tuples corresponding to the index tuples;

The selecting of an initial index page from the index pages and determining of a preset number includes:

Filtering the index pages according to the initial filter value to obtain the initial index page;

Determining an end index page according to the filtering end value, and obtaining the preset number according to the end index page and the initial index page;

Generating a scanning end status when a scanning thread accesses the end index page;

Broadcasting the scanning end status to each scanning thread, and terminating each scanning thread according to the scanning end status after it finishes scanning its corresponding index page.
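
For illustration only, the data scanning steps of claim 1 can be sketched in Python as follows. The names `IndexTuple`, `scan_index_page`, and `preset_number`, the dictionary layout of the data pages, and the sample data are hypothetical. The claim states only that the preset number is obtained from the end index page and the initial index page; computing it as a page-number difference is an assumption made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexTuple:
    data_page_no: int  # first position: which data page holds the tuple
    seq_no: int        # second position: slot of the tuple inside that page


def scan_index_page(index_page, data_pages):
    """Resolve every index tuple of one index page to its data tuple, in order."""
    results = []
    for index_tuple in index_page:
        target_page = data_pages[index_tuple.data_page_no]  # find target data page
        results.append(target_page[index_tuple.seq_no])     # find data tuple in it
    return results


def preset_number(initial_page_no, end_page_no):
    """Assumption: count the pages after the initial page up to the end page."""
    return end_page_no - initial_page_no


# Hypothetical data: two data pages and one index page with three index tuples.
data_pages = {0: ["alice", "bob"], 1: ["carol", "dave"]}
index_page = [IndexTuple(0, 1), IndexTuple(1, 0), IndexTuple(0, 0)]

rows = scan_index_page(index_page, data_pages)
print(rows)  # ['bob', 'carol', 'alice']
```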

2. The database parallel scanning method according to claim 1, characterized in that the step of sequentially selecting the preset number of index pages as target index pages based on the initial index page and the page numbers comprises:

Obtaining the initial page number of the initial index page;

Incrementing the initial page number sequentially to select the preset number of target page numbers, and using the index pages having the target page numbers as the target index pages.
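
A minimal sketch of the selection in claim 2, assuming that page numbers are contiguous integers and that each increment is 1 (neither is stated in the claim). The function name `target_page_numbers` is hypothetical.

```python
def target_page_numbers(initial_page_no, preset_number):
    """Return the page numbers that follow the initial page, in ascending order."""
    return [initial_page_no + offset for offset in range(1, preset_number + 1)]


targets = target_page_numbers(initial_page_no=10, preset_number=4)
print(targets)  # [11, 12, 13, 14]
```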

3. The database parallel scanning method according to claim 2, characterized in that the step of waking up the scanning thread by using the synchronization lock and scanning the target index page by using the scanning thread further comprises:

Selecting the scanning threads in order as the target thread, and using the target thread to acquire the synchronization lock;

Acquiring the target page number from the shared parameter, and selecting the target index page based on the target page number;

Using the target thread to scan the data of the target index page while releasing the synchronization lock, and pointing the shared parameter to the next target page number.
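
The hand-off described in claims 1 and 3 can be sketched with Python's standard `threading.Condition`, which here stands in for the synchronization lock; that mapping is an assumption, as are the names `SharedScanState`, `worker`, and `publish` and the sample pages. In this sketch the publishing side points the shared parameter to the next target page number once the previous one has been taken, and each worker releases the lock before it scans so that scans run in parallel.

```python
import threading


class SharedScanState:
    def __init__(self):
        self.cond = threading.Condition()  # stands in for the synchronization lock
        self.target_page_no = None         # the shared parameter
        self.finished = False              # scanning end status


def worker(state, index_pages, results):
    while True:
        with state.cond:                   # acquire the lock
            while state.target_page_no is None and not state.finished:
                state.cond.wait()          # sleep until woken after a write
            if state.target_page_no is None:
                return                     # end status seen, nothing left to take
            page_no = state.target_page_no
            state.target_page_no = None    # free the slot for the next page number
            state.cond.notify_all()
        # Lock released here: the scan itself runs without holding it.
        results[page_no] = list(index_pages[page_no])  # each thread writes its own key


def publish(state, target_page_nos):
    for page_no in target_page_nos:
        with state.cond:
            while state.target_page_no is not None:
                state.cond.wait()          # wait until the previous number is taken
            state.target_page_no = page_no
            state.cond.notify_all()        # wake the scanning threads after each write
    with state.cond:
        state.finished = True              # broadcast the scanning end status
        state.cond.notify_all()


# Hypothetical index pages 11..14, each holding three entries.
index_pages = {n: [f"row-{n}-{i}" for i in range(3)] for n in range(11, 15)}
state, results = SharedScanState(), {}
threads = [threading.Thread(target=worker, args=(state, index_pages, results))
           for _ in range(3)]
for t in threads:
    t.start()
publish(state, [11, 12, 13, 14])
for t in threads:
    t.join()
```

A single shared slot keeps the sketch close to the wording of the claims; a queue would be a different technique and is deliberately not used here.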

4. The database parallel scanning method according to any one of claims 1 to 3, characterized in that the data index tree includes a plurality of leaf nodes; the method further comprises:

For each of the index pages, obtaining the page number of the adjacent index page as the adjacent page number;

Using the index pages and the adjacent page numbers as node elements, and storing the node elements in the leaf nodes in sequence according to the order of the page numbers to obtain the data index tree.
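
As a rough illustration of claim 4, the sketch below pairs each index page with the page number of its neighbour and packs the resulting node elements into leaf nodes in page-number order. Treating "adjacent" as the next page in ascending order, the `fanout` parameter, and the names `NodeElement` and `build_leaf_nodes` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeElement:
    page_no: int
    adjacent_page_no: Optional[int]  # assumed: next page in order; None for the last


def build_leaf_nodes(page_nos: List[int], fanout: int) -> List[List[NodeElement]]:
    """Order pages by page number, attach the adjacent page number, fill leaves."""
    ordered = sorted(page_nos)
    elements = [
        NodeElement(p, ordered[i + 1] if i + 1 < len(ordered) else None)
        for i, p in enumerate(ordered)
    ]
    # Each leaf node holds up to `fanout` consecutive node elements.
    return [elements[i:i + fanout] for i in range(0, len(elements), fanout)]


leaves = build_leaf_nodes([7, 3, 5, 9, 4], fanout=2)
```

Because every node element carries the adjacent page number, a scanning thread holding one page number can find the following page without descending the tree again.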

5. A database parallel scanning device, characterized in that it applies the database parallel scanning method according to any one of claims 1 to 4 and comprises:

An acquisition module, configured to acquire a data index tree of a database and create multiple scanning threads, wherein the data index tree includes multiple index pages, the index pages are stored in order of page number, and the first scanning thread serves as an initial thread;

A first scanning module, configured to acquire a synchronization lock using the initial thread, select an initial index page from the index pages and determine a preset number, and perform data scanning on the initial index page, wherein the initial thread releases the synchronization lock when performing the data scanning;

A selection module, configured to sequentially select the preset number of index pages as target index pages based on the initial index page and the page numbers;

A second scanning module, configured to write the target page numbers of the target index pages into the preset shared parameter in sequence, and after each write, wake up a scanning thread by using the synchronization lock and use that scanning thread to perform data scanning on the target index page.

6. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the database parallel scanning method according to any one of claims 1 to 4 when executing the computer program.

7. A computer-readable storage medium, characterized in that the storage medium stores a program which, when executed by a processor, implements the database parallel scanning method according to any one of claims 1 to 4.