CN118885483A

CN118885483A - Data loading method, device, equipment and storage medium

Info

Publication number: CN118885483A
Application number: CN202410998278.6A
Authority: CN
Inventors: 胥建平
Original assignee: China Merchants Bank Co Ltd
Current assignee: China Merchants Bank Co Ltd
Priority date: 2024-07-24
Filing date: 2024-07-24
Publication date: 2024-11-01

Abstract

The application discloses a data loading method, device, equipment and storage medium in the technical field of data processing. The method comprises: when a data loading request is acquired, retaining the data index structure and old data of the previous time period; and acquiring new data to be loaded, and loading the new data into the position in the data index structure that has the same primary key as the old data. By retaining old data as a reference for loading new data, the application improves the loading efficiency of large-scale data.

Description

Data loading method, device, equipment and storage medium

Technical Field

The present application relates to the field of data processing technology, and in particular to a data loading method, device, equipment and storage medium.

Background Art

MySQL is one of the most widely used database products. Owing to its balanced performance and good transaction support, the InnoDB engine is used as the default engine in MySQL 8.0 and later. InnoDB organizes its data as a B+ tree, a structure that performs well for both data writes and queries in OLTP (On-Line Transaction Processing) scenarios.

When large volumes of data are written, the balance requirement of the B+ tree means that nodes must be continually restructured to keep the tree balanced. In addition, to improve query efficiency, the InnoDB storage engine tries to store adjacent data in leaf nodes in contiguous space on disk. Together, these balance and contiguity requirements mean that data written early may have to be moved on disk several times before it reaches its final location. This data migration, caused by adjustments to the B+ tree structure, inevitably increases disk I/O (Input/Output) overhead and thereby reduces the overall efficiency of data loading.

Summary of the Invention

The main purpose of the present application is to provide a data loading method, device, equipment and storage medium that solve the technical problem of low overall data loading efficiency caused by excessive data migration during large-scale data writing.

To achieve the above purpose, the present application proposes a data loading method, which comprises:

when a data loading request is obtained, retaining the data index structure and old data of the previous time period;

obtaining new data to be loaded, and loading the new data into the position in the data index structure that has the same primary key as the old data.

In one embodiment, the step of loading the new data into the position in the data index structure that has the same primary key as the old data comprises:

determining a data path of the new data in the data index structure;

setting the status bit of each node in the data path to be consistent with the status bit of the root node, and setting the connectivity state of each pointer in the data path to connected;

loading the new data, through the data path, from the root node to the leaf node having the same primary key.

In one embodiment, after the step of retaining the data index structure and old data of the previous time period, the method further comprises:

modifying the status bit of the root node in the data index structure from a first state to a second state;

determining the child nodes of the root node, and setting the connectivity state of the pointers between the root node and the child nodes to disconnected.

In one embodiment, after the step of obtaining new data to be loaded and loading the new data into the position in the data index structure that has the same primary key as the old data, the method further comprises:

obtaining a dirty page determination threshold of the leaf node;

determining the number of old data entries and the number of unflushed data entries in the leaf node, and calculating the ratio of the number of unflushed data entries to the number of old data entries;

when the ratio is greater than the dirty page determination threshold, determining the storage data page corresponding to the leaf node to be a dirty page.

In one embodiment, before the step of obtaining the dirty page determination threshold of the leaf node, the method further comprises:

determining a blank data page in the buffer pool, the blank data page containing no old data;

setting the dirty page determination threshold of the blank data page to zero.
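The dirty page rule in the two embodiments above can be sketched as follows. This is a minimal illustration in Python rather than InnoDB code: the function name `is_dirty_page` and its parameters are hypothetical, and the handling of a blank page (no old entries, so the ratio is undefined) is an assumption that reads the zero threshold as "dirty as soon as any entry is unflushed".

```python
def is_dirty_page(unflushed_rows: int, old_rows: int, threshold: float) -> bool:
    """Decide whether a leaf node's data page counts as dirty.

    unflushed_rows: entries written to the page but not yet flushed to disk
    old_rows:       entries retained from the previous time period
    threshold:      dirty page determination threshold for this page
    """
    if old_rows == 0:
        # Blank page (assumption): the ratio is undefined, so fall back to the
        # zero threshold and mark the page dirty once any entry is unflushed.
        return unflushed_rows > 0
    return unflushed_rows / old_rows > threshold


# A page holding 10 old entries, with a hypothetical threshold of 0.5
print(is_dirty_page(6, 10, 0.5))   # ratio 0.6 is above the threshold
print(is_dirty_page(5, 10, 0.5))   # ratio 0.5 is not above the threshold
print(is_dirty_page(1, 0, 0.0))    # blank page with one unflushed entry
```

Because the comparison is strict, a page whose ratio equals the threshold exactly is not yet treated as dirty in this sketch.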

In one embodiment, the method further comprises:

detecting the state consistency between the root node in the data index structure and each subordinate node of the root node;

when the states are inconsistent, determining the subordinate node to be an invalid node;

when the states are consistent, determining the subordinate node to be a valid node.

Obtaining a loading optimization request;

according to the loading optimization request, deleting the old data of the previous time period and rebuilding the data index structure to load the data.

In addition, to achieve the above purpose, the present application also proposes a data loading device, the data loading device comprising:

a retention module, configured to retain the data index structure and old data of the previous time period when a data loading request is received;

a loading module, configured to obtain new data to be loaded and load the new data into the position in the data index structure that has the same primary key as the old data.

In addition, to achieve the above purpose, the present application also proposes data loading equipment, the equipment comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the data loading method described above.

In addition, to achieve the above purpose, the present application also proposes a storage medium. The storage medium is a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the data loading method described above.

One or more technical solutions proposed in this application have at least the following technical effects. When a data loading request is obtained, the data index structure and old data of the previous time period are retained. The data index structure provides the index for querying data and therefore records where the old data is stored, so once retained it can serve as a reference for the current data loading process. New data is then obtained; where the new data repeats the old data, the two share the same primary key, and the primary key is the index used for data queries. The new data is loaded into the position in the data index structure that has the same primary key as the old data: following the index for that primary key, the new data is written to the position where the old data resides and overwrites the old data of the previous time period. The prior art must delete the old data of the previous time period and rebuild the data index structure before loading data. The solution of this application instead retains the data index structure of the previous time period as a reference for loading new data. Because new and old data overlap between loading runs, repeated new data can be loaded directly into the same position as the old data. When the overlap between new and old data is high, this greatly reduces the time lost to data migration and improves data loading efficiency.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic flow chart of the first embodiment of the data loading method of the present application;

FIG. 2 is a schematic diagram of a MySQL database loading data for the first time, as involved in the data loading method of an embodiment of the present application;

FIG. 3 is a schematic diagram of a B+ tree structure that retains old data, as involved in the data loading method of an embodiment of the present application;

FIG. 4 is a schematic diagram of a new data loading scenario involved in the data loading method of an embodiment of the present application;

FIG. 5 is a schematic diagram of the buffer pool refresh mechanism in the related art;

FIG. 6 is a schematic diagram comparing the original solution in the related art with the dirty page determination in this embodiment;

FIG. 7 is a schematic diagram of a dirty page determination scenario involved in the data loading method of an embodiment of the present application;

FIG. 8 is a brief schematic flow chart of the data loading method provided in the third embodiment of the present application;

FIG. 9 is a schematic diagram of the module structure of a data loading device according to an embodiment of the present application;

FIG. 10 is a schematic diagram of the equipment structure of the hardware operating environment involved in the data loading method in an embodiment of the present application.

The realization of the purpose, the functional features and the advantages of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are only used to explain the technical solution of the present application and are not intended to limit it. For a better understanding of the technical solution of the present application, a detailed description is given below in conjunction with the accompanying drawings and specific implementations.

In a data warehouse synchronization scenario, data must be loaded from the data warehouse into MySQL every day. Before loading, yesterday's data is usually deleted first and today's data is then loaded. If the data set is large, such as the customer data of an institution like a bank, which can exceed 100 million records, each load is time-consuming. Customer data, however, often changes little from day to day, especially its primary keys: the overlap between yesterday's primary keys and today's may exceed 99%. Since the structure of the InnoDB B+ tree depends only on the primary key, the B+ trees generated after loading yesterday's data and today's data are almost identical.

In the related art, the old data in the table must be cleared before new data is loaded, so the B+ tree has to be rebuilt on every load. For example, the data with primary key 100 may initially be stored at disk location PLACE 1. As more data is inserted, the B+ tree adjusts the location of earlier-written data to stay balanced, which may move that data to PLACE 2 and eventually to PLACE N.

The main solution of the embodiments of the present application is: when a data loading request is obtained, the data index structure and old data of the previous time period are retained; the new data to be loaded is obtained, and the new data is loaded into the position in the data index structure that has the same primary key as the old data.

In contrast to the related art, the embodiments of the present application adopt a new strategy: the previous data is retained as a reference for loading new data. If data with primary key 100 is encountered again and primary key 100 in the old data is found at PLACE N, the new data is written directly to PLACE N and overwrites the old data. This reduces unnecessary data movement during loading and thereby improves loading efficiency.

It should be noted that the execution subject of this embodiment can be a computing service device with data processing, network communication and program execution functions, such as a tablet computer, personal computer or mobile phone, or an electronic device, server or the like capable of realizing the above functions.

Based on this, an embodiment of the present application provides a data loading method. Refer to FIG. 1, which is a schematic flow chart of the first embodiment of the data loading method of the present application.

In this embodiment, the data loading method includes steps S10 to S20:

Step S10: when a data loading request is obtained, retaining the data index structure and old data of the previous time period;

This embodiment can be applied to scenarios in which data is synchronized from a data warehouse. The synchronization can be performed periodically, and the data of two adjacent periods may overlap to some degree. This embodiment is described using a MySQL database as the storage example. In MySQL, a B+ tree data structure can be used to handle data writes and queries. When large volumes of data are written, the balance requirement of the B+ tree calls for continual node restructuring to keep the tree balanced, so the loaded data has to be laid out again during synchronization. For this reason, the related art deletes all data from the previous synchronization and then loads the new data.

In this embodiment, when a data loading request is obtained, the data index structure and old data of the previous time period are retained. A data loading request is a request that instructs data loading to be performed; obtaining one means that the condition for triggering data loading has been met. It can be understood that data loading may take place in a specified time window, and the data loading request may be triggered automatically when the current time falls within that window. The previous time period is the period in which data was last loaded. The data index structure is the data structure that indexes data storage locations and is used for data writes and queries, such as the B+ tree mentioned above. The old data is the data loaded into database storage in the previous time period. The data index structure and the old data are associated: the data index structure provides the index for querying the old data. Retaining the data index structure and the old data means that the stored old data is not physically deleted, so both can serve as a reference for the current data loading process. It can be understood that the retained data index structure can be modified so that external queries return the same results as if the old data had been deleted.

Step S20: obtaining new data to be loaded, and loading the new data into the position in the data index structure that has the same primary key as the old data.

The new data is the data that needs to be loaded into the MySQL database in the current loading process. It can be understood that the new data and the old data overlap to some degree. For example, the customer data of institutions such as banks is very large, yet in daily synchronization the overlap between the data of two consecutive days can reach 99%. With such a high overlap, the old data can be used as a reference for loading the new data. The primary key is the index under which each record is stored. The data index structure stores the primary keys, and a primary key indicates where the data is stored. Because the new data and the old data overlap, and repeated data is stored at the same location in disk space, the repeated new data and old data share the same primary key.

Where the old data and the new data share a primary key, the new data can be loaded directly into the storage location of the old data and overwrite it. If, during loading, the primary key of the new data is not yet stored in the data index structure, the primary key can be inserted into the data index structure to satisfy the contiguity requirement, and the data is loaded according to the position of the primary key in the data index structure. This may involve moving old data, but because the overlap between new and old data is high, the old data is moved far fewer times than when all old data is deleted before loading, and the loading time is shortened.
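The saving described above can be illustrated with a deliberately simplified model. The sketch below uses a sorted Python list as a stand-in for the on-disk key order (an assumption made for brevity; a real B+ tree moves data through page splits and merges rather than element shifts) and counts how many already-placed entries have to move. The function names are hypothetical.

```python
import bisect


def load_from_scratch(arriving_keys):
    """Related-art style load: start empty, so every key is an insert."""
    slots, moves = [], 0
    for pk in arriving_keys:
        i = bisect.bisect_left(slots, pk)
        moves += len(slots) - i        # entries to the right shift by one slot
        slots.insert(i, pk)
    return slots, moves


def reload_in_place(slots, arriving_keys):
    """Load against retained keys: a matching key is overwritten where it sits."""
    moves = 0
    for pk in arriving_keys:
        i = bisect.bisect_left(slots, pk)
        if i < len(slots) and slots[i] == pk:
            continue                   # same primary key: overwrite, nothing moves
        moves += len(slots) - i        # new key: insert and shift later entries
        slots.insert(i, pk)
    return moves


yesterday = [15, 11, 7, 4, 3, 1]               # keys arriving in descending order
slots, first_load_moves = load_from_scratch(yesterday)
today = [15, 11, 7, 5, 4, 3, 1]                # same keys plus one new key, 5
second_load_moves = reload_in_place(slots, today)
print(first_load_moves, second_load_moves)
```

In this toy run the first load shifts entries 15 times in total, while the reload shifts only the 3 entries that sit after the new key 5; the six repeated keys are overwritten without any movement.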

As an example, the method further includes:

Step A10: obtaining a loading optimization request;

A loading optimization request is a request to load data while also cleaning up redundant old data. It can be understood that this embodiment retains the data index structure and old data of the previous time period. Compared with deleting all old data, this means that old data whose primary keys do not match any new data is kept. Although this old data is not overwritten by new data in the current load, it remains in the database and can serve as a reference in each subsequent load. Over time, the accumulation of unused old data may become increasingly severe, and a loading optimization request can be used to clean it up. It can be understood that this redundancy optimization can be carried out together with data loading.

Step A20: according to the loading optimization request, deleting the old data of the previous time period and rebuilding the data index structure to load the data.

This embodiment optimizes old data redundancy by full deletion. When a loading optimization request is obtained, some loading efficiency is sacrificed: the old data is deleted and no longer used as a reference for loading new data, the data index structure is rebuilt for the new data, and the new data is loaded into the database. After one round of redundancy optimization, all old data retained from previous loads has been cleared.

With loading optimization requests, old data redundancy can be cleaned up during data loading, which avoids wasting storage space on excessive redundant data in the database.
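The trade-off between ordinary loads and a loading optimization request can be sketched as follows. This is only an illustration: a Python dict stands in for the table and its index, a hypothetical `live` flag stands in for the visibility mechanism, and `load_cycle` is an invented name.

```python
def load_cycle(table: dict, new_rows: dict, optimize: bool = False) -> int:
    """Run one loading cycle and return the number of stale retained rows."""
    if optimize:
        table.clear()                      # loading optimization: physically delete old data
    else:
        for entry in table.values():
            entry["live"] = False          # keep old rows, but hide them from queries
    for pk, row in new_rows.items():
        table[pk] = {"row": row, "live": True}   # overwrite in place, or insert a new key
    return sum(1 for entry in table.values() if not entry["live"])


table = {}
stale_by_cycle = [
    load_cycle(table, {1: "a", 2: "b", 3: "c"}),
    load_cycle(table, {2: "b2", 3: "c2"}),          # key 1 is retained but stale
    load_cycle(table, {3: "c3", 4: "d"}),           # keys 1 and 2 are now stale
    load_cycle(table, {3: "c4"}, optimize=True),    # full rebuild clears them
]
print(stale_by_cycle)
```

The stale count grows across ordinary cycles and drops back to zero after the optimized cycle, which is the storage saving that the loading optimization request buys at the cost of a full rebuild.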

This embodiment provides a data loading method. When a data loading request is obtained, the data index structure and old data of the previous time period are retained. The data index structure provides the index for querying data and therefore records where the old data is stored, so once retained it can serve as a reference for the current data loading process. New data is then obtained; where the new data repeats the old data, the two share the same primary key, and the primary key is the index used for data queries. The new data is loaded into the position in the data index structure that has the same primary key as the old data: following the index for that primary key, the new data is written to the position where the old data resides and overwrites the old data of the previous time period. The prior art must delete the old data of the previous time period and rebuild the data index structure before loading data. The solution of this application instead retains the data index structure of the previous time period as a reference for loading new data. Because new and old data overlap between loading runs, repeated new data can be loaded directly into the same position as the old data. When the overlap between new and old data is high, this greatly reduces the time lost to data migration and improves data loading efficiency.

The second embodiment of the present application builds on the first embodiment. For content that is the same as or similar to the first embodiment, refer to the description above; it is not repeated here. On this basis, after step S10 the method further includes:

Step B10: modifying the status bit of the root node in the data index structure from the first state to the second state;

This embodiment is described using a data index structure in the form of a B+ tree, so the data index structure includes a root node, nodes connected to the root node, and leaf nodes. A status bit is information, set for each type of node in the data index structure, that represents different states. In one implementation, each node defines a 1-bit status bit with two states, 0 and 1. The first state can be 0 or 1, and the second state differs from the first: when the first state is 0, the second state is 1; when the first state is 1, the second state is 0. After the status bit of the root node is modified, the status bits of all nodes connected to the root node are inconsistent with that of the root node. At this point no data can be found by external queries, which matches the external query result after physically deleting the old data.

Step B20: determining the child nodes of the root node, and setting the connectivity state of the pointers between the root node and the child nodes to disconnected.

The child nodes of the root node are the lower-level nodes connected to the root node; a child node can in turn be connected to leaf nodes. Nodes of different types are connected by pointers. The connectivity state is state information that controls whether a pointer can be followed when an application queries data. It can also take two values, 0 and 1, where 0 means disconnected and 1 means connected. After the status bit of the root node is modified, the connectivity state between the root node and its child nodes is set to disconnected, so no data can be found along the paths between the root node and the child nodes.

By flipping the status bit of the root node and zeroing the paths after the old data is retained, the results that an application query sees are kept consistent with the results after physically deleting the old data.

FIG. 2 is a schematic diagram of a MySQL database loading data for the first time. Suppose data with primary keys 1, 3, 4, 7, 11 and 15 is loaded into MySQL for the first time. The node status bits and pointer status bits are all set to 1, producing the B+ tree shown in FIG. 2. At this point the status bit of the root node is 1 and the status bits of all leaf nodes are also 1, consistent with the root node, so all nodes are valid. All pointer states are also 1, that is, connected. All data is therefore visible to application queries.

FIG. 3 is a schematic diagram of a B+ tree structure that retains old data. As shown in FIG. 3, at the point in the loading process where the data would originally be deleted, the status bit of the root node is flipped and the pointers on the paths from the flipped node to its child nodes are set to 0. Taking the root node as the reference, the status bits of the connected child nodes no longer match that of the root node and the pointers are invalid. The child nodes connected to the root node are therefore all invalid nodes and no data can be found, which is equivalent to the effect of TRUNCATE in MySQL. The data is not physically deleted, however; it is retained as a reference for subsequent data loading.
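The logical deletion shown in FIG. 2 and FIG. 3 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Node` class, its field names and the two-level tree shape are hypothetical and are not taken from the figures or from InnoDB.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                   # primary keys (leaf) or separators (internal)
    state: int = 1                               # 1-bit status flag
    children: list = field(default_factory=list)
    links: list = field(default_factory=list)    # per-child pointer flag: 1 connected, 0 disconnected


def logical_truncate(root: Node) -> None:
    """Hide all retained data without deleting it."""
    root.state ^= 1                              # flip the root's status bit
    root.links = [0] * len(root.children)        # disconnect the root -> child pointers


def visible_keys(root: Node) -> list:
    """Return keys reachable via connected pointers and nodes whose state matches the root."""
    found = []

    def walk(node: Node) -> None:
        if not node.children:
            found.extend(node.keys)
            return
        for child, link in zip(node.children, node.links):
            if link == 1 and child.state == root.state:
                walk(child)                      # valid node on a connected path

    walk(root)
    return found


# First load: all status bits and pointer flags are 1
leaves = [Node([1, 3]), Node([4, 7]), Node([11, 15])]
root = Node([4, 11], children=leaves, links=[1, 1, 1])
before = visible_keys(root)
logical_truncate(root)
after = visible_keys(root)
print(before, after, [leaf.keys for leaf in leaves])
```

After `logical_truncate`, a query finds nothing, yet every leaf still holds its keys, so the retained structure remains available as a reference for the next load.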

As an example, the step of loading the new data into the position of the same primary key in the data index structure includes:

Step S31: determine the data path of the new data in the data index structure;

The data path is the path that the new data takes through the data index structure; a non-data path is a path that no new data passes through. The leaf node holding the primary key of the new data is the final node of the data path, and the path from the root node to that leaf node is the data path.

Step S32: set the status bit of each node on the data path to match the status bit of the root node, and set the connectivity state of each pointer on the data path to connected;

While the old data is being retained, the root node is flipped once so that application queries return no data. At this point, the status bits of the child nodes and leaf nodes on the data path are made consistent with that of the root node, and, as these nodes are flipped, the connectivity state of the pointers on the data path is set to connected. This provides the basis for the new data to be visible to external queries.

Step S33: load the new data from the root node to the leaf node of the same primary key through the data path.
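Steps S31 to S33 can be sketched as follows. This is an illustrative model under stated assumptions, not the implementation of the application: the `Node` class, the helper names (`leaf`, `inner`, `find_path`, `load_same_key`, `lookup`, `demo_tree`), the separator-key routing with `bisect_right`, and the one-key-per-leaf layout are all hypothetical. The sketch also assumes that when a node is flipped for the first time in a load cycle, its own child pointers are reset to 0 so that non-data paths stay disconnected.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                     # separator keys (inner node) or stored keys (leaf)
    children: list = field(default_factory=list)
    links: list = field(default_factory=list)      # pointer status bits (1 = connected)
    status: int = 1                                # node status bit
    rows: dict = field(default_factory=dict)       # leaf payload: primary key -> value


def leaf(key, value):
    return Node(keys=[key], rows={key: value})


def inner(separators, children):
    return Node(keys=separators, children=children, links=[1] * len(children))


def find_path(root, key):
    """S31: return the (parent, child index) hops from the root and the leaf they end at."""
    hops, node = [], root
    while node.children:
        i = bisect_right(node.keys, key)
        hops.append((node, i))
        node = node.children[i]
    return hops, node


def load_same_key(root, key, value):
    """Overwrite an existing key in place; return False if the key is not in the old data."""
    hops, target = find_path(root, key)
    if key not in target.rows:
        return False                               # caller falls back to an ordinary insert
    for parent, i in hops:                         # S32: validate nodes and pointers on the path
        child = parent.children[i]
        if child.status != root.status:            # first visit in this load cycle
            child.status = root.status
            child.links = [0] * len(child.children)   # non-data paths stay disconnected
        parent.links[i] = 1
    target.rows[key] = value                       # S33: write the new data at the old position
    return True


def lookup(root, key):
    """Return the value for key if its whole path is valid and connected, else None."""
    node = root
    while node.children:
        i = bisect_right(node.keys, key)
        child = node.children[i]
        if node.links[i] != 1 or child.status != root.status:
            return None
        node = child
    return node.rows.get(key)


def demo_tree():
    """Old data with keys 1, 3, 4, 7, 11, 15 after the root has been logically truncated."""
    a = inner([3], [leaf(1, "old-1"), leaf(3, "old-3")])
    b = inner([7], [leaf(4, "old-4"), leaf(7, "old-7")])
    c = inner([15], [leaf(11, "old-11"), leaf(15, "old-15")])
    root = inner([4, 11], [a, b, c])
    root.status, root.links = 0, [0, 0, 0]
    return root


root = demo_tree()
for k in (3, 4, 7):
    load_same_key(root, k, f"new-{k}")
print([lookup(root, k) for k in (1, 3, 4, 7, 11, 15)])
```

In this run only keys 3, 4, and 7 resolve to values; keys 1, 11, and 15 remain in the tree but are not returned by `lookup`. The path is located before anything is mutated so that a key missing from the old data does not make a retained leaf visible by accident.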

Figure 4 is a schematic diagram of a new-data loading scenario. As shown in Figure 4, take the insertion of rows with primary keys 3, 4, and 7 as an example. Because keys 3, 4, and 7 already exist in the old data, the addressing rules for B+ tree insertion place the new data at the same positions as before. When key 3 is inserted, the parent node of its leaf node is flipped to match the root node; key 1 does not appear in the new data and therefore lies on a non-data path, so its pointer is set to 0. When keys 4 and 7 are inserted, their parent node is flipped to match the root node, and because both lie on data paths, their pointer status bits are set to 1.

In this scenario, only keys 3, 4, and 7 satisfy both the node status rule and the pointer status rule, so only they are visible to external queries. If data with the same primary keys is written again later, it can likewise be addressed directly to its final location. This avoids the repeated restructuring that a B+ tree undergoes as it is built up during writing, and so reduces the total data loading time. In the related art, the old data is cleared before each load, so after loading completes the table exposes only the new data. This embodiment does not physically clear the old data. Instead, setting the status bits as described above ensures that, once the new data has been written, only the new data is visible externally, and that the retained B+ tree structure of the old data appears to external queries as containing no data.

Step C10: detect the state consistency between the root node in the data index structure and each subordinate node of the root node;

This embodiment also defines node status rules, that is, rules that express state consistency. State consistency describes whether the state of the root node matches the state of its subordinate nodes. The subordinate nodes of the root node include the child nodes directly connected to it and the leaf nodes that store primary key data. State consistency determines whether the data in a leaf node is visible to external queries. The state consistency result is obtained by reading the status bits of each type of node and comparing them.

Step C20: when the states are inconsistent, determine that the subordinate node is an invalid node;

An invalid node is a node that returns no data to external queries. For a leaf node, being invalid means directly that the data located in that leaf node cannot be queried externally. In one practicable implementation, status rules can also be set for the pointers between nodes: pointers on a data path are valid, and pointers on a non-data path are invalid.

Step C30: when the states are consistent, determine that the subordinate node is a valid node.

A valid node is a node that can be queried externally. The child nodes and leaf nodes of the root node that lie on a data path are valid nodes, which makes the primary key data in those leaf nodes visible to external queries.
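Steps C10 to C30 amount to comparing each subordinate node's status bit with the root's. A minimal sketch, assuming a hypothetical `Node` class with a `name` and a `status` field (neither is defined in the text), could look like this:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    status: int                                    # node status bit
    children: list = field(default_factory=list)


def subordinate_nodes(root):
    """Yield every node below the root (child nodes and leaf nodes)."""
    pending = list(root.children)
    while pending:
        node = pending.pop()
        yield node
        pending.extend(node.children)


def classify(root):
    """C10 to C30: a subordinate node is valid only if its status bit equals the root's."""
    return {
        node.name: "valid" if node.status == root.status else "invalid"
        for node in subordinate_nodes(root)
    }


# Assumed state after key 3 has been reloaded: the root was flipped to 0,
# and only the nodes on the path to key 3 have been flipped to match it.
root = Node("root", 0, [
    Node("inner-a", 0, [Node("leaf-1", 1), Node("leaf-3", 0)]),
    Node("inner-b", 1, [Node("leaf-4", 1), Node("leaf-7", 1)]),
])
print(classify(root))
```

This sketch checks status bits only. The pointer status rule mentioned for one practicable implementation is a separate condition and is not modeled here.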

In this embodiment, node states are flipped by setting node status bits while the old data is retained. This yields the same external query results as physically deleting the old data would, and keeps the query results logically consistent with the data loading process.

Building on the above embodiments, in the third embodiment of the present application, content that is the same as or similar to the above embodiments can be found in the preceding description and is not repeated here. On this basis, the method further includes, after step S30:

Step D10: obtain the dirty page determination threshold of the leaf node;

A dirty page is a data page waiting to be flushed to disk. The buffer pool is a cache area set up in memory to reduce how often data is flushed to disk; data in the buffer pool is written to disk once certain conditions are met. Figure 5 is a schematic diagram of the buffer pool flushing mechanism in the related art. The smallest unit of storage in the buffer pool is the page. Each page corresponds to one leaf node of the B+ tree, and the shaded areas indicate dirty pages. A page is 16 KB in size and typically holds from a few dozen to a few hundred rows. Once at least one row has been written to a blank page, the page is marked dirty and its address is recorded in the flush list. When the ratio of dirty pages in the flush list to all pages exceeds the threshold set by the system, the flushing mechanism is triggered and the data in the buffer pool is flushed to disk.

In the scenario addressed by this embodiment, the old data can be used to predict how many rows will eventually be written to a given leaf node. A dirty page determination threshold can therefore be set as the condition under which a data page is judged dirty. The dirty page determination threshold T satisfies 0 ≤ T ≤ 1. Its specific value can be set according to the experience of the data processing personnel, and this embodiment places no specific restriction on it. In one practicable implementation, the dirty page determination threshold of all data pages is set to the same value.

Step D20: determine the number of old rows and the number of unflushed rows in the leaf node, and calculate the ratio of the number of unflushed rows to the number of old rows;

Unflushed data is data that has not yet been flushed to disk. Whenever data is written to a leaf node, the ratio of the number of unflushed rows to the number of old rows in the storage data page corresponding to that leaf node can be calculated. Because the old data is retained in the database, this ratio reflects the proportion of unflushed data relative to the total old data.

Step D30: when the ratio is greater than the dirty page determination threshold, judge the storage data page corresponding to the leaf node to be a dirty page.

When the ratio is greater than the dirty page determination threshold T, the storage data page is judged dirty and waits to be flushed to disk. When the ratio is less than or equal to the threshold, the storage data page still has room for new data and continues to wait for new data to be loaded into it.

Figure 6 is a schematic diagram comparing dirty page determination in the original scheme of the related art with that of this embodiment. As shown in Figure 6, in the original scheme a blank page is marked dirty as soon as at least one row has been written to it, is recorded in the flush list, and waits to be flushed to disk. This embodiment retains the old data: a storage data page is judged dirty, recorded in the flush list, and left waiting to be flushed to disk only when at least one new row has been written and the ratio of the number of new rows written to the number of old rows is greater than the dirty page determination threshold T.

Figure 7 is a schematic diagram of a dirty page determination scenario. As shown in Figure 7, the dirty page determination threshold T is set to 0.5. Before loading, the data page holds 5 old rows: old data1, old data2, old data3, old data4, and old data5. If one load writes 2 new rows, new data3 and new data5, the ratio is 2/5 = 0.4, so the page is not judged dirty and is not added to the flush list. If loading then continues until 3 rows in total have been written, new data2, new data3, and new data5, the ratio is 3/5 = 0.6, so the page is judged dirty and is added to the flush list. Choosing the threshold T appropriately reduces how often in-memory data is flushed to disk. For the same data written to the database, fewer flushes mean that the loading time can be shortened further.
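The determination in steps D10 to D30 reduces to a single comparison, sketched below with the values from Figure 7. The function name `is_dirty` and its parameter names are hypothetical, and the handling of a page with no old rows (treated as dirty on the first write, as in the original scheme) is an assumption of this sketch rather than something stated for these steps.

```python
def is_dirty(unflushed_rows, old_rows, threshold):
    """D10 to D30: judge a page dirty when unflushed_rows / old_rows exceeds the threshold T."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold T must satisfy 0 <= T <= 1")
    if unflushed_rows < 0 or old_rows < 0:
        raise ValueError("row counts must be non-negative")
    if unflushed_rows == 0:
        return False                     # nothing has been written, so nothing to flush
    if old_rows == 0:
        return True                      # assumption: a blank page behaves as in the original scheme
    return unflushed_rows / old_rows > threshold


T = 0.5
print(is_dirty(2, 5, T))   # ratio 0.4, as in the first load of Figure 7
print(is_dirty(3, 5, T))   # ratio 0.6, as in the second load of Figure 7
```

The comparison is strict, so a page whose ratio equals T is not yet judged dirty, matching the "greater than" wording of step D30.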

As an example, the method further includes, before step D10:

Step D01: determine the blank data pages in the buffer pool, a blank data page being one that contains no old data;

A blank data page is a data page that contains no old data. The blank data pages in the buffer pool can be identified by checking what each data page stores.

Step D02: set the dirty page determination threshold of the blank data pages to zero.

It will be understood that, when the dirty page determination thresholds for all data pages are set, the threshold for blank data pages can be set to zero on its own, while data pages that contain old data are configured separately. With this arrangement, this embodiment remains compatible with the original scheme.
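Steps D01 and D02 can be sketched as a per-page threshold assignment. The function name `assign_thresholds`, the page identifiers, and the row counts below are hypothetical, and the use of one shared threshold for all non-blank pages follows the practicable implementation mentioned earlier rather than a requirement.

```python
def assign_thresholds(old_row_counts, shared_threshold):
    """D01 and D02: blank pages (no old rows) get threshold 0; other pages share one value."""
    if not 0 <= shared_threshold <= 1:
        raise ValueError("threshold must satisfy 0 <= T <= 1")
    return {
        page_id: 0.0 if old_rows == 0 else shared_threshold
        for page_id, old_rows in old_row_counts.items()
    }


# Hypothetical pages: page id -> number of old rows retained in that page.
old_row_counts = {"page-a": 5, "page-b": 0, "page-c": 120}
print(assign_thresholds(old_row_counts, 0.5))
```

A threshold of zero means that the first row written to a blank page is enough to mark it dirty, which is how the original scheme behaves. How the ratio is evaluated when a page has no old rows is not specified in the text, so this sketch only assigns the thresholds.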

In this embodiment, because the old data is retained, it can be used to predict what data each node will finally hold. This prediction is used to optimize the condition under which the buffer pool decides to flush, further reducing how often flushing occurs and thereby improving data loading efficiency.

By way of example, to aid understanding of the data loading method obtained by combining this embodiment with the first embodiment described above, refer to Figure 8, which provides a brief flowchart of the data loading method. Specifically, after data loading starts, the new data first enters the root node, and it is then determined whether the next-level node can be reached by indexing on the primary key. If it can, the same primary key exists: the node's state is flipped to match the root node, the pointer on the data path is set to 1, and the pointers on non-data paths are set to 0. It is then determined whether the node is the data node at the storage location. If it is, the node is overwritten with the new data; if it is not, the flow returns to the step of determining whether the next-level node can be reached by indexing on the primary key. If it is determined at the outset that the next-level node cannot be reached by indexing on the primary key, the new data is inserted into the B+ tree structure according to the original scheme, subject to the continuity requirement.
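The branch at the heart of Figure 8 (overwrite in place when the primary key already exists, otherwise fall back to an ordinary insert) can be illustrated with a deliberately simplified model. In the sketch below a sorted key list stands in for the retained B+ tree and a set of keys stands in for the status bits; the class name `RetainedIndex` and its methods are hypothetical and do not model node or pointer states.

```python
from bisect import bisect_left, insort


class RetainedIndex:
    """Hypothetical stand-in for the retained B+ tree: sorted keys plus a visibility set."""

    def __init__(self, old_rows):
        self.keys = sorted(old_rows)       # retained key order (the "structure")
        self.values = dict(old_rows)       # retained old values
        self.visible = set()               # keys re-validated in the current load cycle

    def load(self, key, value):
        """Follow the Figure 8 branch: overwrite in place on a key hit, else insert."""
        i = bisect_left(self.keys, key)
        hit = i < len(self.keys) and self.keys[i] == key
        if not hit:
            insort(self.keys, key)         # fallback: ordinary ordered insert
        self.values[key] = value
        self.visible.add(key)
        return "in_place" if hit else "fallback_insert"

    def query(self):
        """Return only the rows loaded in the current cycle, in key order."""
        return {k: self.values[k] for k in self.keys if k in self.visible}


index = RetainedIndex({k: f"old-{k}" for k in (1, 3, 4, 7, 11, 15)})
print(index.query())                       # empty: old rows are retained but hidden
print(index.load(3, "new-3"))              # key 3 exists in the old data
print(index.load(5, "new-5"))              # key 5 does not
print(index.query())
```

Only the fallback branch changes the key order, which mirrors the point of the flow: rows whose keys already exist do not trigger any restructuring.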

It should be noted that the above examples are provided only to aid understanding of the present application and do not limit its data loading method. Further simple variations based on this technical concept all fall within the scope of protection of the present application.

The present application also provides a data loading device. Referring to Figure 9, the data loading device includes:

a retention module 10, configured to retain the data index structure and old data of the previous time period when a data loading request is received;

a loading module 20, configured to obtain new data to be loaded and to load the new data into the position in the data index structure that has the same primary key as the old data.

In one practicable implementation, the loading module 20 is further configured to:

In one practicable implementation, the data loading device further includes a setting module configured to:

In one practicable implementation, the data loading device further includes a computing module configured to:

In one practicable implementation, the computing module is further configured to:

In one practicable implementation, the data loading device further includes a detection module configured to:

In one practicable implementation, the data loading device further includes an optimization module configured to:

obtain a loading optimization request;

The data loading device provided by the present application uses the data loading method of the above embodiments and can solve the technical problem that, during large-scale data writing, an excessive number of data migrations lowers the overall efficiency of data loading. Compared with the prior art, the beneficial effects of the data loading device provided by the present application are the same as those of the data loading method provided by the above embodiments, and its other technical features are the same as the features disclosed in the method of the above embodiments, so they are not repeated here.

The present application provides a data loading device that includes: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can carry out the data loading method of the first embodiment described above.

Reference is now made to Figure 10, which shows a schematic structural diagram of a data loading device suitable for implementing the embodiments of the present application. The data loading device in the embodiments of the present application may include, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The data loading device shown in Figure 10 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in Figure 10, the data loading device may include a processing device 1001 (for example, a central processing unit or a graphics processor), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the data loading device. The processing device 1001, the ROM 1002, and the RAM 1004 are connected to one another through a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus. In general, the following systems can be connected to the I/O interface 1006: an input device 1007 including, for example, a touch screen, a touch pad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, or a gyroscope; an output device 1008 including, for example, a liquid crystal display (LCD), a speaker, or a vibrator; a storage device 1003 including, for example, a magnetic tape or a hard disk; and a communication device 1009. The communication device 1009 allows the data loading device to communicate with other devices, wirelessly or by wire, to exchange data. Although the figure shows a data loading device with various systems, it should be understood that not all of the systems shown need to be implemented or present; more or fewer systems may be implemented or present instead.

In particular, according to the embodiments disclosed in the present application, the process described above with reference to the flowchart can be implemented as a computer software program. For example, the embodiments disclosed in the present application include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication device, installed from the storage device 1003, or installed from the ROM 1002. When the computer program is executed by the processing device 1001, the functions defined in the method of the embodiments disclosed in the present application are performed.

The data loading device provided by the present application uses the data loading method of the above embodiments and can solve the technical problem that, during large-scale data writing, an excessive number of data migrations lowers the overall efficiency of data loading. Compared with the prior art, the beneficial effects of the data loading device provided by the present application are the same as those of the data loading method provided by the above embodiments, and its other technical features are the same as the features disclosed in the method of the preceding embodiment, so they are not repeated here.

It should be understood that the parts disclosed in the present application can be implemented in hardware, software, firmware, or a combination of these. In the description of the above implementations, specific features, structures, materials, or characteristics can be combined in a suitable manner in any one or more embodiments or examples.

The above is only a specific implementation of the present application, and the scope of protection of the present application is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. The scope of protection of the present application shall therefore be determined by the scope of protection of the claims.

The present application provides a computer-readable storage medium having computer-readable program instructions (that is, a computer program) stored on it, the computer-readable program instructions being used to execute the data loading method of the above embodiments.

The computer-readable storage medium provided by the present application may be, for example, but is not limited to, a USB flash drive or an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these. In this embodiment, the computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted over any appropriate medium, including but not limited to electric wire, optical cable, RF (radio frequency), or any suitable combination of these.

The computer-readable storage medium may be included in the data loading device, or it may exist separately without being assembled into the data loading device.

The computer-readable storage medium carries one or more programs. When the one or more programs are executed by the data loading device, they cause the data loading device to: retain the data index structure and old data of the previous time period when a data loading request is obtained; obtain the new data to be loaded and determine whether the old data and the new data share the same primary key; and, when the old data and the new data share the same primary key, load the new data into the position of that primary key in the data index structure.

Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination of them, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present application may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself.

The readable storage medium provided by the present application is a computer-readable storage medium that stores computer-readable program instructions (that is, a computer program) for executing the above data loading method, and it can solve the technical problem that, during large-scale data writing, an excessive number of data migrations lowers the overall efficiency of data loading. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the present application are the same as those of the data loading method provided by the above embodiments and are not repeated here.

The present application also provides a computer program product that includes a computer program which, when executed by a processor, implements the steps of the data loading method described above.

The computer program product provided by the present application can solve the technical problem that, during large-scale data writing, an excessive number of data migrations lowers the overall efficiency of data loading. Compared with the prior art, the beneficial effects of the computer program product provided by the present application are the same as those of the data loading method provided by the above embodiments and are not repeated here.

The above describes only some embodiments of the present application and does not thereby limit its patent scope. Any equivalent structural transformation made under the technical concept of the present application using the contents of its specification and drawings, or any direct or indirect application in other related technical fields, is included in the scope of patent protection of the present application.

Claims

1. A data loading method, characterized in that the method comprises:

When a data loading request is received, the data index structure and old data of the previous time period are retained;

New data to be loaded is obtained, and the new data is loaded into a position in the data index structure having the same primary key as the old data.

2. The method according to claim 1, wherein the step of loading the new data into the location in the data index structure having the same primary key as the old data comprises:

determining a data path of the new data in the data index structure;

setting the state bit of each node in the data path to be consistent with the state bit of the root node, and setting the connectivity state of each pointer in the data path to connected; and

loading the new data, through the data path, from the root node to the leaf node having the same primary key.

3. The method according to claim 2, characterized in that the step of retaining the data index structure and old data of the previous time period further comprises:

modifying the state bit of the root node in the data index structure from a first state to a second state; and

determining the child nodes of the root node, and setting the connectivity state of the pointers between the root node and the child nodes to disconnected.
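
For illustration only, the retention and loading steps of claims 1 to 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the `Node` class, its fields, and the functions `retain` and `load` are hypothetical names, the index is a toy in-memory tree rather than an InnoDB B+ tree, and the first and second states are modeled as the bit values 0 and 1.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node layout, chosen only for this sketch.
    state: int = 0                                  # state bit: 0 = first state, 1 = second state
    keys: list = field(default_factory=list)        # separator keys (internal nodes only)
    children: list = field(default_factory=list)    # child nodes (internal nodes only)
    connected: list = field(default_factory=list)   # connectivity state of each child pointer
    rows: dict = field(default_factory=dict)        # primary key -> row (leaf nodes only)


def retain(root):
    """Keep the index and the old rows; flip the root state bit and
    mark every root-to-child pointer as disconnected."""
    root.state ^= 1
    root.connected = [False] * len(root.children)


def load(root, key, row):
    """Follow the existing path for `key`, align each node on the path with
    the root state bit, reconnect the pointers, and overwrite the row in place."""
    node = root
    while node.children:
        i = bisect_right(node.keys, key)   # child whose key range covers `key`
        node.connected[i] = True
        node = node.children[i]
        node.state = root.state
    node.rows[key] = row


# Usage: one root with two leaves holding the previous period's rows.
leaves = [Node(rows={1: "old-a", 2: "old-b"}), Node(rows={5: "old-c"})]
root = Node(keys=[5], children=leaves, connected=[True, True])
retain(root)
load(root, 1, "new-a")
```

In this sketch the row with primary key 1 is overwritten in its existing leaf, so no node is split or moved; the second leaf keeps the old state bit and a disconnected pointer until new data reaches it.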

4. The method according to claim 2, characterized in that, after the step of obtaining the new data to be loaded and loading the new data into the position in the data index structure having the same primary key as the old data, the method further comprises:

obtaining a dirty page determination threshold of the leaf node;

determining the number of old data entries and the number of unflushed data entries in the leaf node, and calculating the ratio of the number of unflushed data entries to the number of old data entries; and

determining, when the ratio is greater than the dirty page determination threshold, that the storage data page corresponding to the leaf node is a dirty page.

5. The method according to claim 4, characterized in that, before the step of obtaining the dirty page determination threshold of the leaf node, the method further comprises:

determining a blank data page in the buffer pool, wherein no old data exists in the blank data page; and

setting the dirty page determination threshold of the blank data page to zero.
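
For illustration only, the dirty page determination of claims 4 and 5 can be sketched as follows. The function names and the default threshold of 0.5 are assumptions made for this sketch; the claims do not specify a threshold value. Because a blank page has no old data entries, the ratio is undefined there, so the sketch assumes that a zero threshold means any unflushed entry marks the page dirty.

```python
def dirty_threshold(old_count, base_threshold=0.5):
    """Return the dirty page determination threshold for a page.
    A blank page (no old data) gets a threshold of zero, as in claim 5.
    The base value 0.5 is a hypothetical default, not taken from the source."""
    return 0.0 if old_count == 0 else base_threshold


def is_dirty(old_count, unflushed_count, base_threshold=0.5):
    """Decide whether a leaf's storage data page counts as a dirty page."""
    threshold = dirty_threshold(old_count, base_threshold)
    if old_count == 0:
        # Assumption: with no old data the ratio cannot be computed, so a zero
        # threshold is read as "dirty as soon as anything is unflushed".
        return unflushed_count > 0
    return unflushed_count / old_count > threshold


# Usage: a page with 100 old entries and 60 unflushed entries exceeds 0.5.
page_is_dirty = is_dirty(old_count=100, unflushed_count=60)
```

Tying the decision to the ratio rather than to any single modification lets a page that still mostly holds old data stay out of the flush list until enough of it has been overwritten.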

6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:

detecting the state consistency between the root node in the data index structure and each subordinate node of the root node;

determining, when the states are inconsistent, that the subordinate node is an invalid node; and

determining, when the states are consistent, that the subordinate node is a valid node.
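
For illustration only, the validity check of claim 6 can be sketched as a comparison of each subordinate node's state bit with the root's. The `IndexNode` class and the `classify` function are hypothetical names used only in this sketch, and the traversal order is an arbitrary choice.

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    # Hypothetical node with a name (for reporting) and a state bit.
    name: str
    state: int
    children: list = field(default_factory=list)


def classify(root):
    """Split every subordinate node of `root` into valid and invalid,
    depending on whether its state bit matches the root's state bit."""
    valid, invalid = [], []
    stack = list(root.children)
    while stack:
        node = stack.pop()
        (valid if node.state == root.state else invalid).append(node.name)
        stack.extend(node.children)
    return sorted(valid), sorted(invalid)


# Usage: node "b" still carries the previous period's state bit.
tree = IndexNode("root", 1, [
    IndexNode("a", 1, [IndexNode("c", 1)]),
    IndexNode("b", 0),
])
valid_nodes, invalid_nodes = classify(tree)
```

Under these assumptions a reader of the index can skip nodes reported as invalid, since they have not yet received data for the current period.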

7. The method according to any one of claims 1 to 5, characterized in that the method further comprises:

obtaining a loading optimization request; and

deleting, according to the loading optimization request, the old data of the previous time period, and rebuilding the data index structure to load the data.

8. A data loading device, characterized in that the data loading device comprises:

a retention module, configured to retain a data index structure and old data of a previous time period when a data loading request is received; and

a loading module, configured to obtain new data to be loaded, and load the new data into a position in the data index structure having the same primary key as the old data.

9. Data loading equipment, characterized in that the equipment comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program is configured to implement the steps of the data loading method according to any one of claims 1 to 7.

10. A storage medium, characterized in that the storage medium is a computer-readable storage medium, a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps of the data loading method according to any one of claims 1 to 7 are implemented.