
CN115221131A - High-speed data reading and writing method and device for time sequence database - Google Patents

Info

Publication number
CN115221131A
Authority
CN
China
Prior art keywords
index
data
file
time
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210632591.9A
Other languages
Chinese (zh)
Inventor
张炜刚
贾德星
梁波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunxi Technology Co ltd
Original Assignee
Inspur Software Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Group Co Ltd
Priority to CN202210632591.9A
Publication of CN115221131A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of distributed databases, and in particular to a high-speed data reading and writing method for a time series database, comprising the following steps: S1, managing in-memory cache blocks; S2, managing the write-ahead log (WAL); S3, managing metadata; and S4, managing index information. Compared with the prior art, the invention lowers the hardware requirements that the time series database places on edge systems, which reduces hardware cost and improves product competitiveness. The memory management structure is simplified, so that time series data uses the system's CPU and disk resources more evenly. The index file is updated by appending, which is efficient. When tables in historical time partitions are queried and analyzed, a table's index can be located quickly without loading the data of unrelated tables.

Description

High-speed data reading and writing method and device for a time series database

Technical Field

The invention relates to the technical field of distributed databases, and in particular provides a high-speed data reading and writing method and device for a time series database.

Background

To make full use of a smart microgrid's ability to manage distributed power sources, storage devices, and related loads, and to improve the safety, stability, and economy of microgrid operation, a Microgrid Energy Management System (MEMS) has become indispensable. Unlike the energy management system of a traditional power system, a MEMS must take the characteristics of the microgrid itself into account: by monitoring the microgrid's internal data in real time and exchanging external information promptly, it formulates a reasonable economic operation plan and manages and controls the microgrid equipment effectively.

A MEMS must consider the smart microgrid's operating modes, equipment operating conditions, and relevant external information together in order to coordinate, optimize, control, and manage distributed power sources, energy storage devices, and loads. At the edge, the MEMS provides data acquisition, analysis, system monitoring and alarming, energy control and scheduling, and cloud-edge collaboration, with the energy router at its core.

A smart microgrid generates thousands or even tens of thousands of times more data than traditional relational database scenarios (such as banking and trading). The data is collected in real time at high frequency and high density, and the data model may change at any time. Traditional databases struggle to store, query, and analyze such data. A time series database is a database specialized for storing and managing time series data; it suits workloads that are write-heavy and read-light, need no transactions, separate hot and cold data, involve highly concurrent writes and continuous ingestion of massive data, and aggregate over time intervals.

The write path of current mainstream time series databases (such as the open-source TDEngine) is essentially as shown in Figure 5. To protect the data, it is first written to the write-ahead log (WAL). The data of a group of tables is then written into one large cache. When the data in the cache reaches a certain size, or the cache has existed longer than a certain time, the system starts a flush thread that writes the data to disk, clears the cache, builds the index, and deletes the WAL. To improve the compression ratio and speed up analysis, data files are generally stored in columnar format. Each columnar block contains multiple rows; to keep blocks from holding too few rows, a merge thread combines small columnar blocks into larger ones that meet the row-count requirement.

On the low-specification board hardware of an energy router, this architecture has the following main problems:

Data is flushed only once a large cache (32 MB by default) has filled. While data is being written to the cache, the disk is idle. At flush time, because the cache is large, sorting, compressing, and building the columnar format tend to cause a spike in CPU usage. This works against balanced use of the hardware.

Because the cache holds data from multiple tables, some tables will not have reached the minimum row count of a columnar block when the cache fills, so flushing produces many small columnar blocks. Small blocks are bad for compression, analysis, and management. TDEngine handles this by writing under-filled columnar blocks to a separate last file; on each flush it merges the cache with the last file and writes the blocks that meet the row count to a separate data file. This reduces the number of columnar blocks, but it requires an extra merge thread, and every merge rewrites the last file, which costs performance.

Although the data file is append-only, its index must be rebuilt on every flush, so the index file is overwritten each time.

When reading data from a historical time partition, the indexes of all tables must be loaded into memory even if only one table's index is needed, which wastes memory and performance.

Summary of the Invention

To address the above shortcomings of the prior art, the invention provides a practical high-speed data reading and writing method for a time series database.

A further technical task of the invention is to provide a reasonably designed, safe, and applicable high-speed data reading and writing device for a time series database.

The technical solution adopted by the invention to solve its technical problem is as follows:

A high-speed data reading and writing method for a time series database, comprising the following steps:

S1. Management of in-memory cache blocks;

S2. Management of the write-ahead log (WAL);

S3. Metadata management;

S4. Index information.

Further, in step S1, two cache blocks are allocated, either in advance at system startup or when a table is created. One is called the Mutable Buffer and the other the Immutable Buffer. The Mutable Buffer receives newly written data, which is appended (copied) directly into it in row format.

Further, when the Mutable Buffer is full, it is converted into the Immutable Buffer and placed in a waiting queue for subsequent pre-computation, compression, flushing to disk, and updating; the pre-computation and compression of different tables do not interfere with each other.

Further, the size of each Mutable Buffer is determined by the number of attributes of the table and their data types, subject to two principles: first, the initial size of each Mutable Buffer must exceed a certain threshold; second, each Mutable Buffer must be able to hold a certain amount of data.

Further, in step S2, the creation time of each table's Mutable Buffer is recorded; if the buffer has exceeded the time threshold without reaching the minimum row count for flushing, it is flushed to disk forcibly.

When the system restarts and recovers data from the WAL, the index holds the information of every columnar block flushed for each table, from which the table's last flushed time point can be read. During WAL recovery, only data written after that time point needs to be restored to memory; earlier data is already on disk and can be discarded.

Further, in step S3, the metadata maintains a unique identifier (UUID) for each table. Tables are stored in the storage engine's internal metadata in a fixed order, and each table's position in that order is defined as its id, an integer. The id is fixed for each table. In the index file described later, table indexes are stored in id order, which makes it easy to locate the index of the table being queried.

Further, in step S4, the index information comprises index information in memory and index information in files. In memory, each table maintains all of its index information for the current file.

Further, the index information in files comprises the index file format of the current time partition and the index file format of historical time partitions. The index file of the current time partition is updated by appending to the end of the file. During normal operation the system does not need to read the current index file; only when a restart is required, for example after a failure, is the whole index file read into memory.

Each index record consists of id, start, end, and data offset: id is the table's unique identifier inside the storage engine, numbered sequentially; start is the minimum timestamp in the data block; end is the maximum timestamp in the data block; data offset is the offset of the data block in the current raw data file.

Further, for the historical time partition index file format: when the current file stops receiving writes and becomes a historical partition file, the index file is cleared and the in-memory index information is written to it table by table, so that the index information of each table is stored contiguously.

A high-speed data reading and writing device for a time series database, comprising at least one memory and at least one processor;

the at least one memory is configured to store a machine-readable program;

the at least one processor is configured to invoke the machine-readable program to execute the high-speed data reading and writing method for a time series database.

Compared with the prior art, the high-speed data reading and writing method and device of the invention have the following notable benefits:

The invention lowers the hardware requirements that the time series database places on edge systems, which reduces hardware cost and improves product competitiveness. The memory management structure is simplified, so that time series data uses the system's CPU and disk resources more evenly. The index file is updated by appending, which is efficient. When tables in historical time partitions are queried and analyzed, a table's index can be located quickly without loading the data of unrelated tables.

Description of Drawings

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 is a schematic diagram of the per-table cache block structure in the high-speed data reading and writing method for a time series database;

Figure 2 is a schematic diagram of the metadata format in the method;

Figure 3 is a schematic diagram of the current partition index file structure in the method;

Figure 4 is a schematic diagram of the historical partition index file structure in the method;

Figure 5 is a schematic diagram of the data write flow of a prior-art time series database.

Detailed Description

To help those skilled in the art better understand the solution of the invention, it is described in further detail below with reference to specific embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

A preferred embodiment is given below:

As shown in Figures 1 to 4, the high-speed data reading and writing method for a time series database in this embodiment has the following steps:

S1. Management of in-memory cache blocks;

Each table maintains its own two cache blocks, allocated in advance at system startup or when the table is created. One is called the Mutable Buffer and the other the Immutable Buffer. The Mutable Buffer receives newly written data, which is appended (copied) directly into it in row format.

When the Mutable Buffer is full, it is converted into the Immutable Buffer and placed in a waiting queue for subsequent pre-computation, compression, flushing to disk, index update, and so on. Because the pre-computation and compression of different tables do not interfere with each other, multiple threads can process them concurrently as resources allow.

The size of each Mutable Buffer is determined by the number of attributes of the table and their data types. The size is configurable (for example, 64K) and is set according to two principles:

(1) To avoid frequent memory allocation, the initial size of each Mutable Buffer must exceed a certain threshold;

(2) Each block must be able to hold at least a certain amount of data.

Because each table's buffer is flushed only after it reaches a certain number of rows, small columnar blocks are not produced as they are in TDEngine, and no merge thread is needed to consolidate them.
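
The per-table buffering in S1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `TableBuffer`, `min_rows`, and `to_columns` are hypothetical, the flush threshold is expressed in rows rather than bytes, and a fresh list stands in for the pre-allocated second cache block.

```python
from collections import deque


class TableBuffer:
    """Per-table write buffer (hypothetical sketch, not the patented implementation)."""

    def __init__(self, table_id, min_rows, flush_queue):
        self.table_id = table_id
        self.min_rows = min_rows        # assumed flush threshold, counted in rows
        self.flush_queue = flush_queue  # shared waiting queue of frozen buffers
        self.mutable = []               # row-format appends land here

    def append(self, row):
        self.mutable.append(row)
        if len(self.mutable) >= self.min_rows:
            # Freeze the full buffer (tuple = immutable) and queue it for
            # pre-computation, compression, flushing and index update.
            self.flush_queue.append((self.table_id, tuple(self.mutable)))
            self.mutable = []


def to_columns(rows):
    """Turn frozen row-format data into per-column lists before writing a columnar block."""
    return [list(col) for col in zip(*rows)]


queue = deque()
buf = TableBuffer(table_id=0, min_rows=3, flush_queue=queue)
for i in range(7):
    buf.append((i, i * 1.5))  # (timestamp, value)
```

Because each queued item belongs to exactly one table, worker threads could take items from the queue and process them independently, which matches the statement that per-table pre-computation and compression do not interfere.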

S2. Management of the write-ahead log (WAL);

Memory is managed per table buffer, whereas the WAL appends records in order of write time. Suppose table A's buffer is full and can be flushed, but table B is written less frequently and has not yet met the flush condition. The WAL then contains data of both A and B and cannot simply be deleted: B's data is still only in memory, so deleting the WAL would lose it on an abnormal restart. When to clean up the WAL therefore needs to be decided.

The solution is to record the creation time of each table's Mutable Buffer. If the buffer has exceeded the threshold (3 hours by default) without reaching the minimum row count for flushing, it is flushed forcibly. This does produce small columnar blocks, but because the threshold interval is long there will not be many of them. It also bounds the WAL's retention time by the threshold, preventing the WAL from growing without limit.

When the system restarts and recovers data from the WAL, the index holds the information of every columnar block flushed for each table, from which the table's last flushed time point can be read. During WAL recovery, only data written after that time point needs to be restored to memory; earlier data is already on disk and can be discarded.
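
The two WAL rules above (forced flush after a time threshold, and skipping already-flushed records during recovery) might look like the sketch below. The function names, the record shape `(table_id, timestamp, row)`, and the assumption that each table's timestamps only increase are illustrative choices, not details given in the text.

```python
FORCE_FLUSH_SECONDS = 3 * 3600  # default threshold stated in the text: 3 hours


def needs_forced_flush(buffer_created_at, row_count, min_rows, now,
                       threshold=FORCE_FLUSH_SECONDS):
    """True when a non-empty buffer is older than the threshold but still below min_rows."""
    return 0 < row_count < min_rows and now - buffer_created_at > threshold


def replay_wal(wal_records, last_flushed_ts):
    """Keep only WAL records newer than each table's last flushed time point.

    wal_records:     iterable of (table_id, timestamp, row)   (assumed shape)
    last_flushed_ts: dict table_id -> largest timestamp already on disk,
                     as read from the columnar block info in the index
    """
    recovered = {}
    for table_id, ts, row in wal_records:
        # Tables with no flushed block yet recover every record.
        if ts > last_flushed_ts.get(table_id, float("-inf")):
            recovered.setdefault(table_id, []).append((ts, row))
    return recovered


wal = [(0, 10, "a"), (1, 11, "b"), (0, 20, "c"), (1, 30, "d")]
restored = replay_wal(wal, {0: 10, 1: 30})  # only table 0's record at ts=20 survives
```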

S3. Metadata management;

The metadata maintains a unique identifier (UUID) for each table. Tables are stored in the storage engine's internal metadata in a fixed order, and each table's position in that order is defined as its id, an integer numbered 0, 1, 2, and so on. The id is fixed for each table. In the index file described later, table indexes are stored in id order, which makes it easy to locate the index of the table being queried. The storage format of each table is shown in Figure 2:

allocated, uuid, attributes

Here allocated is a bool indicating whether the id has been allocated: true means allocated, false means unallocated or deleted. A deleted id can be recycled: a newly created table can be assigned a deleted id.
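
A minimal sketch of the id allocation described above, assuming the metadata is held as an ordered list of `[allocated, uuid, attributes]` slots whose position is the table id. The class and method names are hypothetical.

```python
import uuid


class TableMetadata:
    """Fixed-order metadata slots; the slot position is the table id (illustrative only)."""

    def __init__(self):
        self.slots = []  # each slot: [allocated, uuid, attributes]

    def create_table(self, attributes):
        entry = [True, uuid.uuid4(), list(attributes)]
        for table_id, slot in enumerate(self.slots):
            if not slot[0]:                   # recycle a deleted id
                self.slots[table_id] = entry
                return table_id
        self.slots.append(entry)              # otherwise take the next id
        return len(self.slots) - 1

    def drop_table(self, table_id):
        # Mark the slot unallocated; its position stays so other ids never shift.
        self.slots[table_id][0] = False


meta = TableMetadata()
ids = [meta.create_table(["ts", "value"]) for _ in range(3)]
meta.drop_table(1)
reused = meta.create_table(["ts", "power"])
```

Keeping dropped slots in place is what lets the index files rely on id order without renumbering existing tables.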

S4. Index information;

(1) Updating index information, which includes index information in memory and in files.

In-memory index

Each table maintains in memory all of its index information for the current file. The main fields are:

index count

start time, end time, data offset 1

start time, end time, data offset 2

start time, end time, data offset 3

......

Here data offset is the offset of the corresponding data block in the raw data file.
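
As one possible use of the in-memory index, a time-range query can pick out the data blocks whose time span overlaps the requested interval. The text does not describe the query procedure, so the function below is an assumption, with each entry modeled as a `(start, end, data_offset)` tuple.

```python
def blocks_in_range(index_entries, t0, t1):
    """Return the data offsets of blocks whose [start, end] overlaps [t0, t1]."""
    return [offset for start, end, offset in index_entries
            if start <= t1 and end >= t0]


# One table's in-memory index for the current file (made-up values).
entries = [(0, 9, 0), (10, 19, 4096), (20, 29, 8192)]
hits = blocks_in_range(entries, 15, 22)
```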

(2) Index file format of the current time partition

To make index updates efficient, the index file is likewise updated by appending to the end of the file. Because a copy of the index information is kept in memory, the current index file does not need to be read during normal operation; only when a restart is required, for example after a failure, is the whole index file read into memory.

Each index record has the form: id, start, end, data offset

id is the table's unique identifier inside the storage engine, numbered sequentially; start is the minimum timestamp in the data block; end is the maximum timestamp in the data block; data offset is the offset of the data block in the current raw data file. As shown in Figure 3, records are appended to the current index file sequentially.
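
The append-only index file of the current partition could be modeled as fixed-size binary records. The field widths below (32-bit id, 64-bit timestamps and offset, little-endian) and the function names are assumptions made for illustration; the text only names the four fields.

```python
import struct

# id, start, end, data offset; the widths are assumed, not specified in the text.
RECORD = struct.Struct("<IqqQ")


def append_index_record(path, table_id, start, end, data_offset):
    """Append one index record to the end of the current index file."""
    with open(path, "ab") as f:
        f.write(RECORD.pack(table_id, start, end, data_offset))


def load_index(path):
    """Rebuild the in-memory index (table id -> entries) by reading the whole file,
    as would happen after a restart."""
    with open(path, "rb") as f:
        data = f.read()
    index = {}
    for table_id, start, end, offset in RECORD.iter_unpack(data):
        index.setdefault(table_id, []).append((start, end, offset))
    return index
```

Records of different tables are interleaved in arrival order here, which is why the restart path has to scan the entire file.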

(3) Index file format of historical time partitions

When the current file stops receiving writes and becomes a historical partition file, the index file is cleared and the in-memory index information is written to it table by table, so that the index information of each table is stored contiguously:

Header

Index header 0 (id=0)

Index header 1 (id=1)

Index header 2 (id=2)

......

Index block 0 (id=0)

Index block 1 (id=1)

Index block 2 (id=2)

......

An index header is roughly defined as: index count, index offset

Every index header has the same fixed size, so the offset of a table's index header in the index file can be computed from the table's id. The index count in the header records how many index entries the table's index block contains, and index offset is the offset of that index block in the index file.

The size of an index block is not fixed; it depends on the number of index entries in the block, that is, the index count in the header. An index block is roughly defined as:

Index entry 0

Index entry 1

Index entry 2

......

Each index entry is roughly defined as: start, end, data offset, length

start and end are the start time and end time of the corresponding data block, and data offset is the offset of that data block in the raw data file.

The mapping between the historical index file and the raw data file is shown in Figure 4.
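
The historical layout above allows a single table's index to be read without touching the others. The sketch below builds such a file image in memory and reads one table back. It is a simplification under stated assumptions: the field widths and function names are hypothetical, and the leading file Header is omitted because its contents are not described.

```python
import struct

HEADER = struct.Struct("<IQ")   # index count, index offset (assumed widths)
ENTRY = struct.Struct("<qqQI")  # start, end, data offset, length (assumed widths)


def build_history_index(index, table_count):
    """index: dict table id -> list of (start, end, data_offset, length)."""
    headers, blocks = [], []
    offset = table_count * HEADER.size  # index blocks start after all headers
    for table_id in range(table_count):
        entries = index.get(table_id, [])
        headers.append(HEADER.pack(len(entries), offset))
        block = b"".join(ENTRY.pack(*entry) for entry in entries)
        blocks.append(block)
        offset += len(block)
    return b"".join(headers) + b"".join(blocks)


def read_table_index(buf, table_id):
    """Locate one table's header from its id, then read only that table's index block."""
    count, offset = HEADER.unpack_from(buf, table_id * HEADER.size)
    return [ENTRY.unpack_from(buf, offset + i * ENTRY.size) for i in range(count)]


image = build_history_index({0: [(0, 9, 0, 64)],
                             2: [(5, 6, 64, 32), (7, 8, 96, 32)]}, table_count=3)
table2 = read_table_index(image, 2)
```

With a real file, `read_table_index` would become two small reads (one header, one block) instead of loading every table's index, which is the benefit claimed for historical partitions.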

Based on the above method, the high-speed data reading and writing device for a time series database in this embodiment comprises at least one memory and at least one processor;

the at least one memory is configured to store a machine-readable program;

the at least one processor is configured to invoke the machine-readable program to execute the high-speed data reading and writing method for a time series database.

The specific embodiments above are only particular cases of the invention. The scope of patent protection of the invention includes but is not limited to these embodiments; any appropriate change or substitution that is consistent with the claims of the high-speed data reading and writing method and device for a time series database and is made by a person of ordinary skill in the art falls within the scope of patent protection of the invention.

Although embodiments of the invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A high-speed data reading and writing method for a time series database, characterized by comprising the following steps:
S1, managing in-memory cache blocks;
S2, managing the write-ahead log (WAL);
S3, managing metadata;
and S4, managing index information.
2. The high-speed data reading and writing method for a time series database according to claim 1, wherein in step S1, two cache blocks are pre-allocated at system startup or allocated at table creation time; one cache block is called the Mutable Buffer and the other is called the Immutable Buffer; and the Mutable Buffer receives newly written data, which is appended directly into the Mutable Buffer in row-store format.
3. The high-speed data reading and writing method for a time series database according to claim 2, characterized in that if the Mutable Buffer is full, it is converted into an Immutable Buffer and placed in a waiting queue for subsequent pre-computation, compression, flushing to disk, and updating, wherein the pre-computation and compression of each table do not interfere with those of other tables.
4. The high-speed data reading and writing method for a time series database according to claim 3, wherein the size of each Mutable Buffer is determined by the number of attributes and the data types of the table, and two principles govern that size: the first principle is that the initial size of each Mutable Buffer exceeds a certain threshold; the second principle is that each Mutable Buffer can hold a certain amount of data.
5. The high-speed data reading and writing method for a time series database according to claim 4, characterized in that in step S2, the creation time of the Mutable Buffer of each table is recorded, and if the minimum number of rows for flushing to disk has not been reached after a time threshold is exceeded, a flush to disk is forced;
when the system restarts and recovers data from the WAL, the column block information of each table that has been flushed to disk is recorded in the index; the time point of the table's last flush is read from the column block information; during WAL recovery, data written after that time point must be restored to memory, and earlier data that has already been flushed to disk can be discarded.
6. The method as claimed in claim 4, wherein in step S3, the metadata maintains a unique identifier (UUID) for each table; the UUIDs are stored sequentially in the metadata of the storage engine, and the storage order is fixed; the ordinal position of each table is defined as its id, whose data type is an integer and which is fixed for each table; and in subsequent index file storage, the index storage order of the tables is consistent with the order of the ids, so that the index position of the queried table can be located quickly during a query.
7. The method according to claim 6, wherein in step S4, the index information includes index information in memory and index information in files, and the index information in memory maintains, for each table, all index information of the current file.
8. The high-speed data reading and writing method for a time series database according to claim 6, wherein the index information in files comprises an index file format for the current time partition and an index file format for historical time partitions; the index file of the current time partition is updated by appending to the end of the file; the system does not need to read the current index file during normal operation, and the whole index file is read into memory only when the system fails and needs to be restarted;
each index entry records the following fields: id, start, end, and data offset, wherein id is the unique identifier of the table in storage and is numbered sequentially; start is the minimum timestamp in the data block; end is the maximum timestamp in the data block; and data offset is the offset of the data block in the current raw data file.
9. The method as claimed in claim 8, wherein in the historical time partition index file format, when the current file no longer receives data and is converted into a historical partition file, the index file is emptied, and the index information in memory is then written into the index file table by table, with the index information of the same table stored contiguously in order.
10. A high-speed data reading and writing device for a time series database, characterized by comprising: at least one memory and at least one processor;
the at least one memory is configured to store a machine-readable program;
the at least one processor is configured to invoke the machine-readable program to perform the method of any one of claims 1 to 9.
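
As an illustration only, and not part of the claims, the index handling recited in claims 8 and 9 might be sketched in Python as follows. The 28-byte fixed-width layout, the in-memory `bytearray` standing in for the index file, and the names `ENTRY`, `IndexEntry`, `append_entry`, `load_entries`, and `rewrite_for_history` are all assumptions made for this sketch; the claims do not specify field widths, byte order, or an API.

```python
import struct
from collections import defaultdict
from typing import NamedTuple

# Assumed layout (not from the claims): little-endian uint32 id,
# int64 start, int64 end, uint64 data offset.
ENTRY = struct.Struct("<IqqQ")


class IndexEntry(NamedTuple):
    table_id: int     # sequential id of the table in storage
    start: int        # minimum timestamp in the data block
    end: int          # maximum timestamp in the data block
    data_offset: int  # offset of the data block in the raw data file


def append_entry(index_file: bytearray, entry: IndexEntry) -> None:
    """Current partition: entries are only ever appended at the tail."""
    index_file.extend(ENTRY.pack(*entry))


def load_entries(index_file: bytes) -> list[IndexEntry]:
    """Restart path: read the whole index file back into memory."""
    return [IndexEntry(*fields) for fields in ENTRY.iter_unpack(index_file)]


def rewrite_for_history(entries: list[IndexEntry]) -> bytearray:
    """Historical partition: rewrite so that each table's entries are
    contiguous, with tables in id order and arrival order kept per table."""
    by_table = defaultdict(list)
    for entry in entries:
        by_table[entry.table_id].append(entry)
    out = bytearray()
    for table_id in sorted(by_table):
        for entry in by_table[table_id]:
            out.extend(ENTRY.pack(*entry))
    return out


# Usage: three blocks arrive interleaved across two tables.
current = bytearray()
append_entry(current, IndexEntry(1, 100, 199, 0))
append_entry(current, IndexEntry(0, 100, 149, 4096))
append_entry(current, IndexEntry(1, 200, 299, 8192))
history = rewrite_for_history(load_entries(current))
```

In this sketch the append-only file preserves arrival order, which keeps the write path cheap, while the rewritten historical file groups entries by id so that a reader could seek to one table's contiguous run of entries.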
CN202210632591.9A 2022-06-07 2022-06-07 High-speed data reading and writing method and device for time sequence database Pending CN115221131A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210632591.9A CN115221131A (en) 2022-06-07 2022-06-07 High-speed data reading and writing method and device for time sequence database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210632591.9A CN115221131A (en) 2022-06-07 2022-06-07 High-speed data reading and writing method and device for time sequence database

Publications (1)

Publication Number Publication Date
CN115221131A true CN115221131A (en) 2022-10-21

Family

ID=83607491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210632591.9A Pending CN115221131A (en) 2022-06-07 2022-06-07 High-speed data reading and writing method and device for time sequence database

Country Status (1)

Country Link
CN (1) CN115221131A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794900A (en) * 2022-11-10 2023-03-14 南京捷崎信息科技有限公司 Data processing method and system
CN116414839A (en) * 2023-04-14 2023-07-11 中国科学院软件研究所 SSD-oriented time sequence data storage method and system based on LSM_Tree
CN117149081A (en) * 2023-09-07 2023-12-01 武汉麓谷科技有限公司 Time sequence database storage engine construction method based on ZNS solid state disk
CN117194369A (en) * 2023-08-16 2023-12-08 西北工业大学 Airborne embedded data reading and writing method and application
CN117632016A (en) * 2023-11-28 2024-03-01 天翼云科技有限公司 A distributed storage asynchronous data compression method
CN118245461A (en) * 2024-05-28 2024-06-25 山东云海国创云计算装备产业创新中心有限公司 Log processing method, computer program product, equipment and medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794900A (en) * 2022-11-10 2023-03-14 南京捷崎信息科技有限公司 Data processing method and system
CN116414839A (en) * 2023-04-14 2023-07-11 中国科学院软件研究所 SSD-oriented time sequence data storage method and system based on LSM_Tree
CN116414839B (en) * 2023-04-14 2024-06-11 中国科学院软件研究所 SSD-oriented time sequence data storage method and system based on LSM_Tree
CN117194369A (en) * 2023-08-16 2023-12-08 西北工业大学 Airborne embedded data reading and writing method and application
CN117149081A (en) * 2023-09-07 2023-12-01 武汉麓谷科技有限公司 Time sequence database storage engine construction method based on ZNS solid state disk
CN117149081B (en) * 2023-09-07 2024-02-06 武汉麓谷科技有限公司 Time sequence database storage engine construction method based on ZNS solid state disk
CN117632016A (en) * 2023-11-28 2024-03-01 天翼云科技有限公司 A distributed storage asynchronous data compression method
CN118245461A (en) * 2024-05-28 2024-06-25 山东云海国创云计算装备产业创新中心有限公司 Log processing method, computer program product, equipment and medium

Similar Documents

Publication Publication Date Title
CN115221131A (en) High-speed data reading and writing method and device for time sequence database
US12038873B2 (en) Database management system
US10944807B2 (en) Organizing present and future reads from a tiered streaming data storage layer
US7979399B2 (en) Database journaling in a multi-node environment
US11023453B2 (en) Hash index
US8626717B2 (en) Database backup and restore with integrated index reorganization
US7257690B1 (en) Log-structured temporal shadow store
US11100083B2 (en) Read only bufferpool
US8560500B2 (en) Method and system for removing rows from directory tables
US20170351543A1 (en) Heap data structure
EP4530875A1 (en) Key-value store and file system
WO2015024474A1 (en) Rapid calculation method for electric power reliability index based on multithread processing of cache data
CN110309233A (en) Method, apparatus, server and the storage medium of data storage
US20090307287A1 (en) Database Journaling in a Multi-Node Environment
CN113377292B (en) A stand-alone storage engine
CN107665219A (en) A kind of blog management method and device
CN117874012A (en) A method for implementing an expiration schedule for a distributed relational database based on rocksdb
CN118093592A (en) Metadata index storage method and device for distributed object storage system
CN104731716B (en) A kind of date storage method
CN119597765A (en) Flink-based incremental data lake entering method, device, equipment and storage medium
Pedreira et al. Rethinking concurrency control for in-memory OLAP dbmss
Xue et al. TagTree: Global Tagging Index with Efficient Querying for Time Series Databases
Li et al. An Optimized Storage Method for Small Files in Ceph System
Sanghavi et al. Goku: A Schemaless Time Series Database for Large Scale Monitoring at Pinterest
Xavier et al. Beelog: Online Log Compaction for Dependable Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240731

Address after: Room 305-22, Building 2, No. 1158 Zhangdong Road and No. 1059 Dangui Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120

Applicant after: Shanghai Yunxi Technology Co.,Ltd.

Country or region after: China

Address before: 250100 Ji'nan hi tech Zone No. 2877, Shandong Province

Applicant before: INSPUR SOFTWARE GROUP Co.,Ltd.

Country or region before: China