CN110389967B - Data storage method, device, server and storage medium

Info

Publication number: CN110389967B
Application number: CN201910680739.4A
Authority: CN
Inventors: 张云; 黄宇; 吕越; 冯理; 张粤峰
Original assignee: Shenzhen Tencent Computer Systems Co Ltd
Current assignee: Shenzhen Tencent Computer Systems Co Ltd
Priority date: 2019-07-26
Filing date: 2019-07-26
Publication date: 2024-06-04
Anticipated expiration: 2039-07-26
Also published as: CN110389967A

Abstract

The present application discloses a data storage method, device, server and storage medium, relating to the field of databases. The method includes: obtaining a message to be stored; extracting the data to be stored from the message; if the data to be stored meets a target query rule, storing the data in both a non-relational database and a time series database, where data that meets the target query rule has higher concurrent query and query latency requirements than data that does not; and if the data to be stored does not meet the target query rule, storing the data in the time series database. Because a non-relational database can provide high-concurrency, low-latency query services, storing data with higher concurrent query and query latency requirements in the non-relational database increases the speed of subsequent queries.

Description

Data storage method, device, server and storage medium

Technical Field

The embodiments of the present application relate to the field of databases, and in particular to a data storage method, device, server and storage medium.

Background

Massive amounts of data are generated on the Internet every day. To manage this data effectively, Internet service providers store it in databases (DB).

In the related art, relational databases are usually used to store data; common relational databases include MySQL and Oracle. In a relational database, data is stored in tables, and every field in a table is defined in advance, which gives high reliability and stability. In addition, relational databases use Structured Query Language (SQL) for queries, and support insert, delete, update and query operations on the stored data as well as cross-table queries.

However, when faced with a large number of query requests, queries against a relational database are slow (especially cross-table queries) and cannot meet query requirements.

Summary of the Invention

The embodiments of the present application provide a data storage method, device, server and storage medium, which can solve the problem in the related art that data queries are slow and cannot meet query requirements. The technical solution is as follows:

In one aspect, an embodiment of the present application provides a data storage method, the method comprising:

obtaining a message to be stored;

extracting the data to be stored from the message to be stored;

if the data to be stored meets a target query rule, storing the data to be stored in a non-relational database and storing the data to be stored in a time series database, wherein the concurrent query requirement and query latency requirement of data meeting the target query rule are higher than those of data not meeting the target query rule;

if the data to be stored does not meet the target query rule, storing the data to be stored in the time series database.

In another aspect, an embodiment of the present application provides a data storage device, the device comprising:

a first acquisition module, configured to obtain a message to be stored;

an extraction module, configured to extract the data to be stored from the message to be stored;

a first storage module, configured to store the data to be stored in a non-relational database and in a time series database when the data to be stored meets a target query rule, wherein the concurrent query requirement and query latency requirement of data meeting the target query rule are higher than those of data not meeting the target query rule;

a second storage module, configured to store the data to be stored in the time series database when the data to be stored does not meet the target query rule.

In another aspect, an embodiment of the present application provides a server, the server comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the data storage method described in the above aspect.

In another aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a code set or an instruction set is stored, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to implement the data storage method described in the above aspect.

In another aspect, a computer program product is provided which, when run on a computer, causes the computer to execute the data storage method described in the above aspect.

The beneficial effects of the technical solution provided by the embodiments of the present application include at least the following:

After a message to be stored is obtained, the data to be stored is extracted from it and checked against the target query rule. When the data meets the target query rule, it is stored in both the non-relational database and the time series database; when it does not, it is stored in the time series database. Because a non-relational database can provide high-concurrency, low-latency query services, storing data with high concurrent query and query latency requirements in the non-relational database increases the speed of subsequent queries. At the same time, storing data with lower concurrent query and query latency requirements in the time series database makes it convenient to run multi-dimensional combined queries later.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 shows a system architecture diagram of a data classification storage and query system provided by an exemplary embodiment of the present application;

FIG. 2 is a schematic diagram of the principle of the data storage method provided by an embodiment of the present application;

FIG. 3 shows a flow chart of a data storage method provided by an exemplary embodiment of the present application;

FIG. 4 shows a flow chart of a data storage method provided by another exemplary embodiment of the present application;

FIG. 5 is a schematic diagram of the interface used to configure a dimension set manually;

FIG. 6 is a flow chart of a process of storing log messages in a database according to an embodiment;

FIG. 7 is a flow chart of a process of querying log messages from a database according to an embodiment;

FIG. 8 is a flow chart of a dimension set generation process provided by an exemplary embodiment;

FIGS. 9 to 11 are schematic diagrams of data compression formats;

FIG. 12 is a structural block diagram of a data storage device provided by an exemplary embodiment of the present application;

FIG. 13 shows a schematic structural diagram of a server provided by an exemplary embodiment of the present application.

Detailed Description

To make the objectives, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

To facilitate understanding, some terms used in the embodiments of the present application are briefly introduced below.

Non-relational database (Not only SQL, NoSQL): a database design pattern that is non-relational, distributed, and does not provide atomicity, consistency, isolation and durability (ACID). Unlike relational databases, which store data in tables, non-relational databases store data in column-oriented, key-value (KV) or document-oriented structures. The non-relational database in the embodiments of the present application stores data in a KV structure and can provide high-concurrency, low-latency query services.

Time series database: a database mainly used to store and process time-stamped data, which is also called time series data. It is often used to store the massive data generated as time passes. Common time series databases include Druid, TimescaleDB, KairosDB and InfluxDB.

Please refer to FIG. 1, which shows a system architecture diagram of a data classification storage and query system provided by an exemplary embodiment of the present application. The system includes a classification server 11, a first database server 12 and a second database server 13.

The classification server 11 provides data classification storage and data classification query services. It may be a single server, a server cluster composed of several servers, or a cloud computing center. Data classification storage is the process of storing the data to be stored in at least one database according to a data classification standard; data classification query is the process of routing a query request to the designated database according to the data classification standard and returning the query result provided by that database.

In a possible implementation, the classification server 11 has a corresponding visual operation platform, used to receive the data classification standard set by a user or to receive a query request triggered by a user.

The classification server 11 is connected to the first database server 12 and the second database server 13 via a wired or wireless network.

The first database server 12 is a server on which a non-relational database is deployed. It may be a single server, a server cluster composed of several servers, or a cloud computing center. In the embodiments of the present application, the first database server 12 stores and queries data with high-concurrency, low-latency query requirements, and the non-relational database stores data in KV format.

The second database server 13 is a server on which a time series database is deployed. It may be a single server, a server cluster composed of several servers, or a cloud computing center. In the embodiments of the present application, the second database server 13 stores and queries multi-dimensional combined data.

In a possible application scenario, after the classification server 11 obtains a message to be stored, it detects whether the data to be stored in the message is data with high-concurrency, low-latency query requirements. If so, the classification server 11 stores the data in both the first database server 12 and the second database server 13; if not, the classification server 11 stores the data only in the second database server 13. Because data with high-concurrency, low-latency query requirements is stored in the first database server 12, when the classification server 11 later receives a query request for such data, the query can be served with high concurrency and low latency by the first database server 12; when a query request for multi-dimensional combined data is received, the query can be served by the second database server 13.

As shown in FIG. 2, in the data storage method provided by the embodiments of the present application, after the classification server 21 obtains a message to be stored from the message queue 22, it detects whether the data to be stored in the message has high-concurrency, low-latency query requirements. If so, the classification server 21 stores the data in both the non-relational database 23 (for high-concurrency, low-latency queries) and the time series database 24 (so that historical data remains traceable). If not, the classification server 21 stores the data in the time series database 24 to support multi-dimensional combined queries. When a query request 25 is later received, the classification server 21 detects whether the data being queried has high-concurrency, low-latency query requirements. If so, it routes the query request to the non-relational database 23, which provides a high-concurrency, low-latency query service; if not, it routes the query request to the time series database 24, which provides a multi-dimensional combined query service.
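
The write path and the query routing of FIG. 2 can be sketched as follows. This is a minimal illustration only: the two databases are replaced by in-memory Python containers, and the class name, the `hot_dimensions` rule and the field names are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationServer:
    """Toy stand-in for classification server 21; both stores are in-memory."""
    hot_dimensions: frozenset                      # hypothetical target query rule
    kv_store: dict = field(default_factory=dict)   # stands in for non-relational database 23
    ts_store: list = field(default_factory=list)   # stands in for time series database 24

    def is_hot(self, dimension_names) -> bool:
        # Assumed rule: every configured dimension name is present in the data.
        return self.hot_dimensions <= frozenset(dimension_names)

    def store(self, timestamp: int, dimensions: dict, indicators: dict) -> None:
        # Every record goes to the time series store so history stays traceable.
        self.ts_store.append((timestamp, dict(dimensions), dict(indicators)))
        if self.is_hot(dimensions):
            key = tuple((name, dimensions[name]) for name in sorted(self.hot_dimensions))
            self.kv_store.setdefault(key, []).append(dict(indicators))

    def route_query(self, dimensions: dict) -> str:
        # Queries are routed with the same rule that was applied at write time.
        return "kv" if self.is_hot(dimensions) else "timeseries"


server = ClassificationServer(hot_dimensions=frozenset({"channel", "pay_method"}))
server.store(1, {"channel": "001", "pay_method": "card"}, {"pay_success": 1})
server.store(2, {"platform": "ios"}, {"pay_success": 1})
```

Both records reach the time series stand-in, while only the first one is also written to the key-value stand-in.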

In the related art, a single relational database is used to store data and cannot meet query requirements. In contrast, the embodiments of the present application creatively propose a "classification server + dual database" architecture for data storage and query. The classification server stores the data to be stored in the corresponding database according to whether the data has high-concurrency, low-latency query requirements, and further routes each query request to the corresponding database, which provides the query service. This achieves both high-concurrency, low-latency queries and multi-dimensional combined queries. The data storage and query processes are described below using illustrative embodiments.

Please refer to FIG. 3, which shows a flow chart of a data storage method provided by an exemplary embodiment of the present application. This embodiment is described using the example of the method being applied to the classification server 11 shown in FIG. 1. The method includes the following steps.

Step 301: obtain a message to be stored.

In a possible implementation, the classification server obtains the message to be stored from a message queue, which holds log messages generated while the system is running. For a payment system, for example, the log message may be a payment success log message or a payment failure log message generated by a payment action. The embodiments of the present application do not limit the specific type of the message to be stored.
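
Step 301 can be pictured with the standard-library `queue.Queue` standing in for the message queue. The application does not name a queue product, so the function `drain` and its `handle` callback are hypothetical.

```python
import queue


def drain(message_queue, handle) -> int:
    """Pull messages until the queue is empty; return how many were handled."""
    handled = 0
    while True:
        try:
            message = message_queue.get_nowait()
        except queue.Empty:
            return handled
        handle(message)
        handled += 1
```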

Step 302: extract the data to be stored from the message to be stored.

In a possible implementation, the classification server uses a message extraction template to extract the data to be stored from the message. The data to be stored may include dimension data and indicator data, where the dimension data may further include a dimension name and a dimension value, and the indicator data may further include an indicator name and an indicator value.

In an illustrative example, the dimension data extracted by the server includes the dimension name "channel number" with the dimension value 001, and the indicator data includes the indicator name "payment success" with the indicator value 1 (indicating that the payment succeeded).
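
As a rough sketch of template-based extraction, assume (hypothetically) that a log message is a `|`-separated list of `name=value` pairs and that the template simply lists which fields are dimensions and which are indicators. Neither the message format nor the `ExtractionTemplate` type is specified by the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionTemplate:
    dimension_fields: tuple   # field names treated as dimensions
    indicator_fields: tuple   # field names treated as indicators


def extract(message: str, template: ExtractionTemplate):
    """Split a 'k=v|k=v' log line into dimension data and indicator data."""
    fields = dict(part.split("=", 1) for part in message.split("|") if "=" in part)
    dimensions = {n: fields[n] for n in template.dimension_fields if n in fields}
    indicators = {n: int(fields[n]) for n in template.indicator_fields if n in fields}
    return dimensions, indicators


template = ExtractionTemplate(("channel",), ("pay_success",))
dimensions, indicators = extract("channel=001|pay_success=1|trace=abc", template)
```

Fields that the template does not mention, such as `trace` here, are dropped.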

Optionally, after extracting the data to be stored, the classification server obtains the target query rule and detects whether the data to be stored meets it. If so, step 303 is executed; if not, step 304 is executed.

Step 303: if the data to be stored meets the target query rule, store the data to be stored in the non-relational database and in the time series database, where the concurrent query requirement and query latency requirement of data meeting the target query rule are higher than those of data not meeting it.

Optionally, the target query rule is set manually by a user, or is learned and generated automatically by the classification server.

In a possible implementation, the target query rule indicates the dimensional features of data with high-concurrency, low-latency query requirements; that is, data with specific dimensional features must support high-concurrency, low-latency queries. Accordingly, when the dimension data in the data to be stored matches the dimensional features indicated by the target query rule, the classification server determines that the data has high-concurrency, low-latency query requirements and stores it in the non-relational database.

Optionally, to support multi-dimensional combined queries in the non-relational database, before storing the data the classification server combines and splits the dimension data in the data to be stored according to predefined dimension combination rules, and stores the data generated by this combining and splitting in the non-relational database.

Because the data stored in the non-relational database is processed data rather than the original data to be stored, the classification server also needs to store the data to be stored in the time series database when it writes to the non-relational database, so that historical data remains traceable.

In a possible implementation, when storing the data in the time series database, the classification server passes the data to the ingestion program of the time series database, which aggregates and stores the data according to a predefined data format. The embodiments of the present application do not limit the ingestion process of the time series database.
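
The application leaves the ingestion process open. One common form of aggregation at ingestion time is to sum indicator values per time bucket and per combination of dimension values; the sketch below assumes that form, and the function name `roll_up`, the 60-second bucket and the record layout are all hypothetical.

```python
from collections import defaultdict


def roll_up(records, bucket_seconds=60):
    """Sum indicator values per (time bucket, dimension values)."""
    totals = defaultdict(lambda: defaultdict(int))
    for timestamp, dimensions, indicators in records:
        bucket = timestamp - timestamp % bucket_seconds   # start of the bucket
        key = (bucket, tuple(sorted(dimensions.items())))
        for name, value in indicators.items():
            totals[key][name] += value
    return {key: dict(values) for key, values in totals.items()}


records = [
    (60, {"channel": "001"}, {"pay_success": 1}),
    (90, {"channel": "001"}, {"pay_success": 1}),
    (120, {"channel": "001"}, {"pay_success": 0}),
]
rolled = roll_up(records)
```

The two records that fall in the same minute collapse into one row, so three input records become two stored rows.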

Step 304: if the data to be stored does not meet the target query rule, store the data to be stored in the time series database.

To support multi-dimensional combined queries in the non-relational database, data stored there must first be combined and split, and this processing causes data expansion: the amount of data actually stored is far larger than the amount of original data. Storing all data in the non-relational database would consume a large amount of storage, and the combining and splitting would take a long time, resulting in low storage efficiency.

When a time series database serves a query, it has to swap data between disk files and memory and decompress index files in memory, so its queries are slower than those of a non-relational database. However, it does not suffer data expansion when storing data. Therefore, in the embodiments of the present application, when the data to be stored is determined not to meet the target query rule, that is, it is not data with high-concurrency, low-latency query requirements, the classification server stores it only in the time series database and not in the non-relational database.

In a possible implementation, the time series database is Druid (a real-time big data processing platform), which implements a storage model based on the log schema and supports queries on any combination of log dimensions without combining or splitting the dimensions (that is, without data expansion in the stored data).

In summary, in the data storage method provided by the embodiments of the present application, after a message to be stored is obtained, the data to be stored is extracted from it and checked against the target query rule. When the data meets the target query rule, it is stored in both the non-relational database and the time series database; when it does not, it is stored in the time series database. Because a non-relational database can provide high-concurrency, low-latency query services, storing data with high concurrent query and query latency requirements in the non-relational database increases the speed of subsequent queries. At the same time, storing data with lower concurrent query and query latency requirements in the time series database makes it convenient to run multi-dimensional combined queries later.

In a possible implementation, the non-relational database in the embodiments of the present application stores data in KV format. Accordingly, the data to be stored must be converted into KV format before it is stored in the non-relational database. An illustrative embodiment is described below.

Please refer to FIG. 4, which shows a flow chart of a data storage method provided by another exemplary embodiment of the present application. This embodiment is described using the example of the method being applied to the classification server 11 shown in FIG. 1. The method includes the following steps.

Step 401: obtain a message to be stored.

Step 402: extract the data to be stored from the message to be stored, where the data to be stored includes dimension data and indicator data.

For the implementation of steps 401 to 402, refer to steps 301 to 302; the details are not repeated here.

Step 403: if the data to be stored contains preset filter dimension data, format the data in the data to be stored that matches the preset filter dimension data.

In a possible implementation, a filter program is provided in the classification server and is used to format abnormal dimension data in the data to be stored.

Optionally, the filter program is configured with dimension configuration information describing what needs to be formatted; this information includes the dimension names and dimension values that correspond to abnormal dimension data. Accordingly, the filter program detects whether the extracted dimension data contains abnormal dimension data. If so, it formats the abnormal dimension data; if not, it determines that the data to be stored contains no abnormal dimension data and needs no formatting.

Optionally, when formatting abnormal dimension data, the filter program replaces the dimension value of the abnormal dimension data with a preset value, for example, error.
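
A minimal sketch of this filter step is shown below. It assumes the dimension configuration information can be represented as a mapping from dimension name to the set of values considered abnormal; that representation and the function name are hypothetical.

```python
def format_abnormal_dimensions(dimensions, abnormal_config, placeholder="error"):
    """Return a copy with configured abnormal dimension values replaced."""
    cleaned = dict(dimensions)
    for name, bad_values in abnormal_config.items():
        # Only dimensions named in the configuration are ever rewritten.
        if cleaned.get(name) in bad_values:
            cleaned[name] = placeholder
    return cleaned


config = {"error_code": {"", "null"}}
formatted = format_abnormal_dimensions({"channel": "001", "error_code": "null"}, config)
```

Data without any configured abnormal value passes through unchanged.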

After formatting the data to be stored, the classification server detects whether the data meets the target query rule. Because queries are usually made by dimension, in a possible implementation the target query rule contains a dimension set, which indicates the dimensional features (including dimension names) of data that meets the target query rule (that is, data with high-concurrency, low-latency query requirements). The dimension set is set manually, or is learned and generated automatically by the classification server.

Illustratively, as shown in FIG. 5, the dimension configuration interface 51 displays the dimension sets that have been configured. When a trigger operation on the modification control corresponding to a dimension set is received, the modification interface 52 is displayed, through which the user can modify the interface, the dimensions and the time granularity.

Optionally, the classification server detects whether the dimension data in the data to be stored belongs to the dimension set. If so, it determines that the data to be stored meets the target query rule and executes steps 404 to 407 below; if not, it determines that the data to be stored does not meet the target query rule and executes steps 408 to 409 below.

Step 404: if the dimension data belongs to the dimension set, determine that the data to be stored meets the target query rule.

In an illustrative example, the dimension set is {channel number + payment method, payment method + platform, payment method + source, payment method + error code}. Because the dimension data extracted from the message to be stored includes the dimension name "channel number + payment method", the classification server determines that the data to be stored meets the target query rule.
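
The membership check in this example can be sketched as follows. The sketch assumes that data "belongs to" the dimension set when at least one configured combination of dimension names is fully present in the extracted dimension data; the application does not spell out the matching rule, and the English identifiers are hypothetical renderings of the dimension names.

```python
def matches_dimension_set(dimension_names, dimension_set):
    """True if some configured combination is fully present in the data."""
    present = frozenset(dimension_names)
    return any(combo <= present for combo in dimension_set)


DIMENSION_SET = {
    frozenset({"channel", "pay_method"}),
    frozenset({"pay_method", "platform"}),
    frozenset({"pay_method", "source"}),
    frozenset({"pay_method", "error_code"}),
}
```

Under this assumption, data carrying the channel number and the payment method matches, while data carrying only the channel number and the platform does not.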

步骤405，根据维度数据生成目标key值，并根据指标数据生成目标value值。Step 405, generating a target key value according to the dimension data, and generating a target value according to the indicator data.

由于非关系数据库采用KV格式存储数据，因此将待存储数据存入非关系数据库前，分类服务器还需要根据维度数据生成目标key值，并根据指标数据生成目标value值。Since the non-relational database uses the KV format to store data, before storing the data to be stored in the non-relational database, the classification server also needs to generate a target key value based on the dimension data and generate a target value based on the indicator data.

In a possible implementation, to enable the non-relational database to support multi-dimension combination queries, the classification server combines the extracted dimension data, generates multiple target keys from the combined dimension data, and then stores the multiple target keys and their corresponding target values in the non-relational database.

In an illustrative example, when five pieces of dimension data A, B, C, D, and E are extracted from the message to be stored, the classification server combines the dimension data and can obtain combinations such as AB, AC, AD, AE, BC, BD, BE, CD, CE, DE, ABC, ABD, ABE, BCD, BCE, BDE, CDE, ABCD, ABCE, BCDE, and ABCDE.

However, combining the dimension data directly in this way causes severe data expansion and occupies a large amount of storage space in the non-relational database.
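To give a sense of the scale of this expansion, the sketch below counts every subset of two or more dimensions, which is an upper bound that a given deployment may not fully generate. For n dimensions that count is 2^n − n − 1. The function name is hypothetical.

```python
from itertools import combinations

def dimension_combinations(dims):
    """Return every combination of two or more dimensions, shortest combinations first."""
    return ["".join(c) for r in range(2, len(dims) + 1) for c in combinations(dims, r)]

five = dimension_combinations("ABCDE")
ten = dimension_combinations("ABCDEFGHIJ")
print(len(five), len(ten))  # 26 1013
```

Each additional dimension roughly doubles the number of keys written per message, which is why the next paragraphs restrict key generation to the dimension set.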

为了避免出现严重数据膨胀，在一种可能的实施方式中，分类服务器仅根据属于维度集合的维度数据生成目标key值，并根据指标数据生成对应的目标value值，以此降低数据膨胀程度。本步骤可以包括如下步骤。In order to avoid serious data expansion, in a possible implementation, the classification server generates a target key value only based on the dimension data belonging to the dimension set, and generates a corresponding target value based on the indicator data, so as to reduce the degree of data expansion. This step may include the following steps.

一、将属于维度集合的维度数据确定为目标维度数据，并根据目标维度数据生成目标key值。1. Determine the dimension data belonging to the dimension set as the target dimension data, and generate the target key value according to the target dimension data.

由于仅属于维度集合中的维度数据需要支持高并发、低延迟查询，因此分类服务器将待存储数据中，属于维度集合的维度数据确定为目标维度数据，进而对目标维度数据进行组合，生成目标key值。Since only the dimension data belonging to the dimension set needs to support high-concurrency, low-latency queries, the classification server determines the dimension data belonging to the dimension set in the data to be stored as the target dimension data, and then combines the target dimension data to generate the target key value.

在一个示意性的例子中，当从待存储消息中提取到A、B、C、D、E共5个维度数据，且维度集合中包括维度A和B时，分类服务器将维度数据A和维度数据B确定为目标维度数据，并对维度数据A和B组合，生成目标key值。In an illustrative example, when five dimensional data, A, B, C, D, and E, are extracted from the message to be stored, and the dimensional set includes dimensions A and B, the classification server determines dimension data A and dimension data B as target dimensional data, and combines dimension data A and B to generate a target key value.
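A minimal sketch of this key generation is shown below. The key layout (`name=value` pairs joined by `|`, sorted by dimension name) and the function name `build_target_key` are assumptions for illustration; the source does not specify how the combined dimensions are serialized.

```python
def build_target_key(dimension_data, dimension_set, sep="|"):
    """Build a KV key from only those dimensions that belong to the dimension set."""
    target = {name: value for name, value in dimension_data.items() if name in dimension_set}
    # Sort by dimension name so the same dimension data always yields the same key.
    return sep.join(f"{name}={target[name]}" for name in sorted(target))

record = {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "E": "e1"}
key = build_target_key(record, {"A", "B"})
print(key)  # A=a1|B=b1
```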

Compared with combining the dimension data directly, first determining the target dimension data based on the dimension set and then combining the target dimension data to generate the target key still provides multi-dimension combination queries while reducing the expansion of the stored data, saving storage space in the non-relational database, and helping to improve the efficiency of data ingestion.

二、根据目标粒度对指标数据进行聚合处理，生成目标value值。2. Aggregate the indicator data according to the target granularity to generate the target value.

在一种可能的实施方式中，当提取到的指标数据的粒度与目标粒度不一致时，分类服务器还需要根据目标粒度对指标数据进行聚合处理，从而生成符合目标粒度的目标value值。其中，该目标粒度可以为时间粒度，比如，分钟、小时、天等等，且该目标粒度可以通过人工设置。In a possible implementation, when the granularity of the extracted indicator data is inconsistent with the target granularity, the classification server also needs to aggregate the indicator data according to the target granularity to generate a target value that meets the target granularity. The target granularity may be a time granularity, such as minutes, hours, days, etc., and the target granularity may be manually set.

比如，当提取到的指标数据的时间粒度为秒，而目标粒度为分钟时，分类服务器即对连续60s的指标数据进行聚合，从而生成以分钟为时间粒度的目标value值。For example, when the time granularity of the extracted indicator data is seconds and the target granularity is minutes, the classification server aggregates the indicator data for 60 consecutive seconds to generate a target value with a time granularity of minutes.

当然，当指标数据的粒度与目标粒度一致时，分类服务器也可以直接根据指标数据生成value值，本实施例对此不做限定。Of course, when the granularity of the indicator data is consistent with the target granularity, the classification server may also directly generate a value according to the indicator data, which is not limited in this embodiment.
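The granularity conversion can be sketched as below. The source only says the indicator data is "aggregated"; summation is assumed here, and the function name and the (epoch second, value) input shape are hypothetical.

```python
from collections import defaultdict

def aggregate_to_minutes(points):
    """Sum (epoch_second, value) points into per-minute totals keyed by the minute's first second."""
    totals = defaultdict(int)
    for ts, value in points:
        totals[ts - ts % 60] += value
    return dict(totals)

# Two full minutes of one-per-second data points.
points = [(t, 1) for t in range(120, 240)]
per_minute = aggregate_to_minutes(points)
print(per_minute)  # {120: 60, 180: 60}
```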

步骤406，将目标key值和目标value值存储至非关系数据库。Step 406, storing the target key value and the target value in a non-relational database.

在一种可能的实施方式中，上述步骤405和406可以由分类服务器中的Flink实时处理系统执行，并由Flink实时处理系统将生成的目标key值和目标value值存储至非关系数据库中。In a possible implementation, the above steps 405 and 406 may be executed by a Flink real-time processing system in the classification server, and the Flink real-time processing system stores the generated target key value and target value in a non-relational database.

Optionally, the target key may also be hashed to obtain a corresponding hash value, and the hash value and the target value are then stored in the non-relational database to improve the efficiency of subsequent queries.
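As a sketch of this optional hashing step, the example below derives a fixed-length key from the target key. The source does not name a hash function; SHA-1 from the Python standard library is used here purely as an assumption, and `hash_key` is a hypothetical helper.

```python
import hashlib

def hash_key(target_key: str) -> str:
    """Return a fixed-length hexadecimal digest to use as the stored key."""
    return hashlib.sha1(target_key.encode("utf-8")).hexdigest()

stored_key = hash_key("A=a1|B=b1")
```

The same function has to be applied to the query key at lookup time so that both sides produce an identical stored key.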

步骤407，将待存储数据存储至时序数据库。Step 407: store the data to be stored in the time series database.

由于生成目标key值过程中，可能仅基于待存储数据中的部分维度数据生成目标key值(即部分维度可能未存储到非关系数据库中)，因此为了保证历史数据的可回溯性，分类服务器还需要将待存储数据完整存储至时序数据库中，由时序数据库实现任意维度组合的数据查询。Since the target key value may be generated based on only some dimensional data in the data to be stored (that is, some dimensions may not be stored in the non-relational database), in order to ensure the traceability of historical data, the classification server also needs to store the data to be stored completely in the time series database, and the time series database can realize data query of any dimensional combination.

步骤408，若维度数据不属于维度集合，则确定待存储数据不符合目标查询规则。Step 408: If the dimension data does not belong to the dimension set, it is determined that the data to be stored does not meet the target query rule.

在一个示意性的例子中，维度集合为{渠道号+支付方式，支付方式+平台，支付方式+来源，支付方式+错误码}，由于待存储消息中提取到的维度数据中包括“业务+id”这一维度名称，因此，分类服务器确定待存储数据不符合目标查询规则。In an illustrative example, the dimension set is {channel number + payment method, payment method + platform, payment method + source, payment method + error code}. Since the dimension data extracted from the message to be stored includes the dimension name "business + id", the classification server determines that the data to be stored does not meet the target query rules.

步骤409，将待存储数据存储至时序数据库。Step 409: store the data to be stored in the time series database.

When the data to be stored does not meet the target query rule, the data does not need to support high-concurrency, low-latency queries. The classification server can therefore store the data only in the time series database and not in the non-relational database, avoiding the data expansion that storing it in the non-relational database would cause.

通过上述步骤，分类服务器完成对数据的分类存储。当接收到查询请求时，分类服务器进一步根据查询请求的并发性以及延迟需求，将查询请求发送至相应的数据库。Through the above steps, the classification server completes the classification storage of data. When receiving a query request, the classification server further sends the query request to the corresponding database according to the concurrency and delay requirements of the query request.

步骤410，接收查询请求，查询请求中包括查询维度数据。Step 410: receiving a query request, wherein the query request includes query dimension data.

在一种可能的实施方式中，分类服务器提供统一的查询入口，通过该查询入口，用户可以发起KV和多维度组合两种类型的查询请求。相应的，分类服务器接收到查询请求后，解析查询请求中包含的查询维度数据，从而根据该查询维度数据确定该查询请求的类型。In a possible implementation, the classification server provides a unified query entry through which users can initiate two types of query requests: KV and multi-dimensional combination. Accordingly, after receiving the query request, the classification server parses the query dimension data contained in the query request, thereby determining the type of the query request based on the query dimension data.

可选的，分类服务器检测查询维度数据是否符合目标查询规则，若符合，则确定该查询请求为KV类型的查询请求，并执行步骤411；若不符合，则确定该查询请求为多维度组合类型的查询请求，并执行步骤412。其中，检测查询维度数据是否符合目标查询规则的过程可以参考上述步骤中检测提取到的维度数据是否符合目标查询规则的过程，本实施例在此不再赘述。Optionally, the classification server detects whether the query dimension data conforms to the target query rule. If so, it determines that the query request is a KV type query request and executes step 411; if not, it determines that the query request is a multi-dimensional combination type query request and executes step 412. The process of detecting whether the query dimension data conforms to the target query rule can refer to the process of detecting whether the extracted dimension data conforms to the target query rule in the above steps, and this embodiment will not be repeated here.

步骤411，若查询维度数据符合目标查询规则，则根据查询维度数据向非关系数据库发送第一查询请求。Step 411: If the query dimension data meets the target query rule, a first query request is sent to the non-relational database according to the query dimension data.

当查询维度数据符合目标查询规则时，表明该查询请求为高并发、低延迟需求的查询请求，分类服务器即根据查询维度数据向非关系数据库发送第一查询请求。When the query dimension data meets the target query rule, it indicates that the query request is a query request with high concurrency and low latency requirements, and the classification server sends a first query request to the non-relational database according to the query dimension data.

在一种可能的实施方式中，分类服务器根据查询维度数据生成查询key值，从而向非关系数据库发送包含查询key值的第一查询请求，由非关系数据库根据该查询key值获取对应的value值，并将该value值反馈给分类服务器。In a possible implementation, the classification server generates a query key value based on the query dimension data, thereby sending a first query request containing the query key value to the non-relational database, and the non-relational database obtains the corresponding value based on the query key value and feeds the value back to the classification server.

由于非关系数据库以KV格式存储数据，因此非关系数据库能够实现高并发、低延迟的数据查询，从而缩短数据查询时间，提高查询结果的反馈速度。Since non-relational databases store data in KV format, they can achieve high-concurrency, low-latency data queries, thereby shortening data query time and improving the feedback speed of query results.

步骤412，若查询维度数据不符合目标查询规则，则根据查询维度数据向时序数据库发送第二查询请求。Step 412: If the query dimension data does not meet the target query rule, a second query request is sent to the time series database according to the query dimension data.

When the query dimension data does not meet the target query rule, the query request has lower requirements for concurrency and query latency, and the classification server sends a second query request to the time series database according to the query dimension data.

在一种可能的实施方式中，分类服务器根据查询维度数据构建查询时序数据库的查询参数，从而向时序数据库发送包含该查询参数的第二查询请求，由时序数据库根据该查询参数进行多维度组合查询，并将查询结果反馈给分类服务器。In a possible implementation, the classification server constructs a query parameter for querying the time series database based on the query dimension data, thereby sending a second query request containing the query parameter to the time series database, and the time series database performs a multi-dimensional combined query based on the query parameter and feeds back the query result to the classification server.

本实施例中，分类服务器基于预设过滤维度数据对提取到的待存储数据进行过滤，实现对待存储数据中异常维度数据的格式化处理，避免异常维度数据对后续数据入库造成影响。In this embodiment, the classification server filters the extracted data to be stored based on preset filtering dimension data, and formats abnormal dimensional data in the data to be stored to avoid the abnormal dimensional data from affecting subsequent data storage.

此外，本实施例中，分类服务器基于预设维度集合检测待存储数据是否目标查询规则，实现了将包含指定维度的待存储数据存储至非关系数据库中，有助于提高后续对指定维度数据的查询效率；同时，将符合目标查询规则的待存储数据存储至非关系数据库时，分类服务器将属于维度集合的维度数据确定为目标维度数据，并根据目标维度数据生成目标key值，在实现多维度组合查询的前提下，降低数据存储时的膨胀率，节约非关系数据库的存储空间。In addition, in this embodiment, the classification server detects whether the data to be stored meets the target query rule based on the preset dimension set, thereby storing the data to be stored containing the specified dimension in the non-relational database, which helps to improve the efficiency of subsequent queries on the specified dimension data; at the same time, when storing the data to be stored that meets the target query rule in the non-relational database, the classification server determines the dimensional data belonging to the dimension set as the target dimensional data, and generates a target key value based on the target dimensional data. On the premise of realizing multi-dimensional combined query, the expansion rate of data storage is reduced, saving storage space of the non-relational database.

另外，本实施例中，分类服务器基于目标查询规则以及查询请求中的查询维度参数，将查询请求路由至相应的数据库，既能够满足高并发、低延迟的数据查询，又能够满足多维度组合的数据查询需求。In addition, in this embodiment, the classification server routes the query request to the corresponding database based on the target query rules and the query dimension parameters in the query request, which can meet both high concurrency and low latency data query requirements and multi-dimensional combination data query requirements.

结合上述实施例提供的方法，在一个示意性的例子中，当应用于存储日志消息这一场景时，分类服务器向数据库中存储日志消息的过程如图6所示。In combination with the method provided in the above embodiment, in an illustrative example, when applied to the scenario of storing log messages, the process of the classification server storing log messages in the database is shown in FIG6 .

1、分类服务器从消息队列中获取实时写入的日志消息；1. The classification server obtains the real-time written log messages from the message queue;

2、分类服务器对日志消息进行解析，对异常维度进行格式化过滤处理；2. The classification server parses the log messages and formats and filters the abnormal dimensions;

3、分类服务器检测日志消息是否存在高并发查询的维度组合统计项，若存在，执行4和5，若不存在，执行6；3. The classification server detects whether the log message contains dimension combination statistics items with high concurrent queries. If so, execute 4 and 5. If not, execute 6.

4、若存在，分类服务器对解析后的日志消息，根据维度组合规则计算key值、指标值，并写入NoSQL数据库；4. If it exists, the classification server calculates the key value and indicator value of the parsed log message according to the dimension combination rules and writes them into the NoSQL database;

5、若存在，分类服务器则根据预定义的维度格式，将解析后的日志消息写入Druid数据库；5. If it exists, the classification server writes the parsed log message into the Druid database according to the predefined dimension format;

6、若不存在，分类服务器根据预定义的维度格式，将解析后的日志消息写入Druid数据库。6. If it does not exist, the classification server writes the parsed log message to the Druid database according to the predefined dimension format.
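The storage flow in items 1 to 6 can be sketched with in-memory stand-ins: a `dict` in place of the NoSQL database and a `list` in place of Druid. All names here are hypothetical, the dimension set is simplified to a set of single dimension names, and summing indicator values under the same key is an assumption.

```python
def store_message(record, dimension_set, kv_store, tsdb):
    """Store one parsed record; return True if it was also written to the KV store."""
    dims, metrics = record["dimensions"], record["metrics"]
    hot = {name: value for name, value in dims.items() if name in dimension_set}
    if hot:
        # Item 4: compute the key from the configured dimensions and accumulate the indicators.
        key = "|".join(f"{name}={hot[name]}" for name in sorted(hot))
        bucket = kv_store.setdefault(key, {})
        for name, value in metrics.items():
            bucket[name] = bucket.get(name, 0) + value
    # Items 5 and 6: the complete record is always written to the time series store.
    tsdb.append(record)
    return bool(hot)

kv_store, tsdb = {}, []
store_message({"dimensions": {"channel": "c1", "region": "r1"}, "metrics": {"count": 2}},
              {"channel"}, kv_store, tsdb)
store_message({"dimensions": {"region": "r1"}, "metrics": {"count": 1}},
              {"channel"}, kv_store, tsdb)
```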

对应的，分类服务器从数据库中查询日志消息的过程如图7所示。Correspondingly, the process of the classification server querying the log message from the database is shown in FIG7 .

1、分类服务器通过统一查询入口接收客户端的查询请求；1. The classification server receives the client's query request through the unified query portal;

2、分类服务器对json格式的查询请求进行解析；2. The classification server parses the query request in JSON format;

3、分类服务器根据数据查询规则检测是否需要查询NoSQL数据库，若需要，执行4和5，若不需要，执行6和7；3. The classification server detects whether it needs to query the NoSQL database according to the data query rules. If so, execute 4 and 5. If not, execute 6 and 7.

4、若需要，分类服务器则将查询请求转换为需要查询的key值，并查询NoSQL数据库；4. If necessary, the classification server converts the query request into the key value to be queried and queries the NoSQL database;

5、分类服务器通过统一查询入口，将NoSQL数据库返回的查询结果返回给客户端；5. The classification server returns the query results returned by the NoSQL database to the client through a unified query portal;

6、若不需要，分类服务器则将查询请求转化为Druid多维组合查询所需要的请求格式，并查询Druid数据库；6. If not required, the classification server converts the query request into the request format required by Druid multi-dimensional combination query and queries the Druid database;

7、分类服务器通过统一查询入口，将Druid数据库返回的查询结果返回给客户端。7. The classification server returns the query results returned by the Druid database to the client through a unified query entry.
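The query flow in items 1 to 7 can likewise be sketched with in-memory stand-ins. The function `route_query`, the key layout, and the rule "every queried dimension is in the dimension set" are illustrative assumptions rather than the format used by any particular NoSQL database or by Druid.

```python
def route_query(query_dims, dimension_set, kv_store, tsdb):
    """Return (backend name, result) for a parsed query given as {dimension name: value}."""
    if query_dims and set(query_dims) <= dimension_set:
        # Items 4 and 5: convert the query into a key and read it from the KV store.
        key = "|".join(f"{name}={query_dims[name]}" for name in sorted(query_dims))
        return "kv", kv_store.get(key)
    # Items 6 and 7: fall back to a filter over the complete records in the time series store.
    rows = [r for r in tsdb
            if all(r["dimensions"].get(name) == value for name, value in query_dims.items())]
    return "tsdb", rows

kv_store = {"channel=c1": {"count": 2}}
tsdb = [{"dimensions": {"region": "r1"}, "metrics": {"count": 1}}]
fast = route_query({"channel": "c1"}, {"channel"}, kv_store, tsdb)
slow = route_query({"region": "r1"}, {"channel"}, kv_store, tsdb)
```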

上述实施例中，以维度集合由人工配置为例进行说明，然而，随着维度数量的不断增多，人工配置的难度较高且效率较低。为了提高维度集合配置效率，分类服务器可以通过自学习机制自动生成维度集合。In the above embodiment, the dimension set is manually configured as an example. However, with the increasing number of dimensions, manual configuration is more difficult and less efficient. In order to improve the efficiency of dimension set configuration, the classification server can automatically generate dimension sets through a self-learning mechanism.

在一种可能的实施方式中，如图8所示，分类服务器生成维度集合可以包括如下步骤。In a possible implementation, as shown in FIG8 , the classification server may generate a dimension set by including the following steps.

步骤801，对于各个预设时间段，统计预设时段内待存储消息中各个维度数据的出现频率。Step 801: for each preset time period, count the occurrence frequency of each dimension data in the message to be stored within the preset time period.

Because dimensions that appear frequently in messages are usually important dimensions, and such dimensions are more likely to be queried later with high concurrency and low latency, in a possible implementation, each time the classification server obtains a message to be stored, it updates the occurrence frequency of each piece of dimension data according to the dimension data (a single piece of dimension data or a combination of dimension data) contained in that message. Of course, to reduce the processing load on the classification server, the messages to be stored may instead be sampled, and the occurrence frequencies updated according to the dimension data contained in the sampled messages; this embodiment does not limit this.

可选的，分类服务器根据各条待存储消息的消息时间，统计各个预设时间段内各个维度数据的出现频率。其中，该预设时间段可以是划分出的预定时长的时间bucket，分别对应各自的bucketID，比如，预设时间段是2小时的时间bucket。Optionally, the classification server counts the frequency of occurrence of each dimension data in each preset time period according to the message time of each message to be stored. The preset time period may be a time bucket divided into a predetermined time length, each corresponding to a respective bucket ID, for example, the preset time period is a 2-hour time bucket.

步骤802，根据出现频率确定第一候选维度数据，第一候选维度数据的出现频率高于其它维度数据的出现频率。Step 802: determine first candidate dimension data according to the occurrence frequency, where the occurrence frequency of the first candidate dimension data is higher than the occurrence frequency of other dimension data.

进一步的，分类服务器根据各个维度数据对应的出现频率，从中确定出出现频率高于其它维度数据的第一候选维度数据。Furthermore, the classification server determines, based on the occurrence frequencies corresponding to the respective dimensional data, first candidate dimensional data having a higher occurrence frequency than other dimensional data.

在一种可能的实施方式中，分类服务器将出现频率Top N％(前N％)的维度数据确定为第一候选维度数据。比如，分类服务器将出现频率前5％的维度数据确定为第一候选维度数据。In a possible implementation, the classification server determines the dimension data with the top N% occurrence frequency as the first candidate dimension data. For example, the classification server determines the dimension data with the top 5% occurrence frequency as the first candidate dimension data.
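Steps 801 and 802 can be sketched as follows. The 2-hour bucket follows the example in the text; the input shape, the function names, and the rounding rule (round up, keep at least one entry) are assumptions.

```python
import math
from collections import Counter, defaultdict

BUCKET_SECONDS = 2 * 60 * 60  # 2-hour time buckets, as in the example

def bucket_id(epoch_seconds):
    """Map a message time to the ID of the time bucket that contains it."""
    return epoch_seconds // BUCKET_SECONDS

def count_dimensions(messages):
    """messages: iterable of (epoch_seconds, [dimension names]). Returns {bucket_id: Counter}."""
    counts = defaultdict(Counter)
    for ts, dims in messages:
        counts[bucket_id(ts)].update(dims)
    return counts

def top_percent(counter, percent):
    """Return the most frequent `percent` of dimensions in one bucket, keeping at least one."""
    keep = max(1, math.ceil(len(counter) * percent / 100))
    return [name for name, _ in counter.most_common(keep)]

counts = count_dimensions([(0, ["a", "b"]), (10, ["a"]), (7200, ["c"])])
first_candidates = top_percent(counts[0], 50)
print(first_candidates)  # ['a']
```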

步骤803，根据第一候选维度数据生成维度集合。Step 803: Generate a dimension set based on the first candidate dimension data.

进一步的，分类服务器根据第一候选维度数据生成维度集合。Further, the classification server generates a dimension set based on the first candidate dimension data.

In a possible implementation, for the determined first candidate dimension data, the classification server further determines whether the indicator data corresponding to the first candidate dimension data meets an indicator condition, and if so, generates the dimension set according to the first candidate dimension data. The indicator condition may be a normal distribution condition.

在一种可能的实施方式中，分类服务器将第一候选维度数据对应的元组添加到维度集合中，该元组可以包括维度名称、维度值、指标名称和bucketID。In a possible implementation, the classification server adds a tuple corresponding to the first candidate dimension data to the dimension set, and the tuple may include a dimension name, a dimension value, an indicator name, and a bucket ID.

步骤804，对于各个预设时间段，当在预设时间段内接收到查询请求时，获取查询请求中包含的查询维度数据。Step 804: for each preset time period, when a query request is received within the preset time period, the query dimension data included in the query request is obtained.

除了将出现频率较高的维度数据确定为候选维度数据外，分类服务器还可以基于查询请求中的查询维度数据确定候选维度数据，即根据历史数据查询情况确定候选维度数据。In addition to determining dimension data with a high frequency of occurrence as candidate dimension data, the classification server can also determine candidate dimension data based on query dimension data in the query request, that is, determine candidate dimension data based on historical data query conditions.

在一种可能的实施方式中，分类服务器接收到查询请求时，获取查询请求中包含的查询维度数据，并根据查询请求对应的查询时间，确定该查询请求所属的预设时间段。其中，该预设时间段可以是划分出的预定时长的时间bucket，分别对应各自的bucketID，比如，预设时间段是2小时的时间bucket。In a possible implementation, when the classification server receives a query request, it obtains the query dimension data contained in the query request, and determines the preset time period to which the query request belongs according to the query time corresponding to the query request. The preset time period may be a time bucket divided into predetermined time lengths, each corresponding to a respective bucket ID, for example, the preset time period is a 2-hour time bucket.

步骤805，将查询维度数据确定为第二候选维度数据，并根据第二候选维度数据生成维度集合。Step 805: determine the query dimension data as second candidate dimension data, and generate a dimension set based on the second candidate dimension data.

Further, the classification server determines the query dimension data as second candidate dimension data and generates the dimension set according to the second candidate dimension data.

在一种可能的实施方式中，分类服务器将第二候选维度数据对应的元组添加到维度集合中，该元组可以包括(查询)维度名称、(查询)维度值、指标名称和bucketID。In a possible implementation, the classification server adds a tuple corresponding to the second candidate dimension data to the dimension set, where the tuple may include a (query) dimension name, a (query) dimension value, an indicator name, and a bucket ID.

It should be noted that steps 801 to 803 and steps 804 to 805 may be performed as alternatives, or both may be performed, in which case the classification server can generate the dimension set from the intersection of the first candidate dimension data and the second candidate dimension data; this embodiment does not limit this.

Step 806: when a query request is received and the query dimension data contained in the query request belongs to the dimension set, increment by one the query counter corresponding to that query dimension data in the dimension set.

为了统计维度集合中各个候选维度数据被查询次数，在一种可能的实施方式中，维度集合中各个候选维度数据还对应有各自的查询计数器(可以设置在候选维度数据对应的元组中)，其中，维度集合中第一候选维度数据的查询计数器的初始值为0，而第二候选维度数据的查询计数器的初始值为1。In order to count the number of times each candidate dimension data in the dimension set is queried, in a possible implementation, each candidate dimension data in the dimension set also corresponds to its own query counter (which can be set in the tuple corresponding to the candidate dimension data), wherein the initial value of the query counter of the first candidate dimension data in the dimension set is 0, and the initial value of the query counter of the second candidate dimension data is 1.

Further, when a query request is received, the classification server detects whether the query dimension data contained in the query request belongs to the dimension set, and if it does, increments by one the query counter corresponding to that query dimension data in the dimension set.

步骤807，获取当前时刻之前预定时长内，维度集合中各个候选维度数据对应的查询计数器的计数值。Step 807, obtaining the count value of the query counter corresponding to each candidate dimension data in the dimension set within a predetermined time period before the current moment.

For candidate dimension data in the dimension set, if no query request for that candidate dimension data has been received for a long time, the candidate dimension data is of low importance. Therefore, to keep the dimension set from growing too large, the classification server can obtain the count value of the query counter corresponding to each piece of candidate dimension data in the dimension set and determine, based on the count value, whether the candidate dimension data needs to be deleted.

在一种可能的实施方式中，为了保证时效性，分类服务器获取当前时刻之前预定时长内候选维度数据对应查询计数器的计数值。比如，分类服务器可以根据候选维度数据对应的bucketID，确定是否处于当前时刻之前预定时长内，比如，该预定时长可以为1天或1周。In a possible implementation, in order to ensure timeliness, the classification server obtains the count value of the query counter corresponding to the candidate dimension data within a predetermined time period before the current moment. For example, the classification server can determine whether it is within a predetermined time period before the current moment based on the bucket ID corresponding to the candidate dimension data, for example, the predetermined time period can be 1 day or 1 week.

步骤808，若候选维度数据对应的查询计数器的计数值小于计数阈值，则从维度集合中删除该候选维度数据。Step 808: If the count value of the query counter corresponding to the candidate dimension data is less than the count threshold, the candidate dimension data is deleted from the dimension set.

Further, the classification server detects whether the obtained count value is less than the count threshold. If it is less, the corresponding candidate dimension data is deleted; if it is greater, the corresponding candidate dimension data is retained. For example, the count threshold may be 1, 2, 5, and so on.

In a possible implementation, if the count value of the query counter corresponding to the candidate dimension data is less than the count threshold, the classification server may mark the candidate dimension data as to be deleted and set a deletion delay (for example, 1 hour). If the count value is still less than the count threshold when the deletion delay expires, the candidate dimension data is deleted; if the count value is greater than the count threshold when the deletion delay expires, the to-be-deleted mark of the candidate dimension data is cleared.
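Steps 806 to 808, including the delayed deletion, can be sketched as below. The class and method names are hypothetical. For brevity the sketch omits the restriction to a predetermined period before the current moment, and it clears the mark whenever the count is no longer below the threshold, which is a simplification of the behavior described above.

```python
class DimensionSet:
    """Candidate dimensions with per-candidate query counters and delayed deletion."""

    def __init__(self, count_threshold=1, delete_delay=3600):
        self.counters = {}   # candidate -> number of queries seen
        self.pending = {}    # candidate -> time at which it was marked for deletion
        self.count_threshold = count_threshold
        self.delete_delay = delete_delay

    def add_candidate(self, name, from_query=False):
        # Query-derived (second) candidates start at 1; frequency-derived (first) candidates at 0.
        self.counters.setdefault(name, 1 if from_query else 0)

    def record_query(self, name):
        if name in self.counters:
            self.counters[name] += 1

    def prune(self, now):
        for name in list(self.counters):
            if self.counters[name] >= self.count_threshold:
                self.pending.pop(name, None)          # clear any to-be-deleted mark
            elif name not in self.pending:
                self.pending[name] = now              # mark and start the deletion delay
            elif now - self.pending[name] >= self.delete_delay:
                del self.counters[name]               # still under threshold after the delay
                del self.pending[name]

ds = DimensionSet(count_threshold=1, delete_delay=3600)
ds.add_candidate("a")
ds.add_candidate("b", from_query=True)
ds.prune(now=0)      # "a" is marked, "b" is kept
ds.prune(now=3600)   # "a" is deleted after the delay
```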

In this embodiment, the classification server determines candidate dimension data based on the occurrence frequency of each piece of dimension data and/or based on the query dimension data in query requests, and then generates the dimension set from the candidate dimension data. This removes the need for users to configure the dimension set manually, simplifying configuration of the dimension set and improving its efficiency and speed.

另外，分类服务器根据接收到的查询请求，对候选维度数据的查询次数进行更新，并定时清除查询次数较少的候选维度数据，从而避免维度集合过于庞大。In addition, the classification server updates the query times of the candidate dimension data according to the received query requests, and periodically clears the candidate dimension data with fewer query times, thereby avoiding the dimension set from being too large.

将待存储数据存储至非关系数据库后，为了进一步压缩数据的存储空间，可以通过数据压缩机制对非关系数据库中存储的数据进行压缩。在一种可能的实施方式中，数据压缩处理可以由非关系数据库执行，也可以由分类服务器执行，下面以数据压缩处理由分类服务器执行为例进行说明。After the data to be stored is stored in the non-relational database, in order to further compress the storage space of the data, the data stored in the non-relational database can be compressed by a data compression mechanism. In a possible implementation, the data compression process can be performed by the non-relational database or by the classification server. The following takes the data compression process performed by the classification server as an example for explanation.

在一种可能的实施方式中，分类服务器获取非关系数据库中具有相同key值的数据，并检测各个key值对应的时间序列是否连续。若key值的时间序列不连续，分类服务器即采用图9所示的格式进行数据压缩。In a possible implementation, the classification server obtains data with the same key value in the non-relational database and detects whether the time series corresponding to each key value is continuous. If the time series of the key value is not continuous, the classification server uses the format shown in FIG9 to perform data compression.

In the format shown in FIG. 9, the value corresponding to a key includes an FLG field, a LEN field, INDEX fields, and datapoint fields. The FLG field is 1 bit and indicates whether the time series is continuous (0 means discontinuous, 1 means continuous); the LEN field is 3 bits and indicates the number of bytes used to store the datapoint of each time series; the INDEX field is 1 byte and indicates the index of the time series; the datapoint field stores the datapoint data, and each INDEX field is paired with a datapoint field.

采用上述格式进行数据压缩，可以将同一时间bucket内对应同一key值的value值压缩为一个value值，从而节约了存储空间。By using the above format for data compression, the value values corresponding to the same key value in the same time bucket can be compressed into one value, thereby saving storage space.

If the time series of the key is continuous, the classification server compresses the data using the format shown in FIG. 10. Compared with the format shown in FIG. 9, because the time series of the key is continuous, the value no longer stores the INDEX field but stores the datapoint data directly, and the FLG field is set to 1.
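The two value layouts can be sketched as follows. The field widths (FLG 1 bit, LEN 3 bits, INDEX 1 byte) come from the description; how FLG and LEN are positioned within a byte is shown only in the figures, so placing them in the top four bits of a single header byte is an assumption, as are the function name and the rule that "continuous" means every index of the bucket is present.

```python
def pack_series(points, length, bucket_size):
    """Pack {index: int value} into bytes; `length` is the byte length of each datapoint (1..7)."""
    if not 1 <= length <= 7:
        raise ValueError("LEN is a 3-bit field")
    indexes = sorted(points)
    continuous = indexes == list(range(bucket_size))
    # Assumed header layout: FLG in bit 7, LEN in bits 6..4, low 4 bits unused.
    header = (int(continuous) << 7) | (length << 4)
    body = bytearray([header])
    for i in indexes:
        if not continuous:
            body.append(i)                          # INDEX precedes each datapoint
        body += points[i].to_bytes(length, "big")   # datapoint, LEN bytes
    return bytes(body)

sparse = pack_series({0: 5, 2: 7}, length=1, bucket_size=3)
dense = pack_series({0: 5, 1: 6, 2: 7}, length=1, bucket_size=3)
```

In this sketch the discontinuous series takes 5 bytes for two datapoints, while the continuous series takes 4 bytes for three, because the INDEX bytes are dropped.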

可选的，若key值的时间序列连续，且datapoint的数据存储字节长度大于预设字节长度时，分类服务器可以进一步对相邻时间序列对应的datapoint数据进行异或运算(exclusive OR，XOR)，从而根据异或运算结果对datapoint数据进行压缩。Optionally, if the time series of the key value is continuous and the data point data storage byte length is greater than the preset byte length, the classification server can further perform an exclusive OR operation (exclusive OR, XOR) on the data point data corresponding to the adjacent time series, thereby compressing the data point data according to the exclusive OR operation result.

在一种可能的实施方式中，第一个时间序列对应的datapoint数据不经过压缩后直接存储，而对于第n个时间序列对应的datapoint数据(n大于等于2)，分类服务器对第n个时间序列和第n-1对应的datapoint数据进行异或运算，得到异或运算结果，并根据异或运算结果生成第n个时间序列对应datapoint数据的压缩值。In one possible implementation, the datapoint data corresponding to the first time series is directly stored without being compressed, and for the datapoint data corresponding to the nth time series (n is greater than or equal to 2), the classification server performs an XOR operation on the datapoint data corresponding to the nth time series and the n-1th time series to obtain an XOR operation result, and generates a compressed value of the datapoint data corresponding to the nth time series based on the XOR operation result.

在一个示意性的例子中，分类服务器获取到13:00:00以及13:00:01对应的原始datapoint(previous value和current value)分别如表一所示。In an illustrative example, the classification server obtains the original datapoints (previous value and current value) corresponding to 13:00:00 and 13:00:01 as shown in Table 1.

Table 1

13:00:00 | previous value | 00000000 00111100
13:00:01 | current value  | 00000000 00111101
         | current xor    | 00000000 00000001

XORing the previous value with the current value gives the XOR result (current xor) 00000000 00000001. From this XOR result, the classification server generates the compressed value corresponding to the current value, as shown in Table 2.

Table 2

In the compressed value, bit 1 indicates whether the XOR result is zero (1 if non-zero, 0 if zero); bit 2 indicates whether the XOR result has the same number of leading zeros as the previous XOR result (1 if the same, 0 if different); bits 3 to 4 give the number of leading zeros in units of 4 bits (here 11, meaning 3 × 4 = 12 leading zero bits); and bits 5 to 8 store the part of the XOR result that follows the leading zeros (here 0001). The compressed data obtained in this way is illustrated in Figure 11.
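The bit layout described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the patented encoder: the function names are hypothetical, datapoints are taken to be 16 bits wide as in Table 1, a zero XOR result is assumed to be encoded as the single bit 0, the leading-zero field is assumed to be omitted when bit 2 is 1, and bit 2 is assumed to be 0 when there is no previous XOR result.

```python
def leading_zero_units(x, width=16):
    """Leading zeros of x counted in whole 4-bit units, capped at 3
    because the count is stored in a 2-bit field."""
    return min((width - x.bit_length()) // 4, 3)


def compress_point(prev, curr, prev_units, width=16):
    """Return (bit string, leading-zero units) for one datapoint."""
    xor = prev ^ curr
    if xor == 0:
        return "0", prev_units          # bit 1 = 0: value unchanged (assumed form)
    units = leading_zero_units(xor, width)
    payload = format(xor, f"0{width}b")[units * 4:]  # XOR minus leading zeros
    if units == prev_units:
        return "11" + payload, units    # bit 2 = 1: count field omitted (assumption)
    return "10" + format(units, "02b") + payload, units


def compress_series(values, width=16):
    """First datapoint is stored raw; later ones are XOR-compressed."""
    out = [format(values[0], f"0{width}b")]
    prev_units = None                   # no previous XOR result yet
    for prev, curr in zip(values, values[1:]):
        bits, prev_units = compress_point(prev, curr, prev_units, width)
        out.append(bits)
    return out


# Values from Table 1: 00000000 00111100 followed by 00000000 00111101.
print(compress_series([0b0000000000111100, 0b0000000000111101]))
# → ['0000000000111100', '10110001']
```

The second datapoint shrinks from 16 bits to 8: flag bits 1 and 0, the count 11, and the payload 0001, matching the field values given in the description.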

图12是本申请一个示例性实施例提供的数据存储装置的结构框图，该装置可以设置于上述实施例所述的分类服务器，如图12所示，该装置包括：FIG. 12 is a structural block diagram of a data storage device provided by an exemplary embodiment of the present application. The device may be provided in the classification server described in the above embodiment. As shown in FIG. 12 , the device includes:

第一获取模块1201，用于获取待存储消息；The first acquisition module 1201 is used to acquire the message to be stored;

提取模块1202，用于提取所述待存储消息中的待存储数据；An extraction module 1202 is used to extract the data to be stored in the message to be stored;

第一存储模块1203，用于当所述待存储数据符合目标查询规则时，将所述待存储数据存储至非关系数据库，并将所述待存储数据存储至时序数据库，其中，符合所述目标查询规则的数据的并发查询需求以及查询延迟需求高于不符合所述目标查询规则的数据；The first storage module 1203 is used to store the data to be stored in a non-relational database and store the data to be stored in a time series database when the data to be stored meets the target query rule, wherein the concurrent query requirement and query delay requirement of the data meeting the target query rule are higher than those of the data not meeting the target query rule;

第二存储模块1204，用于当所述待存储数据不符合所述目标查询规则时，将所述待存储数据存储至所述时序数据库。The second storage module 1204 is configured to store the data to be stored in the time series database when the data to be stored does not meet the target query rule.

可选的，，所述待存储数据包括维度数据和指标数据，所述非关系数据库采用键值key-value结构存储数据；Optionally, the data to be stored includes dimension data and indicator data, and the non-relational database uses a key-value structure to store data;

所述第一存储模块1203，包括：The first storage module 1203 includes:

生成单元，用于当所述待存储数据符合所述目标查询规则时，根据所述维度数据生成目标key值，并根据所述指标数据生成目标value值；A generating unit, configured to generate a target key value according to the dimension data and a target value according to the indicator data when the data to be stored meets the target query rule;

存储单元，用于将所述目标key值和所述目标value值存储至所述非关系数据库。A storage unit is used to store the target key value and the target value in the non-relational database.

可选的，所述目标查询规则中包含维度集合，所述维度集合用于指示符合所述目标查询规则的数据的维度特征；Optionally, the target query rule includes a dimension set, and the dimension set is used to indicate dimensional features of data that meets the target query rule;

所述生成单元，用于：The generating unit is used for:

若所述维度数据属于所述维度集合，则确定所述待存储数据符合所述目标查询规则；If the dimension data belongs to the dimension set, determining that the data to be stored meets the target query rule;

将属于所述维度集合的所述维度数据确定为目标维度数据，并根据所述目标维度数据生成所述目标key值；Determine the dimension data belonging to the dimension set as target dimension data, and generate the target key value according to the target dimension data;

根据目标粒度对所述指标数据进行聚合处理，生成所述目标value值。The indicator data is aggregated according to the target granularity to generate the target value.
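The storage path handled by the first and second storage modules can be sketched as below. Everything concrete here is an assumption made for illustration: the databases are stood in for by a dict and a list, the dimension set is taken to be a set of dimension names, the key is built by joining the sorted target dimensions with the time bucket, and aggregation is taken to be a per-bucket sum.

```python
def store(record, dimension_set, kv_store, tsdb, granularity_s=60):
    """Store one extracted record; return True if it matched the rule.

    record: {"timestamp": int, "dimensions": {...}, "metrics": {...}}
    kv_store: dict standing in for the non-relational database.
    tsdb: list standing in for the time series database.
    """
    tsdb.append(record)  # every record goes to the time series database

    # Target dimension data: the dimensions that belong to the dimension set.
    dims = record["dimensions"]
    target = {name: dims[name] for name in dims if name in dimension_set}
    if not target:
        return False  # does not conform to the target query rule

    # Target key: combined target dimensions plus the time bucket (assumed).
    bucket = record["timestamp"] - record["timestamp"] % granularity_s
    key = "|".join(f"{n}={target[n]}" for n in sorted(target)) + f"@{bucket}"

    # Target value: metrics summed at the target granularity (assumed).
    agg = kv_store.setdefault(key, {})
    for name, value in record["metrics"].items():
        agg[name] = agg.get(name, 0) + value
    return True
```

Two records with the same target dimensions that fall in the same bucket end up under one key, so a later lookup for that combination is a single key-value read.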

可选的，所述装置还包括第一生成模块，所述第一生成模块，用于：Optionally, the device further includes a first generating module, wherein the first generating module is configured to:

对于各个预设时间段，统计所述预设时段内所述待存储消息中各个维度数据的出现频率；For each preset time period, counting the occurrence frequency of each dimension data in the message to be stored within the preset time period;

根据所述出现频率确定第一候选维度数据，所述第一候选维度数据的出现频率高于其它维度数据的出现频率；Determine first candidate dimension data according to the occurrence frequency, where the occurrence frequency of the first candidate dimension data is higher than the occurrence frequency of other dimension data;

根据所述第一候选维度数据生成所述维度集合。The dimension set is generated according to the first candidate dimension data.
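A frequency-based way of building the dimension set might look like the following sketch. The function name, the message shape, and the choice of a fixed `top_n` cut-off are assumptions; the text only requires that the first candidate dimension data occur more often than the other dimension data.

```python
from collections import Counter


def build_dimension_set(messages, top_n=2):
    """Pick the most frequent dimension names seen in one preset time period.

    messages: iterable of {"dimensions": {...}} dicts (hypothetical shape).
    top_n: how many of the most frequent dimensions to keep (assumed policy).
    """
    freq = Counter()
    for msg in messages:
        freq.update(msg["dimensions"].keys())  # one count per message
    return {name for name, _ in freq.most_common(top_n)}
```

A threshold on the relative frequency would satisfy the same requirement; the fixed cut-off is used here only because it keeps the example short.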

可选的，所述装置还包括第二生成模块，所述第二生成模块，用于：Optionally, the device further includes a second generating module, wherein the second generating module is configured to:

对于各个预设时间段，当在所述预设时间段内接收到查询请求时，获取所述查询请求中包含的查询维度数据；For each preset time period, when a query request is received within the preset time period, the query dimension data included in the query request is obtained;

determine the query dimension data as second candidate dimension data, and generate the dimension set according to the second candidate dimension data.

可选的，所述装置还包括：Optionally, the device further comprises:

A first receiving module, configured to, when a query request is received and the query dimension data contained in the query request belongs to the dimension set, increment by one the query counter corresponding to that query dimension data in the dimension set;

第二获取模块，用于获取当前时刻之前预定时长内，所述维度集合中各个候选维度数据对应的所述查询计数器的计数值；A second acquisition module is used to obtain the count value of the query counter corresponding to each candidate dimension data in the dimension set within a predetermined time period before the current moment;

删除模块，用于当候选维度数据对应的查询计数器的计数值小于计数阈值时，从所述维度集合中删除所述候选维度数据。The deletion module is used to delete the candidate dimension data from the dimension set when the count value of the query counter corresponding to the candidate dimension data is less than the count threshold.
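The counting and deletion steps of these three modules can be sketched as a small class. The class name and the use of per-dimension timestamp queues to obtain the count "within a predetermined time period before the current moment" are assumptions; the text does not say how the windowed count is kept.

```python
from collections import defaultdict, deque


class DimensionSetPruner:
    """Tracks query hits per dimension and drops rarely queried ones."""

    def __init__(self, dimension_set, window_s, threshold):
        self.dimension_set = set(dimension_set)
        self.window_s = window_s      # predetermined time period, seconds
        self.threshold = threshold    # count threshold
        self.hits = defaultdict(deque)  # dimension -> query timestamps

    def on_query(self, query_dims, now):
        """Count one hit for each queried dimension that is in the set."""
        for dim in query_dims:
            if dim in self.dimension_set:
                self.hits[dim].append(now)

    def prune(self, now):
        """Delete dimensions whose count in the window is below threshold."""
        cutoff = now - self.window_s
        for dim in list(self.dimension_set):
            queue = self.hits[dim]
            while queue and queue[0] < cutoff:
                queue.popleft()       # hit is older than the window
            if len(queue) < self.threshold:
                self.dimension_set.discard(dim)
                del self.hits[dim]
        return self.dimension_set
```

Pruning keeps the non-relational database from accumulating aggregates for dimensions that are no longer queried often.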

可选的，所述装置还包括：Optionally, the device further comprises:

第二接收模块，用于接收查询请求，所述查询请求中包括查询维度数据；A second receiving module, configured to receive a query request, wherein the query request includes query dimension data;

第一发送模块，用于若所述查询维度数据符合所述目标查询规则，则根据所述查询维度数据向所述非关系数据库发送第一查询请求；A first sending module, configured to send a first query request to the non-relational database according to the query dimension data if the query dimension data meets the target query rule;

第二发送模块，用于若所述查询维度数据不符合所述目标查询规则，则根据所述查询维度数据向所述时序数据库发送第二查询请求。The second sending module is used to send a second query request to the time series database according to the query dimension data if the query dimension data does not meet the target query rule.
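Query routing between the two databases reduces to one membership test. In this sketch the function name and the return labels are hypothetical, and a query is assumed to conform to the target query rule only when every dimension it asks for is in the dimension set.

```python
def route_query(query_dims, dimension_set):
    """Return which store should serve a query.

    "kv" stands for the non-relational database (first query request),
    "tsdb" for the time series database (second query request).
    """
    if query_dims and set(query_dims) <= set(dimension_set):
        return "kv"
    return "tsdb"
```

Queries that mix pre-aggregated and other dimensions fall through to the time series database, which supports multi-dimensional combined queries.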

可选的，所述装置还包括：Optionally, the device further comprises:

格式化模块，用于若所述待存储数据中包含预设过滤维度数据，则对所述待存储数据中与所述预设过滤维度数据匹配的数据进行格式化。The formatting module is used for formatting the data in the data to be stored that matches the preset filtering dimension data if the data to be stored includes the preset filtering dimension data.

In summary, in the embodiments of the present application, after a message to be stored is acquired, the data to be stored is extracted from it and checked against the target query rule. Data that conforms to the target query rule is stored in both the non-relational database and the time series database; data that does not conform is stored in the time series database. Because a non-relational database can serve queries with high concurrency and low latency, storing the data with higher concurrent-query and query-latency requirements there speeds up subsequent queries. Storing the data with lower concurrent-query and query-latency requirements in the time series database, in turn, makes multi-dimensional combined queries convenient in later query processing.

需要说明的是：上述实施例提供的数据存储装置，仅以上述各功能模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能模块完成，即将服务器的内部结构划分成不同的功能模块，以完成以上描述的全部或者部分功能。另外，上述实施例提供的数据存储装置与数据存储方法实施例属于同一构思，其具体实现过程详见方法实施例，这里不再赘述。It should be noted that the data storage device provided in the above embodiment is only illustrated by the division of the above functional modules. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the server is divided into different functional modules to complete all or part of the functions described above. In addition, the data storage device provided in the above embodiment belongs to the same concept as the data storage method embodiment. The specific implementation process is detailed in the method embodiment and will not be repeated here.

请参考图13，其示出了本申请一个示例性实施例提供的服务器的结构示意图。具体来讲：所述服务器1300包括中央处理单元(CPU)1301、包括随机存取存储器(RAM)1302和只读存储器(ROM)1303的系统存储器1304，以及连接系统存储器1304和中央处理单元1301的系统总线1305。所述服务器1300还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统(I/O系统)1306，和用于存储操作系统1313、应用程序1314和其他程序模块1315的大容量存储设备1307。Please refer to Figure 13, which shows a schematic diagram of the structure of a server provided by an exemplary embodiment of the present application. Specifically, the server 1300 includes a central processing unit (CPU) 1301, a system memory 1304 including a random access memory (RAM) 1302 and a read-only memory (ROM) 1303, and a system bus 1305 connecting the system memory 1304 and the central processing unit 1301. The server 1300 also includes a basic input/output system (I/O system) 1306 that helps transmit information between various devices in the computer, and a large-capacity storage device 1307 for storing an operating system 1313, an application program 1314, and other program modules 1315.

所述基本输入/输出系统1306包括有用于显示信息的显示器1308和用于用户输入信息的诸如鼠标、键盘之类的输入设备1309。其中所述显示器1308和输入设备1309都通过连接到系统总线1305的输入输出控制器1310连接到中央处理单元1301。所述基本输入/输出系统1306还可以包括输入输出控制器1310以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地，输入输出控制器1310还提供输出到显示屏、打印机或其他类型的输出设备。The basic input/output system 1306 includes a display 1308 for displaying information and an input device 1309 such as a mouse and a keyboard for user inputting information. The display 1308 and the input device 1309 are connected to the central processing unit 1301 through an input/output controller 1310 connected to the system bus 1305. The basic input/output system 1306 may also include an input/output controller 1310 for receiving and processing inputs from a plurality of other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1310 also provides output to a display screen, a printer, or other types of output devices.

所述大容量存储设备1307通过连接到系统总线1305的大容量存储控制器(未示出)连接到中央处理单元1301。所述大容量存储设备1307及其相关联的计算机可读介质为服务器1300提供非易失性存储。也就是说，所述大容量存储设备1307可以包括诸如硬盘或者CD-ROM驱动器之类的计算机可读介质(未示出)。The mass storage device 1307 is connected to the central processing unit 1301 through a mass storage controller (not shown) connected to the system bus 1305. The mass storage device 1307 and its associated computer-readable media provide non-volatile storage for the server 1300. That is, the mass storage device 1307 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.

不失一般性，所述计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、EPROM、EEPROM、闪存或其他固态存储其技术，CD-ROM、DVD或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然，本领域技术人员可知所述计算机存储介质不局限于上述几种。上述的系统存储器1304和大容量存储设备1307可以统称为存储器。Without loss of generality, the computer readable medium may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cassettes, magnetic tapes, disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the above. The above-mentioned system memory 1304 and mass storage device 1307 can be collectively referred to as memory.

存储器存储有一个或多个程序，一个或多个程序被配置成由一个或多个中央处理单元1301执行，一个或多个程序包含用于实现上述方法的指令，中央处理单元1301执行该一个或多个程序实现上述各个方法实施例提供的方法。The memory stores one or more programs, and the one or more programs are configured to be executed by one or more central processing units 1301. The one or more programs contain instructions for implementing the above-mentioned methods. The central processing unit 1301 executes the one or more programs to implement the methods provided by the above-mentioned various method embodiments.

According to various embodiments of the present application, the server 1300 may also operate through a remote computer connected to a network such as the Internet. That is, the server 1300 may connect to the network 1312 through a network interface unit 1311 attached to the system bus 1305; the network interface unit 1311 may also be used to connect to other types of networks or remote computer systems (not shown).

The memory further includes one or more programs stored therein, the one or more programs containing instructions for performing the steps executed by the classification server in the method provided by the embodiments of the present application.

本申请实施例还提供一种计算机可读存储介质，该可读存储介质中存储有至少一条指令、至少一段程序、代码集或指令集，所述至少一条指令、所述至少一段程序、所述代码集或指令集由处理器加载并执行以实现上述任一实施例所述的数据存储方法。An embodiment of the present application also provides a computer-readable storage medium, which stores at least one instruction, at least one program, a code set or an instruction set. The at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to implement the data storage method described in any of the above embodiments.

The present application also provides a computer program product which, when run on a server, causes the server to execute the data storage method provided by the above method embodiments.

本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成，该程序可以存储于一计算机可读存储介质中，该计算机可读存储介质可以是上述实施例中的存储器中所包含的计算机可读存储介质；也可以是单独存在，未装配入终端中的计算机可读存储介质。该计算机可读存储介质中存储有至少一条指令、至少一段程序、代码集或指令集，所述至少一条指令、所述至少一段程序、所述代码集或指令集由所述处理器加载并执行以实现上述任一方法实施例所述的数据存储方法。A person skilled in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by instructing the relevant hardware through a program, and the program can be stored in a computer-readable storage medium, which can be a computer-readable storage medium contained in the memory in the above embodiments; or a computer-readable storage medium that exists independently and is not assembled in the terminal. The computer-readable storage medium stores at least one instruction, at least one program, code set or instruction set, and the at least one instruction, the at least one program, the code set or instruction set is loaded and executed by the processor to implement the data storage method described in any of the above method embodiments.

可选地，该计算机可读存储介质可以包括：只读存储器(ROM，Read Only Memory)、随机存取记忆体(RAM，Random Access Memory)、固态硬盘(SSD，Solid State Drives)或光盘等。其中，随机存取记忆体可以包括电阻式随机存取记忆体(ReRAM,Resistance RandomAccess Memory)和动态随机存取存储器(DRAM，Dynamic Random Access Memory)。上述本申请实施例序号仅仅为了描述，不代表实施例的优劣。Optionally, the computer readable storage medium may include: a read-only memory (ROM), a random access memory (RAM), a solid state drive (SSD), or an optical disk. Among them, the random access memory may include a resistance random access memory (ReRAM) and a dynamic random access memory (DRAM). The serial numbers of the above embodiments of the present application are only for description and do not represent the advantages and disadvantages of the embodiments.

本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成，也可以通过程序来指令相关的硬件完成，所述的程序可以存储于一种计算机可读存储介质中，上述提到的存储介质可以是只读存储器，磁盘或光盘等。A person skilled in the art will understand that all or part of the steps to implement the above embodiments may be accomplished by hardware or by instructing related hardware through a program, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a disk or an optical disk, etc.

以上所述仅为本申请的较佳实施例，并不用以限制本申请，凡在本申请的精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本申请的保护范围之内。The above description is only a preferred embodiment of the present application and is not intended to limit the present application. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims

1. A method of data storage, the method comprising:

acquiring a message to be stored;

extracting data to be stored from the message to be stored, wherein the data to be stored comprises dimension data and indicator data;

if the dimension data belongs to a dimension set, determining that the data to be stored conforms to a target query rule; determining the dimension data belonging to the dimension set as target dimension data, and combining the target dimension data to generate a target key value; aggregating the indicator data according to a target granularity to generate a target value; storing the target key value and the target value in a non-relational database, wherein the non-relational database stores data in a key-value structure; and storing the data to be stored in a time series database, wherein the target query rule indicates the dimension features of data having high-concurrency and low-latency query requirements, the concurrent query requirement and the query latency requirement of data conforming to the target query rule are higher than those of data not conforming to the target query rule, the dimension set is contained in the target query rule, and the dimension set indicates the dimension features of the data conforming to the target query rule;

if the dimension data does not belong to the dimension set, determining that the data to be stored does not conform to the target query rule, and storing the data to be stored in the time series database;

receiving a query request, wherein the query request comprises query dimension data;

if the query dimension data conforms to the target query rule, sending a first query request to the non-relational database according to the query dimension data; and

if the query dimension data does not conform to the target query rule, sending a second query request to the time series database according to the query dimension data.

2. The method of claim 1, wherein after the acquiring a message to be stored, the method further comprises:

for each preset time period, counting the occurrence frequency of each piece of dimension data in the messages to be stored within the preset time period;

determining first candidate dimension data according to the occurrence frequency, wherein the occurrence frequency of the first candidate dimension data is higher than that of other dimension data; and

generating the dimension set according to the first candidate dimension data.

3. The method of claim 1, wherein after the acquiring a message to be stored, the method further comprises:

for each preset time period, when a query request is received within the preset time period, acquiring query dimension data contained in the query request; and

determining the query dimension data as second candidate dimension data, and generating the dimension set according to the second candidate dimension data.

4. The method according to claim 2 or 3, wherein the method further comprises:

when a query request is received and query dimension data contained in the query request belongs to the dimension set, incrementing by one a query counter corresponding to the query dimension data in the dimension set;

acquiring the count value of the query counter corresponding to each piece of candidate dimension data in the dimension set within a predetermined time period before the current moment; and

if the count value of the query counter corresponding to a piece of candidate dimension data is smaller than a count threshold, deleting the candidate dimension data from the dimension set.

5. The method according to any one of claims 1 to 3, wherein after the extracting data to be stored from the message to be stored, the method further comprises:

if the data to be stored contains preset filtering dimension data, formatting the data in the data to be stored that matches the preset filtering dimension data.

6. A data storage device, the device comprising:

a first acquisition module, configured to acquire a message to be stored;

an extraction module, configured to extract data to be stored from the message to be stored, wherein the data to be stored comprises dimension data and indicator data;

a first storage module, configured to: when the dimension data belongs to a dimension set, determine that the data to be stored conforms to a target query rule; determine the dimension data belonging to the dimension set as target dimension data, and combine the target dimension data to generate a target key value; aggregate the indicator data according to a target granularity to generate a target value; store the target key value and the target value in a non-relational database, wherein the non-relational database stores data in a key-value structure; and store the data to be stored in a time series database, wherein the target query rule indicates the dimension features of data having high-concurrency and low-latency query requirements, the concurrent query requirement and the query latency requirement of data conforming to the target query rule are higher than those of data not conforming to the target query rule, the dimension set is contained in the target query rule, and the dimension set indicates the dimension features of the data conforming to the target query rule;

a second storage module, configured to: when the dimension data does not belong to the dimension set, determine that the data to be stored does not conform to the target query rule, and store the data to be stored in the time series database;

a second receiving module, configured to receive a query request, wherein the query request comprises query dimension data;

a first sending module, configured to send a first query request to the non-relational database according to the query dimension data if the query dimension data conforms to the target query rule; and

a second sending module, configured to send a second query request to the time series database according to the query dimension data if the query dimension data does not conform to the target query rule.

7. The apparatus of claim 6, further comprising a first generation module to:

8. The apparatus of claim 6, further comprising a second generation module configured to:

And the second generation module is used for determining the query dimension data as second candidate dimension data and generating the dimension set according to the second candidate dimension data.

9. A server comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the data storage method of any one of claims 1 to 5.

10. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by a processor to implement the data storage method of any one of claims 1 to 5.