
CN120316150A - Time series data query method, computing device and computer storage medium - Google Patents

Time series data query method, computing device and computer storage medium

Info

Publication number
CN120316150A
Authority
CN
China
Prior art keywords
query
time series
keyword
keywords
series data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410057345.4A
Other languages
Chinese (zh)
Inventor
刘志鹏
沈春辉
李飞勃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou AliCloud Feitian Information Technology Co Ltd
Original Assignee
Hangzhou AliCloud Feitian Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou AliCloud Feitian Information Technology Co Ltd
Priority to CN202410057345.4A
Priority to PCT/IB2025/050208 (WO2025153921A1)
Publication of CN120316150A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a method for querying time series data, a computing device, and a computer storage medium. The method includes: receiving a time series data query request; determining at least two query keywords in the request; determining a target query keyword from the at least two query keywords based on their discrimination indexes, where a discrimination index is obtained by counting the frequency of the keywords contained in the time series data stored in a time series database; querying the time series database for a first query result that matches the target query keyword; and filtering the first query result with at least one of the query keywords other than the target query keyword to obtain the query result. The technical solution provided by the embodiments reduces the number of queries issued to the time series database, and thereby reduces the I/O volume of the time series database.

Description

Time sequence data query method, computing device and computer storage medium
Technical Field
Embodiments of the invention relate to the field of computer technology, and in particular to a time series data query method, a computing device, and a computer storage medium.
Background
Time series data is a set of data recorded in chronological order, in which each data point is associated with a specific timestamp. It is typically used to describe quantities that vary over time, such as temperature, humidity, load, flow, and stock prices. Time series data is widely used in fields including finance, energy, intelligent manufacturing, the Internet of Things, and healthcare.
Time series data may be stored in a time series database, which is a data management system that provides access to time series data.
To mine the value behind time series data, it is often necessary to query a time series database. A query request usually contains several query keywords. In the related art, the time series database generally performs one data query per query keyword to obtain multiple query results, and then generates the final query result by taking the intersection of those results.
As can be seen from the above, the query method in the related art issues one database query per keyword, and therefore suffers from a large input/output (I/O) volume and low efficiency.
Disclosure of Invention
The embodiment of the invention provides a time sequence data query method, a time sequence data query device, computing equipment and a computer storage medium.
In a first aspect, an embodiment of the present invention provides a method for querying time series data, including:
receiving a time series data query request;
determining at least two query keywords in the time series data query request;
determining a target query keyword from the at least two query keywords based on discrimination indexes of the at least two query keywords, where the discrimination indexes are obtained by counting the frequency of the keywords contained in the time series data stored in a time series database;
querying the time series database for a first query result that matches the target query keyword; and
filtering the first query result by using at least one query keyword, other than the target query keyword, among the at least two query keywords to obtain a query result.
In a second aspect, an embodiment of the present invention provides a device for querying time series data, including:
a request receiving module, configured to receive a time series data query request;
a first keyword determining module, configured to determine at least two query keywords in the time series data query request;
a second keyword determining module, configured to determine a target query keyword from the at least two query keywords based on discrimination indexes of the at least two query keywords, where the discrimination indexes are obtained by counting the frequency of the keywords contained in the time series data stored in a time series database;
a matching module, configured to query the time series database for a first query result that matches the target query keyword; and
a query module, configured to filter the first query result by using at least one query keyword, other than the target query keyword, among the at least two query keywords to obtain a query result.
In a third aspect, an embodiment of the present invention provides a computing device, including a processing component and a storage component;
the storage component stores one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component to implement the time series data query method provided by the embodiments of the invention.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium storing a computer program, where the computer program, when executed by a computer, implements the time series data query method provided by the embodiments of the invention.
In the embodiments of the invention, a target query keyword is determined from the multiple query keywords, the time series database is queried with the target query keyword to obtain a first query result, and the first query result is then filtered with the other query keywords to obtain the query result. This reduces the number of queries issued to the time series database when time series data is queried, and thereby reduces the I/O volume of the time series database.
These and other aspects of the invention will be more readily apparent from the following description of the embodiments.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 shows a schematic diagram of a time series provided by one embodiment of the present invention;
FIG. 2 schematically illustrates a flowchart of a method for querying time series data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a method for querying time series data according to an embodiment of the present invention;
FIG. 4 schematically illustrates a block diagram of a time series data querying device according to an embodiment of the present invention;
FIG. 5 schematically illustrates a block diagram of a computing device provided by one embodiment of the invention.
Detailed Description
In order to enable those skilled in the art to better understand the present invention, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present invention with reference to the accompanying drawings.
Some of the flows described in the specification, the claims, and the foregoing figures include a plurality of operations that appear in a particular order. It should be understood that these operations may be performed out of the order in which they appear, or in parallel. Operation numbers such as 101 and 102 merely distinguish the operations from one another and do not themselves represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. The terms "first" and "second" herein distinguish different messages, devices, modules, and the like; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types.
It should be noted that the user information (including but not limited to user equipment information and user personal information) and the data (including but not limited to data for analysis, stored data, and presented data) involved in the present invention are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to grant or refuse authorization.
First, terms related to one or more embodiments of the present specification will be explained.
Object tag (Tag): identifies a time series within the time series data and indicates the specific object that a metric of the time series data refers to. For example, an object tag may be a sub-category of the data under a specified metric. An object tag consists of a tag key (TagKey) and a corresponding tag value (TagValue), which together determine the object tag. For example, "city (TagKey) = hangzhou (TagValue)" is an object tag, and "machine room = A, IP = 172.220.110.1" is an object tag. A tag key has a one-to-one or one-to-many relationship with tag values. Two tags are the same tag only when both the tag key and the tag value are the same; tags with the same tag key but different tag values are different tags. For example, in time series data for weather monitoring, the specified metric may be "air temperature" and the object tag may be "city = hangzhou", where "city" is the tag key and "hangzhou" is the tag value; the monitored object is then the air temperature of the city of Hangzhou.
Tag key (TagKey): determines the object tag together with the corresponding tag value. A tag key may indicate the type of object being monitored, such as country, province, city, machine room, or IP; together with a corresponding tag value it defines a specific object of that type.
Tag value (TagValue): the value corresponding to a tag key. For example, when the tag key is "country", the tag value may be "china".
Metric (Metric): an indicator of the monitored data, such as wind force or temperature.
Metric value: the value corresponding to a metric, such as level 15 (wind force) or 20 ℃.
Timestamp (Timestamp): the point in time at which a data point was generated.
Data point (Data Point): each metric value collected at time intervals (for example, at consecutive timestamps) for a certain index of an object (for example, one defined by a metric and tags) is a data point. That is, "one metric + N object tags (N >= 1) + one timestamp + one metric value" defines one data point.
Time series (Time Series): as shown in FIG. 1, for example, a time series includes a number of data points generated at different timestamps. In FIG. 1, the device number (Device) and the region (Region) are tag keys, and F07a1260 and North-cn are the tag values of the device number and the region, respectively.
For example, a time series may describe a certain index (for example, one defined by a metric and tags) of a certain monitored object. "One metric + N object tag key-value combinations (N >= 1)" defines one time series; additional data values generated on an existing time series do not increase the number of time series.
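The two definitions above can be sketched as plain data types. This is only an illustration: the class names `SeriesKey` and `DataPoint`, the helper `series_key`, the metric name "temperature", and the second reading are assumptions made for the sketch, not part of the described embodiments.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class SeriesKey:
    """Identity of a time series: one metric plus N >= 1 object tags."""
    metric: str
    tags: FrozenSet[Tuple[str, str]]  # (tag key, tag value) pairs

    def __post_init__(self) -> None:
        if len(self.tags) < 1:
            raise ValueError("a time series needs at least one object tag")


@dataclass(frozen=True)
class DataPoint:
    """One metric value recorded for one series at one timestamp."""
    series: SeriesKey
    timestamp: str
    value: float


def series_key(metric: str, tags: Dict[str, str]) -> SeriesKey:
    """Build a hashable series identity from a metric and a tag dictionary."""
    return SeriesKey(metric, frozenset(tags.items()))


key = series_key("temperature", {"Device": "F07a1260", "Region": "North-cn"})
points = [
    DataPoint(key, "2020-10-24-10:01", 12.1),
    DataPoint(key, "2020-10-24-10:02", 12.3),  # hypothetical second reading
]
# Both points share one series identity: new values do not create new series.
distinct_series = {p.series for p in points}
```

Changing any tag value, for example the region, yields a different `SeriesKey` and therefore a different time series.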
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
FIG. 2 schematically illustrates a flowchart of a method for querying time series data according to an embodiment of the present invention. As shown in FIG. 2, the method may specifically include the following steps:
201, receiving a time series data query request;
202, determining at least two query keywords in the time series data query request;
203, determining a target query keyword from the at least two query keywords based on discrimination indexes of the at least two query keywords, where the discrimination indexes are obtained by counting the frequency of the keywords contained in the time series data stored in the time series database;
204, querying the time series database for a first query result that matches the target query keyword;
205, filtering the first query result by using at least one query keyword, other than the target query keyword, among the at least two query keywords to obtain a query result.
According to an embodiment of the present invention, the time series data query request is a request, sent by a user, to query time series data, for example to query the data within a certain time period or to sort data in chronological order.
According to an embodiment of the invention, when a user generates a data query request, keywords of the time series data that the user wants to retrieve may be written into the request. These keywords serve as the query keywords, which the time series database uses to match the corresponding time series data.
According to an embodiment of the present invention, time series data is a data set arranged in chronological order. It generally involves a notion of time, such as timestamps or time intervals, and describes how events, behaviors, or phenomena change at different points in time.
According to an embodiment of the invention, the time series data query request may be used to request one or more pieces of data that satisfy a query condition from the data set of the time series data.
In an embodiment of the present invention, the query condition may be, for example, that the one or more pieces of data contain the query keywords carried in the query request.
According to an embodiment of the present invention, one piece of data may refer to the data recorded at a certain time; in the time series data shown in FIG. 1, each row may represent one piece of data. For example, the second row may represent the data recorded at 2020-10-24-10:01. Specifically, the time series data shown in FIG. 1 may be generated by monitoring the temperature of devices, and the second row may indicate that the device with device number F07a1260, located in the north region, had a temperature of 12.1 degrees at 2020-10-24-10:01.
According to embodiments of the present invention, query keywords may be used to match tag values in the time series data.
According to an embodiment of the present invention, in FIG. 1 both the device number and the region may be tag keys, F07a1260 may be the tag value corresponding to the device number tag key, and the north region may be the tag value corresponding to the region tag key.
When the time series database performs a query with the query keywords, the target query keyword may be matched against the tag values in the time series data, so that one or more pieces of data containing a tag value identical to the query keyword are retrieved from the time series data stored in the time series database. For example, when the target query keyword is "north region", it may be matched against the tag values contained in each piece of data in the time series data shown in FIG. 1. Since the second and fourth rows of data contain the "north region" tag value, the second and fourth rows of data can be taken as the first query result.
According to an embodiment of the invention, after the first query result is obtained, the query keywords other than the target query keyword may be used to filter it, and the final query result is obtained by screening the first query result in this way.
In the embodiments of the invention, a target query keyword is determined from the multiple query keywords, the time series database is queried with the target query keyword to obtain a first query result, and the first query result is then filtered with the other query keywords to obtain the query result. This reduces the number of queries issued to the time series database when time series data is queried, and thereby reduces the I/O volume of the time series database.
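Steps 203 to 205 can be sketched in a few lines. The sketch is an illustration under assumptions rather than the embodiment's implementation: the database is a plain list of rows, the function name `query`, the sample rows, and the index values are made up, and the target is chosen as the keyword with the largest discrimination index, which is the selection rule the description gives for one embodiment.

```python
from typing import Dict, List

Row = Dict[str, str]  # tag key -> tag value for one piece of time series data


def query(rows: List[Row], keywords: List[str], index: Dict[str, float]) -> List[Row]:
    """Pick a target keyword, query once, then filter the result in memory."""
    if len(keywords) < 2:
        raise ValueError("the method assumes at least two query keywords")
    # Step 203: the keyword with the largest discrimination index is the target.
    target = max(keywords, key=lambda k: index[k])
    # Step 204: the only pass over the "database" (a plain list in this sketch).
    first_result = [r for r in rows if target in r.values()]
    # Step 205: filter the first result with the remaining keywords.
    others = [k for k in keywords if k != target]
    return [r for r in first_result if all(k in r.values() for k in others)]


# Hypothetical rows and index values, for illustration only.
rows = [
    {"Device": "F07a1260", "Region": "North-cn"},
    {"Device": "F07a1261", "Region": "North-cn"},
    {"Device": "F07a1262", "Region": "South-cn"},
]
index = {"F07a1260": 1.0, "North-cn": 0.5}
result = query(rows, ["North-cn", "F07a1260"], index)
```

In this sketch the related-art approach would scan the data once per keyword and intersect the results, whereas `query` scans it once and applies the remaining keyword to the single matching row.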
According to an embodiment of the present invention, the method for querying time series data further includes:
determining the discrimination index of each of the at least two query keywords.
According to an embodiment of the present invention, determining the target query keyword from the at least two query keywords based on their discrimination indexes may be specifically implemented as:
determining, among the at least two query keywords, the query keyword with the largest discrimination index as the target query keyword.
According to an embodiment of the invention, the discrimination index may be inversely related to the frequency of the keyword in the time series data stored in the time series database. That is, the more often a keyword has been stored in the time series database, the smaller its discrimination index; the less often it has been stored, the larger its discrimination index.
According to an embodiment of the invention, because the query keyword with the largest discrimination index is chosen as the target query keyword, querying the time series database with it yields a first query result with a smaller data volume. Fewer keyword matches are then needed when the first query result is screened with the other query keywords, which improves the efficiency of querying time series data.
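A minimal sketch of this selection rule follows. The description only states that the index is inversely related to frequency; the reciprocal used here, the function names, and the stored values are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def discrimination_index(stored_keywords: Iterable[str]) -> Dict[str, float]:
    """Hypothetical index: the reciprocal of how often each keyword was stored."""
    counts = Counter(stored_keywords)
    return {kw: 1.0 / n for kw, n in counts.items()}


def pick_target(keywords: List[str], index: Dict[str, float]) -> str:
    """Return the query keyword with the largest discrimination index."""
    return max(keywords, key=lambda kw: index[kw])


# Made-up stream of tag values written to the database.
stored = ["North-cn", "North-cn", "North-cn", "F07a1260"]
index = discrimination_index(stored)
target = pick_target(["North-cn", "F07a1260"], index)
```

Here the rarely stored device number becomes the target, so the first query result contains one candidate instead of three.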
According to an embodiment of the invention, querying the time series database for the first query result that matches the target query keyword may be specifically implemented as follows:
querying an inverted index table of the time series database for identification information of the time series data that contains the target query keyword;
acquiring, based on the identification information, the tag values contained in the time series data;
determining, based on the identification information, the tag keys contained in the time series data from a forward file; and
combining the tag keys and tag values that correspond to each other according to the identification information to generate the first query result.
According to an embodiment of the invention, the time series database may include an inverted index table, a forward file, and a data table. The inverted index table and the forward file are data structures for managing and storing time series data. The inverted index table may record the time series data identifier corresponding to each piece of time series data. The forward file may record the tag keys included in each piece of time series data, and these tag keys may be looked up using the time series data identifier obtained from the inverted index table.
According to an embodiment of the present invention, the tag values included in the time series data may be acquired from a TSF (Time Series File). In practical applications, a TSF may be read and processed with various programming languages or software libraries, such as the pandas library in Python or the ts facilities in R. With such tools, time series files can be loaded into memory and then analyzed and processed.
According to an embodiment of the invention, combining the tag keys and tag values according to the identification information to generate the first query result may be specifically implemented as follows:
combining at least one tag key and tag value that correspond to the same identification information.
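The lookup path described above (inverted index table, then tag values, then forward file, then combination by identifier) can be sketched with in-memory dictionaries. All three structures, their contents, and the function name `first_query_result` are hypothetical stand-ins; the description does not specify their layout.

```python
from typing import Dict, List, Set

# Hypothetical in-memory stand-ins for the structures named in the text.
inverted_index: Dict[str, Set[int]] = {   # keyword -> series identifiers
    "North-cn": {1, 2},
    "F07a1260": {1},
}
forward_file: Dict[int, List[str]] = {    # identifier -> tag keys
    1: ["Device", "Region"],
    2: ["Device", "Region"],
}
tag_values: Dict[int, List[str]] = {      # identifier -> tag values (e.g. read from a TSF)
    1: ["F07a1260", "North-cn"],
    2: ["F07a1261", "North-cn"],
}


def first_query_result(target: str) -> Dict[int, Dict[str, str]]:
    """Resolve the target keyword to identifiers, then pair tag keys with tag values."""
    result: Dict[int, Dict[str, str]] = {}
    for series_id in sorted(inverted_index.get(target, set())):
        values = tag_values[series_id]   # tag values for this identifier
        keys = forward_file[series_id]   # tag keys for the same identifier
        result[series_id] = dict(zip(keys, values))
    return result
```

Because keys and values are joined per identifier, each entry of the first query result carries the full tag set needed for the later in-memory filtering.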
According to an embodiment of the invention, acquiring the tag values contained in the time series data based on the identification information may be specifically implemented as follows:
querying the time series database to determine whether the time series data corresponding to the identification information contains a tag value;
if not, returning an empty query result;
if yes, acquiring the tag value.
According to an embodiment of the invention, before the forward file is queried, it may first be checked whether any tag value has been written for the time series data corresponding to the query keyword. If no data has been written, an empty query result can be returned directly without going on to query the forward file, which avoids invalid I/O.
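The early exit can be illustrated as below. The function `tags_for`, the `forward_reads` list used to observe forward file accesses, and the sample data are assumptions made for the sketch.

```python
from typing import Dict, List, Optional


def tags_for(series_id: int,
             tag_values: Dict[int, List[str]],
             forward_file: Dict[int, List[str]],
             forward_reads: List[int]) -> Optional[Dict[str, str]]:
    """Return the tags of one series, or None if no tag value was ever written."""
    values = tag_values.get(series_id)
    if not values:
        return None                    # empty result; the forward file is not touched
    forward_reads.append(series_id)    # record that the forward file was read
    return dict(zip(forward_file[series_id], values))


tag_values = {1: ["F07a1260", "North-cn"]}  # series 2 has no data written
forward_file = {1: ["Device", "Region"], 2: ["Device", "Region"]}
forward_reads: List[int] = []
hit = tags_for(1, tag_values, forward_file, forward_reads)
miss = tags_for(2, tag_values, forward_file, forward_reads)
```

After both calls, `forward_reads` contains only identifier 1, showing that the lookup for the empty series skipped the forward file.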
According to an embodiment of the present invention, determining at least two query keywords in the time series data query request includes:
determining a plurality of query keywords contained in the time series data query request;
determining, from the plurality of query keywords, at least two query keywords that share the same query condition.
According to embodiments of the present invention, the plurality of query keywords may be related by, for example, an AND relationship, an OR relationship, a MUST relationship, or a NOT (exclusion) relationship. The AND relationship means that all query keywords must be satisfied at the same time for a result to match. The OR relationship means that a result matches if any one query keyword is satisfied. The MUST relationship means that certain query keywords must be satisfied while the other query keywords are optional. The NOT relationship means that certain query keywords must not be present for a result to match.
In an embodiment of the invention, the at least two query keywords that share the same query condition may form one group or multiple groups. For example, a query request includes query keywords A, B, C, and D, where query keywords A and B are in an AND relationship, and query keywords C and D are in an AND relationship. In this case, query keywords A and B form one group, and query keywords C and D form another group.
According to an embodiment of the present invention, in the above example the query method provided by the embodiments may first be executed on the group of query keywords A and B to obtain one query result, and then on the group of query keywords C and D to obtain another query result. Finally, the two query results are combined, for example by intersection or union, to obtain the final query result.
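The group-then-combine flow can be sketched as follows. For brevity each group is evaluated with a plain set intersection standing in for the query method of the embodiments, and both groups are assumed to be AND groups; the posting sets and function names are hypothetical.

```python
from typing import Dict, List, Set


def query_and_group(group: List[str], postings: Dict[str, Set[int]]) -> Set[int]:
    """Identifiers that contain every keyword of one AND group (stand-in evaluation)."""
    sets = [postings.get(k, set()) for k in group]
    return set.intersection(*sets) if sets else set()


def combine(results: List[Set[int]], how: str) -> Set[int]:
    """Merge per-group results by intersection or union."""
    if not results:
        return set()
    if how == "intersection":
        return set.intersection(*results)
    if how == "union":
        return set.union(*results)
    raise ValueError(f"unsupported combination: {how}")


# Hypothetical keyword -> identifier sets.
postings = {"A": {1, 2, 3}, "B": {2, 3}, "C": {3, 4}, "D": {3, 4, 5}}
ab = query_and_group(["A", "B"], postings)
cd = query_and_group(["C", "D"], postings)
both = combine([ab, cd], "intersection")
either = combine([ab, cd], "union")
```

Which combination applies depends on how the two groups are related in the original request.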
According to an embodiment of the present invention, the method for querying time series data further includes:
counting, within a preset time period, the occurrence frequency of each of a plurality of keywords contained in the time series data that is requested to be stored in the time series database;
determining the discrimination index of each keyword according to its occurrence frequency, where the keywords include the at least two query keywords.
According to an embodiment of the invention, the time series data stored within the preset time period may be retrieved with a query statement, which may, for example, use an SQL-like query language to filter and aggregate the time series data. The retrieved results may then be traversed and the occurrence frequency of each keyword counted. This may be done programmatically, for example with a statistics library or custom code in Python or another programming language. Specifically, the pieces of time series data may be traversed in chronological order, and the number of occurrences of every keyword in each piece accumulated and recorded in a statistics table. The discrimination index of each keyword is then calculated from the statistics. The discrimination index may be calculated with an algorithm such as TF-IDF, which measures how unique and important a keyword is within the whole data set. Specifically, the frequency of each keyword in the entire data set and its frequency across different pieces of time series data may be calculated, and the discrimination index computed with the corresponding formula.
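One way to turn the counted frequencies into an index is the inverse-document-frequency part of TF-IDF, sketched below. Treating each piece of time series data as a "document", the exact formula `log(n / df)`, the function name, and the sample data are assumptions; the description names TF-IDF only as one possible basis.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_index(pieces: List[List[str]]) -> Dict[str, float]:
    """IDF-style index: keywords found in fewer pieces of data score higher."""
    n = len(pieces)
    df: Counter = Counter()
    for keywords in pieces:
        df.update(set(keywords))  # count each keyword at most once per piece
    return {kw: math.log(n / d) for kw, d in df.items()}


# Hypothetical keywords of three pieces of data stored in the time period.
pieces = [
    ["North-cn", "F07a1260"],
    ["North-cn", "F07a1261"],
    ["South-cn", "F07a1262"],
]
idx = idf_index(pieces)
```

With this data the region shared by two pieces scores lower than a device number that appears only once, so the device number would be preferred as the target query keyword.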
According to the embodiment of the invention, based on the target query keyword, the query of the time sequence database to obtain the first query result can be specifically implemented as follows:
Determining whether the number of time sequence data contained in the first query result is larger than a preset threshold value;
If yes, filtering the first query result by utilizing the query keywords except the target query keyword in the at least two query keywords to obtain a query result;
If not, inquiring from the time sequence database based on each inquiry keyword to obtain sub inquiry results corresponding to each inquiry keyword, and acquiring intersections of the sub inquiry results to obtain inquiry results.
According to an embodiment of the present invention, before the post-filtering operation is performed, the amount of data contained in the initial query result (that is, the first query result obtained with the target query keyword) may be checked first. If the amount of data is greater than the preset threshold, the post-filtering operation may be performed, that is, the initial query result is filtered with the other query keywords to obtain the query result. If the amount of data is not greater than the preset threshold, the time series database may be queried with each query keyword separately to obtain a sub-query result matching each query keyword, and the intersection of the sub-query results is then taken to obtain the query result.
According to an embodiment of the present invention, checking the amount of data contained in the initial query result first avoids unnecessary sub-query operations. If the amount of data in the initial query result exceeds the preset threshold, the post-filtering operation can be performed directly, which reduces the number of additional database queries and improves query efficiency. When the amount of data in the initial query result is small, the time series database is queried with each query keyword and the intersection of the sub-query results is taken, which yields the results that satisfy all of the query conditions. This reduces the return of redundant data, since only data satisfying every query condition is returned, and improves the accuracy of the query result.
In the embodiment of the invention, the preset threshold value can be flexibly set by a person skilled in the art according to actual application requirements, and the embodiment of the invention does not limit the specific value of the preset threshold value.
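The threshold decision described above can be sketched as follows. This is only an illustration under stated assumptions: the in-memory dictionaries `inverted_index` and `series_keywords` stand in for the time series database, and the function name and sample data are hypothetical.

```python
def run_query(inverted_index, series_keywords, keywords, target, threshold):
    """Return the ids of series matching all keywords (AND semantics)."""
    # First query result: ids of series containing the target keyword.
    first_result = inverted_index.get(target, set())
    others = [kw for kw in keywords if kw != target]

    if len(first_result) > threshold:
        # Post-filtering: no further index lookups, only checks on the
        # series already returned by the first query.
        return {
            sid for sid in first_result
            if all(kw in series_keywords[sid] for kw in others)
        }

    # Otherwise: one sub-query per keyword, then intersect the sub-results.
    sub_results = [inverted_index.get(kw, set()) for kw in keywords]
    return set.intersection(*sub_results) if sub_results else set()


# Hypothetical data: keyword -> series ids, and series id -> keywords.
inverted_index = {
    "host=a": {1, 2},
    "region=east": {1, 2, 3, 4},
    "app=db": {2, 4},
}
series_keywords = {
    1: {"host=a", "region=east"},
    2: {"host=a", "region=east", "app=db"},
    3: {"region=east"},
    4: {"region=east", "app=db"},
}
keywords = ["host=a", "region=east", "app=db"]

filtered = run_query(inverted_index, series_keywords, keywords, "host=a", threshold=1)
intersected = run_query(inverted_index, series_keywords, keywords, "host=a", threshold=5)
```

Both branches return the same set of series; the threshold only selects which access pattern is used to compute it.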
According to an embodiment of the present invention, separately counting the occurrence frequencies of the plurality of keywords contained in the time series data requested to be stored in the time series database within the preset time period may be specifically implemented as follows:
for time series data to be stored, determining whether the plurality of keywords contained in the time series data to be stored exist in a bitmap;
for a keyword that already exists in the bitmap, updating the counter at the index position corresponding to the keyword, wherein the counter is used to record the occurrence frequency of the keyword;
for a keyword that does not exist in the bitmap, creating an index position corresponding to the keyword in the bitmap.
According to an embodiment of the present invention, a bitmap (bit map) is a data structure that may be used to represent membership of a collection in bits.
According to an embodiment of the present invention, the presence or absence of a keyword may be represented by the bitmap data structure. If a keyword exists in the bitmap, the keyword has already been indexed; when the keyword is again requested to be stored in the time series database, the counter at the index position corresponding to the keyword can be updated directly, incrementing it by one to record the occurrence frequency of the keyword.
According to an embodiment of the present invention, if a keyword does not exist in the bitmap, the keyword has not been indexed and does not yet exist in the time series database. In this case, an index position corresponding to the keyword may be created in the bitmap, and the corresponding counter may be initialized to 1.
According to embodiments of the invention, index locations of bitmaps may be generated using hash values of keywords or other mapping algorithms to ensure uniqueness of the index locations.
According to an embodiment of the present invention, recording keywords in the bitmap makes it possible to determine quickly whether a keyword exists. Existing keywords have their counters updated, and new keywords have an index position created and a counter initialized, so that the occurrence and frequency of a plurality of keywords in the time series data can be managed and queried efficiently.
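The bitmap-with-counters bookkeeping described above can be sketched as follows. This is a minimal sketch under assumptions: the class name `KeywordBitmap` is hypothetical, a Python integer serves as the bit set, and index positions are assigned sequentially through a dictionary (one possible "other mapping algorithm") rather than from a hash value, which sidesteps collisions in this illustration.

```python
class KeywordBitmap:
    """Tracks which keywords have been seen and how often."""

    def __init__(self):
        self._bits = 0          # bit i set <=> index position i is occupied
        self._positions = {}    # keyword -> index position (assumed mapping)
        self._counters = []     # _counters[i] = frequency at position i

    def contains(self, keyword):
        pos = self._positions.get(keyword)
        return pos is not None and bool((self._bits >> pos) & 1)

    def record(self, keyword):
        """Call once per keyword occurrence in data requested to be stored."""
        if self.contains(keyword):
            # Keyword already indexed: increment its counter by one.
            self._counters[self._positions[keyword]] += 1
        else:
            # New keyword: create an index position, counter starts at 1.
            pos = len(self._counters)
            self._positions[keyword] = pos
            self._bits |= 1 << pos
            self._counters.append(1)

    def frequency(self, keyword):
        if not self.contains(keyword):
            return 0
        return self._counters[self._positions[keyword]]


bitmap = KeywordBitmap()
for kw in ["host=a", "region=east", "region=east"]:
    bitmap.record(kw)
```

After these three writes, `bitmap.frequency("region=east")` is 2 and a keyword that was never written reports a frequency of 0.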
Fig. 3 is a schematic diagram of a method for querying time series data according to an embodiment of the present invention.
In Fig. 3, the time series data query request 301 may include a plurality of query keywords, such as query keyword a, query keyword b, and query keyword c. In this embodiment of the present invention, query keyword a, query keyword b, and query keyword c may be combined by a logical AND relation.
After the query keywords are determined, the discrimination index of each query keyword may be determined separately. Specifically, the discrimination index may be obtained by counting the occurrence frequencies of the keywords contained in time series data when that data is requested to be written to the time series database. The discrimination index may be inversely related to the occurrence frequency: the more often a keyword occurs, the smaller its discrimination index, and the less often it occurs, the larger its discrimination index.
The occurrence frequency of each keyword may be stored in the bitmap. Query keyword a, query keyword b, and query keyword c in the query request may then be sorted by the occurrence frequencies stored in the bitmap, for example in descending or ascending order. In this way, the target query keyword with the largest discrimination index, that is, the one with the lowest occurrence frequency, can be determined from the plurality of query keywords.
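Because the discrimination index is inversely related to occurrence frequency, choosing the target query keyword reduces to picking the keyword with the lowest recorded frequency. The sketch below assumes the frequencies are available as a plain dictionary; the function name and the sample counts are hypothetical.

```python
def choose_target_keyword(query_keywords, frequencies):
    """Sort keywords by ascending frequency and return (target, ranking).

    A keyword with no recorded frequency is treated as frequency 0 here,
    which is an assumption of this sketch.
    """
    ranked = sorted(query_keywords, key=lambda kw: frequencies.get(kw, 0))
    return ranked[0], ranked


# Hypothetical frequencies for query keywords a, b and c.
frequencies = {"a": 120, "b": 3, "c": 45}
target, ranked = choose_target_keyword(["a", "b", "c"], frequencies)
```

With these counts, keyword `b` is selected as the target and the remaining keywords are left for post-filtering.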
After determining the target query keyword, the time series database may be queried first using the target query keyword.
Specifically, the inverted index table may first be queried with the target query keyword to obtain the identification information of the time series data containing the target query keyword. The tag values contained in the time series data may then be obtained from the TSF (Time Series File) based on the identification information. If the TSF query hits no data, the query ends and an empty result is returned directly, without executing the subsequent query steps. If the TSF query hits data and the amount of data is greater than the preset threshold, the forward file may then be queried. The forward file records the tag keys contained in each piece of time series data, and the tag keys can be obtained according to the time series data identifiers found in the inverted index table.
Further, because the tag values are recorded in the TSF, after the TSF query result is obtained it may be filtered with the other query keywords to obtain the tag values of the time series data whose identification information matches query keyword a, query keyword b, and query keyword c; the final query result is then obtained by querying the forward file.
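The lookup order described above (inverted index, then TSF, then post-filtering, then forward file) can be sketched as follows. The data layout is an assumption made for illustration only: the inverted index, TSF, and forward file are modeled as dictionaries keyed by series id, tag values and tag keys are assumed to be stored in matching order, and the threshold check is omitted for brevity.

```python
def query_series(keywords, target, inverted_index, tsf, forward_file):
    """Return {series_id: {tag_key: tag_value}} for series matching all keywords."""
    # 1. Inverted index: ids of series containing the target keyword.
    ids = inverted_index.get(target, set())

    # 2. TSF: tag values for those ids; no hits means an empty result
    #    and the remaining steps are skipped.
    hits = {sid: tsf[sid] for sid in ids if sid in tsf}
    if not hits:
        return {}

    # 3. Post-filter the TSF result with the other query keywords.
    others = [kw for kw in keywords if kw != target]
    kept = {
        sid: values for sid, values in hits.items()
        if all(kw in values for kw in others)
    }

    # 4. Forward file: pair each tag key with its tag value per series id.
    return {
        sid: dict(zip(forward_file[sid], values))
        for sid, values in kept.items()
    }


# Hypothetical stores keyed by series id.
inverted_index = {"a": {1, 2}}
tsf = {1: ["a", "east"], 2: ["a", "west"]}
forward_file = {1: ["host", "region"], 2: ["host", "region"]}

result = query_series(["a", "east"], "a", inverted_index, tsf, forward_file)
```

Only series 1 carries both values, so it is the only entry in `result`; a target keyword absent from the inverted index yields an empty dictionary.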
Fig. 4 schematically illustrates a block diagram of a time series data query device according to an embodiment of the present invention, and as shown in fig. 4, the time series data query device 400 may specifically include:
a request receiving module 401, configured to receive a time series data query request;
a first keyword determining module 402, configured to determine at least two query keywords in the time series data query request;
a second keyword determining module 403, configured to determine a target query keyword from the at least two query keywords based on the discrimination indexes of the at least two query keywords, wherein the discrimination indexes are obtained by counting the frequencies of the keywords contained in the time series data stored in the time series database;
a matching module 404, configured to query, from the time series database, a first query result matching the target query keyword;
a query module 405, configured to filter the first query result by using at least one query keyword, other than the target query keyword, among the at least two query keywords, to obtain a query result.
According to an embodiment of the present invention, the time series data query device further includes:
a discrimination determining module, configured to determine the discrimination index of each of the at least two query keywords.
According to an embodiment of the present invention, the second keyword determining module 403 may include:
a target keyword determining unit, configured to determine the query keyword with the largest discrimination index as the target query keyword.
According to an embodiment of the present invention, the matching module 404 includes:
an identification query sub-module, configured to query, from the inverted index table of the time series database, the identification information of the time series data containing the target query keyword;
a tag value acquisition sub-module, configured to acquire, based on the identification information, the tag value contained in the time series data;
a keyword query sub-module, configured to determine, from the forward file and based on the identification information, the tag key contained in the time series data;
a result query sub-module, configured to combine the tag key and the tag value according to the identification information to generate the first query result.
According to an embodiment of the present invention, the tag value acquisition sub-module includes:
a tag value determining unit, configured to query, from the time series database, whether the time series data corresponding to the identification information contains a tag value;
a result returning unit, configured to return an empty query result when the time series data corresponding to the identification information does not contain a tag value, and to acquire the tag value when it does.
According to an embodiment of the present invention, the time series data query device further includes:
a third keyword determining module, configured to determine a plurality of query keywords contained in the time series data query request;
a fourth keyword determining module, configured to determine, from the plurality of query keywords, at least two query keywords whose query condition is AND.
According to an embodiment of the present invention, the time series data query device further includes:
a statistics module, configured to separately count the occurrence frequencies of a plurality of keywords contained in the time series data requested to be stored in the time series database within a preset time period;
a discrimination determining module, configured to determine the discrimination index of each keyword according to the occurrence frequency, wherein the plurality of keywords include the at least two query keywords.
According to an embodiment of the present invention, the query module 405 includes:
a threshold determining sub-module, configured to determine whether the number of pieces of time series data contained in the first query result is greater than a preset threshold;
a filtering module, configured to filter the first query result by using the query keywords, other than the target query keyword, among the at least two query keywords when the number of pieces of time series data contained in the first query result is greater than the preset threshold, to obtain the query result;
a parallel query module, configured to query the time series database based on each query keyword separately, when the number of pieces of time series data contained in the first query result is not greater than the preset threshold, to obtain a sub-query result corresponding to each query keyword;
a result determining module, configured to take the intersection of the plurality of sub-query results to obtain the query result.
According to an embodiment of the present invention, the statistics module includes:
a bitmap determining sub-module, configured to determine, for time series data to be stored, whether the plurality of keywords contained in the time series data to be stored exist in the bitmap;
an updating sub-module, configured to update, for a keyword that already exists in the bitmap, the counter at the index position corresponding to the keyword, wherein the counter is used to record the occurrence frequency of the keyword;
a creation sub-module, configured to create, for a keyword that does not exist in the bitmap, an index position corresponding to the keyword in the bitmap.
The time-series data query device of fig. 4 may execute the time-series data query method of the embodiment shown in fig. 2, and its implementation principle and technical effects are not repeated. The specific manner in which the respective modules and units perform the operations in the above-described time series data query apparatus of the embodiment have been described in detail in the embodiment related to the method, and will not be described in detail herein.
In one possible design, the time series data query device provided by the embodiment of the present invention may be implemented as a computing device, as shown in fig. 5, where the computing device may include a storage component 501 and a processing component 502;
the storage component 501 stores one or more computer instructions for the processing component 502 to invoke and execute, so as to implement the method for querying time-series data provided by the embodiment of the present invention.
The computing device may of course also include other components, such as input/output interfaces and communication components. The input/output interface provides an interface between the processing component and a peripheral interface module, which may be an output device, an input device, and so on. The communication component is configured to facilitate wired or wireless communication between the computing device and other devices.
The computing device may be a physical device or an elastic computing host provided by a cloud computing platform. In the latter case, the computing device may be a cloud server, and the processing component, the storage component, and the like may be basic server resources rented or purchased from the cloud computing platform.
When the computing device is a physical device, the computing device may be implemented as a distributed cluster formed by a plurality of servers or terminal devices, or may be implemented as a single server or a single terminal device.
The embodiment of the invention also provides a computer readable storage medium which stores a computer program, and the computer program can realize the time sequence data query method provided by the embodiment of the invention when being executed by a computer.
The embodiment of the invention also provides a computer program product, which comprises a computer program, wherein the computer program can realize the time sequence data query method provided by the embodiment of the invention when being executed by a computer.
The processing component in each of the above embodiments may include one or more processors that execute computer instructions to perform all or part of the steps of the methods described above. The processing component may of course also be implemented as one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
The storage component is configured to store various types of data to support operation of the device. The storage component may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims (10)

1. A method for querying time series data, comprising:
receiving a time series data query request;
determining at least two query keywords in the time series data query request;
determining a target query keyword from the at least two query keywords based on discrimination indexes of the at least two query keywords, wherein the discrimination indexes are obtained by counting the frequencies of keywords contained in time series data stored in a time series database;
querying, from the time series database, a first query result that matches the target query keyword;
filtering the first query result by using at least one query keyword, other than the target query keyword, among the at least two query keywords, to obtain a query result.

2. The method according to claim 1, wherein the method further comprises:
determining the discrimination index of each of the at least two query keywords respectively;
and wherein determining the target query keyword from the at least two query keywords based on the discrimination indexes of the at least two query keywords comprises:
determining the query keyword with the largest discrimination index among the at least two query keywords as the target query keyword.

3. The method according to claim 1, wherein querying, from the time series database, the first query result that matches the target query keyword comprises:
querying, from an inverted index table of the time series database, identification information of the time series data containing the target query keyword;
obtaining, based on the identification information, a tag value contained in the time series data;
determining, from a forward file and based on the identification information, a tag key contained in the time series data;
combining the tag key and the tag value according to the identification information to generate the first query result.

4. The method according to claim 3, wherein obtaining, based on the identification information, the tag value contained in the time series data comprises:
querying, from the time series database, whether the time series data corresponding to the identification information contains a tag value;
if not, returning an empty query result;
if so, obtaining the tag value.

5. The method according to claim 1, wherein determining the at least two query keywords in the time series data query request comprises:
determining a plurality of query keywords contained in the time series data query request;
determining, from the plurality of query keywords, at least two query keywords whose query condition is AND.

6. The method according to claim 1, wherein the method further comprises:
separately counting the occurrence frequencies of a plurality of keywords contained in the time series data requested to be stored in the time series database within a preset time period;
determining a discrimination index of each keyword according to the occurrence frequency, wherein the plurality of keywords include the at least two query keywords.

7. The method according to claim 1, wherein filtering the first query result by using at least one query keyword, other than the target query keyword, among the at least two query keywords, to obtain the query result comprises:
determining whether the number of pieces of time series data contained in the first query result is greater than a preset threshold;
if so, filtering the first query result by using the query keywords, other than the target query keyword, among the at least two query keywords, to obtain the query result;
if not, querying the time series database based on each query keyword separately to obtain a sub-query result corresponding to each query keyword, and taking the intersection of the plurality of sub-query results to obtain the query result.

8. The method according to claim 6, wherein separately counting the occurrence frequencies of the plurality of keywords contained in the time series data requested to be stored in the time series database within the preset time period comprises:
for time series data to be stored, determining whether the plurality of keywords contained in the time series data to be stored exist in a bitmap;
for a keyword that already exists in the bitmap, updating a counter at an index position corresponding to the keyword, the counter being used to record the occurrence frequency of the keyword;
for a keyword that does not exist in the bitmap, creating an index position corresponding to the keyword in the bitmap.

9. A computing device, comprising a processing component and a storage component;
the storage component stores one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component to implement the method for querying time series data according to any one of claims 1 to 8.

10. A computer storage medium, storing a computer program which, when executed by a computer, implements the method for querying time series data according to any one of claims 1 to 8.
CN202410057345.4A 2024-01-15 2024-01-15 Time series data query method, computing device and computer storage medium Pending CN120316150A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202410057345.4A CN120316150A (en) 2024-01-15 2024-01-15 Time series data query method, computing device and computer storage medium
PCT/IB2025/050208 WO2025153921A1 (en) 2024-01-15 2025-01-09 Time series data query method, computing device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410057345.4A CN120316150A (en) 2024-01-15 2024-01-15 Time series data query method, computing device and computer storage medium

Publications (1)

Publication Number Publication Date
CN120316150A true CN120316150A (en) 2025-07-15

Family

ID=96333347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410057345.4A Pending CN120316150A (en) 2024-01-15 2024-01-15 Time series data query method, computing device and computer storage medium

Country Status (2)

Country Link
CN (1) CN120316150A (en)
WO (1) WO2025153921A1 (en)

Also Published As

Publication number Publication date
WO2025153921A1 (en) 2025-07-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination