[go: up one dir, main page]

CN117271529A - Index processing method, device and storage medium - Google Patents

Index processing method, device and storage medium Download PDF

Info

Publication number
CN117271529A
CN117271529A CN202311547108.8A CN202311547108A CN117271529A CN 117271529 A CN117271529 A CN 117271529A CN 202311547108 A CN202311547108 A CN 202311547108A CN 117271529 A CN117271529 A CN 117271529A
Authority
CN
China
Prior art keywords
index
hash
time
target
partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311547108.8A
Other languages
Chinese (zh)
Other versions
CN117271529B (en
Inventor
梁成辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd filed Critical Alibaba Cloud Computing Ltd
Priority to CN202311547108.8A priority Critical patent/CN117271529B/en
Publication of CN117271529A publication Critical patent/CN117271529A/en
Application granted granted Critical
Publication of CN117271529B publication Critical patent/CN117271529B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2272Management thereof
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides an index processing method, index processing equipment and a storage medium. In the embodiment of the application, on the basis of partitioning the time index table in a time range, a hash mode is introduced to perform secondary partition, and hash calculation can be performed on a target time range corresponding to the index partition obtained in the time range partition mode; and dividing the index partition into a plurality of sub-partitions according to the hash result range of the target time range. In this way, when the time index of the time series data table is written in the time series index table, the time index of the time series data table can be stored in the plurality of sub-partitions according to the hash result of the time index of the time series data table. When the time index of the time sequence data table is stored, the time index can be uniformly written into a plurality of sub-partitions by utilizing the hash principle, the probability of occurrence of partition hot spots is reduced, the occurrence of partition hot spots can be avoided, and the load balancing effect is achieved.

Description

Index processing method, device and storage medium
Technical Field
The present disclosure relates to the field of database technologies, and in particular, to an index processing method, apparatus, and storage medium.
Background
In the present informatization age, application and research of information data have become a trend, and databases are widely used for storage, management, maintenance and query of data based on advantages thereof.
A database is a computer software system that stores and manages data in data structures. The index of a data table is a separate, physical storage structure that orders the values of one or more columns of a database table, providing pointers to the data values stored in the specified columns of the table.
The time information of the time series data is often used to construct an index of the time series data table. The query types of the global index of the time type are usually based on the time Range (Range) query, and therefore, the partition strategy of the global index of the time type is also based on the time Range (Range) partition. When a time-type column save time is used, the current time is usually saved in a time index column and written into a database, and partition hot spots are easily generated, which easily causes load imbalance of a storage system.
Disclosure of Invention
Aspects of the present application provide an index processing method, apparatus, and storage medium for reducing the probability of occurrence of partition hot spots.
The embodiment of the application provides an index processing method, which comprises the following steps:
acquiring a target time range of a first index partition to be created; the first index partition is a partition of an index table of the time sequence data table;
Creating a first index partition corresponding to the target time range;
carrying out hash calculation on the target time range to obtain a hash result range of the target time range;
dividing the first index partition into a plurality of sub-partitions according to the hash result range of the target time range, so as to store the time index in the plurality of sub-partitions according to the hash result of the time index of the time sequence data table.
The embodiment of the application also provides an index processing method, which comprises the following steps:
acquiring a time index to be stored from a time sequence data table;
determining a target index partition corresponding to the time index to be stored from at least one index partition according to the time index to be stored; the at least one index partition is partitioned according to a time range;
performing hash calculation on the time index to be stored to obtain a hash result of the time index to be stored;
determining at least one target sub-partition from a plurality of sub-partitions corresponding to the target index partition according to the hash result of the time index to be stored; the plurality of sub-partitions are obtained by dividing the hash result range of the time range corresponding to the target index partition;
And storing the time index to be stored in the at least one target sub-partition.
The embodiment of the application also provides electronic equipment, which comprises: a memory and a processor; wherein the memory is used for storing a computer program;
the processor is coupled to the memory for executing the computer program for performing the steps in the index processing methods described above.
Embodiments also provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the index processing methods described above.
In the embodiment of the application, on the basis of partitioning the time index table in a time range, a hash mode is introduced to perform secondary partition, and hash calculation can be performed on a target time range corresponding to the index partition obtained in the time range partition mode; and dividing the index partition into a plurality of sub-partitions according to the hash result range of the target time range. In this way, when the time index of the time series data table is written in the time series index table, the time index of the time series data table can be stored in the plurality of sub-partitions according to the hash result of the time index of the time series data table. When the time index of the time sequence data table is stored, the time index can be uniformly written into a plurality of sub-partitions by utilizing the hash principle, the probability of occurrence of partition hot spots is reduced, the occurrence of partition hot spots can be avoided, and the load balancing effect is achieved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
fig. 1 is a flow chart of an index processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a range-based partitioning according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a range partitioning splitting process according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a two-stage partitioning process provided in an embodiment of the present application;
fig. 5 is a schematic diagram of hash-based partitioning according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a splitting process of index partitions according to an embodiment of the present disclosure;
FIG. 7 is a flowchart illustrating an index processing method according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of another index partition splitting process according to an embodiment of the present disclosure;
FIG. 9 is a flowchart illustrating another index processing method according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a hot spot traffic writing process according to an embodiment of the present disclosure;
fig. 11 is a schematic diagram of a single-value hotspot being generated based on a hash partition and being unable to perform hotspot splitting according to the embodiment of the present application;
Fig. 12 is a schematic diagram of a hash principle of a vector partition key according to an embodiment of the present application;
fig. 13 is a schematic diagram of a hash space segmentation process according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a hash segmentation process of a single-value hotspot provided in an embodiment of the present application;
fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes, technical solutions and advantages of the present application, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The partition strategy of the global index of the time type is also based on time Range (Range) partition. When a time-type column save time is used, the current time is usually saved in a time index column and written into a database, so that partition hot spots are easily generated, and the load of a storage system is unbalanced.
In order to reduce the occurrence probability of the partitioning hot spots, in some embodiments of the present application, a hash mode is introduced to perform a secondary partition on the basis of partitioning the time index table in a time range, and hash calculation can be performed on a target time range corresponding to the index partition obtained in the time range partition mode; and dividing the index partition into a plurality of sub-partitions according to the hash result range of the target time range. In this way, when the time index of the time series data table is written in the time series index table, the time index of the time series data table can be stored in the plurality of sub-partitions according to the hash result of the time index of the time series data table. When the time index of the time sequence data table is stored, the time index can be uniformly written into a plurality of sub-partitions by utilizing the hash principle, the probability of occurrence of partition hot spots is reduced, the occurrence of partition hot spots can be avoided, and the load balancing effect is achieved.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
It should be noted that: like reference numerals denote like objects in the following figures and embodiments, and thus once an object is defined in one figure or embodiment, further discussion thereof is not necessary in the subsequent figures and embodiments.
Fig. 1 is a flowchart of an index processing method according to an embodiment of the present application. As shown in fig. 1, the index processing method mainly includes:
101. in response to an index creation request for a time series data table, a default time range is acquired.
102. And creating an initial index partition corresponding to the default time range.
103. And carrying out hash calculation on the default time range to obtain a hash result range of the default time range.
104. The initial index partition is divided into a plurality of sub-partitions according to the hash result range of the default time range, so that the time index is stored in the plurality of sub-partitions according to the hash result of the time index of the time sequence data table.
In the present embodiment, a time series data table is used to store time series data. Timing data refers to a series of data that is continuously generated based on a certain frequency, and may include: time information and other application data. Different application scenarios correspond to different application data. For example, in an online shopping scenario, the application data may include: user Identification (ID), order information, etc. In an online monitoring scenario, application data may include: identification of the monitored object, metric (Metric), and Metric value.
The index processing method provided by the embodiment of the application can be suitable for the database kernel. Database kernels refer to software modules, applications, services, or entity devices that provide data source-based query functionality. The database kernel may include: resolvers, optimizers, and executors, etc. The resolver, the optimizer and the executor may be located in the same physical machine or may be deployed in different physical machines.
In the embodiment of the application, the database is a transparent distributed database. Transparent distributed databases refer to the distributed databases automatically hiding partition columns from users and their definitions, allowing users to build tables or index directly using a single grammar. The kernel of the distributed database can automatically select a proper partition column for horizontal partition for a newly built data table or a newly built index, and a user does not feel the partition column. A partition column is a column of an index table or global index table for horizontal splitting.
In the present embodiment, the time column of the time series data table is used as an index column, that is, an index of the time series data table is constructed using time information of the time series data table, which may be referred to as a time index. Accordingly, the time column may be used as a partition column to horizontally partition the time index table.
In this embodiment, the index table may be partitioned by a time Range (Range). As shown in FIG. 2, the time range 0x0000-0xFFFF may be divided into a plurality of partitions, such as partition 1-partition 3 in FIG. 2, and so on. When the time index is stored, the time range 0x001-0x2001 to be stored can be routed to different partitions according to the time ranges corresponding to the partitions. For example, 0x001 is routed to partition 1;0x2001 is routed to partition 3, etc.
In this embodiment, the index partition is obtained by dividing the index table. As shown in fig. 3, each index table starts with only one index partition, but as the time index is inserted, the database kernel will split the table horizontally according to a certain rule, and split into two index partitions. As more and more rows in the index table, more and more index partitions are generated, and these index partitions can be distributed and stored on multiple physical machines. The plural means 2 or more than 2.
Due to the specificity of the time data, the latest time data writing is always routed to the last index partition, so that the writing pressure of the last index partition in the index partitions based on the time range cannot be effectively balanced. Especially during the peak traffic hours, the rapid increase in the data volume of the latest time data causes hot spot problems in the last index partition. For example, in an online shopping scene, the user order quantity increases sharply on the day of a large-scale preferential activity, so that the time sequence data of the day has a flow peak, the time sequence index of the day also has the flow peak, and the index partition corresponding to the day has a hot spot problem.
In the embodiment of the application, in order to realize read-write balance of the time type global index, reduce the probability of occurrence of hot spot problem of the index partition, for the time type global index table to be created, a two-level partition scheme based on a time range and a Hash (Hash) mode is used as shown in fig. 4. The Range (Range) partition key of the primary partition is a time index column, and one index partition P0 is initially defaulted.
Specifically, in connection with fig. 1 and 4, a user may send an index creation request to the database kernel when creating an index table of the time series data table. The index creation request is for requesting the database kernel to create an index table of the time series data table. The index creation request may include: the identification of the index column, the identification of the time sequence data table, etc. The identification of the index column may be information that uniquely identifies a column in the time series data table, such as a column name of a column, etc. In the embodiment of the present application, the index column is a time column. In the embodiment of the present application, the programming language of the index creation request is not limited. For example, the index creation request may be an index creation request written in a structured query language (StructuredQuery Language, SQL), a cached query language (Linkcache query language, LQL), or a domain-specific language (Domain Specified Language, DSL), or the like.
For an index creation request in SQL form, it can be implemented as: CREATE INDEX ON t _orders (time_col). Where t_orders represents the identification of the time series data table and time_col represents the identification of the index column. The meaning of the SQL statement is: an index table with an index column of "time_col" is created for the time series data table "t_orders".
For the database kernel, in conjunction with fig. 1 and 4, in step 101, a default time range may be acquired in response to the index creation request. In this embodiment of the present application, the default time range is a time range default to the database kernel, and is determined by a time minimum value and a time maximum value. Generally, the minimum time is an infinitesimal time, and the maximum time is an infinite time, so as to ensure that any time is in a default time range.
Further, in step 102, based on the first level partition stage of the time-horizon manner, an initial index partition P0 corresponding to the default time horizon may be created. The time range corresponding to the initial index partition P0 is a default time range. In the embodiment of the application, in order to solve the hot spot problem of the time index partition, a hash partition scheme is introduced for performing secondary partition on the primary partition. The hash partition mode may be a consistent hash partition mode. Wherein, the hash algorithm can convert a large amount of time information into uniform hash through a certain form of hash, thereby realizing pressure averaging. The hash partitioning scheme is illustrated below in conjunction with fig. 5.
The consistent hashing algorithm is modulo a fixed integer X. Wherein X is a predetermined number of partitions, or a fixed integer, such as 2≡32. First, X values can be abstracted into a circle, the point directly above the circle represents 0, arranged clockwise, and so on: 0. 2, 3 … up to (X-1), and this ring of X points is collectively referred to as a hash ring. Further, when hashing the default range; as shown in fig. 5, a hash calculation may be performed on a default range (a range formed by a default minimum value and a maximum value), to obtain a hash result of the default range; thereafter, the hash result of the default range is modulo X, and the result is an integer between 0 and (X-1) (e.g., 0X0000-0xFFFF in FIG. 5). Further, the result of modulo X of the hash result of the default range may be divided into multiple partitions, such as partition 1-partition 3 in fig. 5.
When the time index is stored, the result of modulo X of the hash result of the default range may be mapped onto the hash ring, and the position mapped onto the hash ring represents a time index. Further, the time range 0X001-0X2001 to be stored may be subjected to hash calculation, and the result after the hash is used for modulo X, so that the hash result of 0X2001 is obtained as follows: hash (0 x 2001) =0x1356; the hash result of 0x001 is: hash (0 x 001) =0x2934. And routing the index to be stored to different partitions according to the hash result ranges corresponding to the partitions. For example, 0x001 is routed to partition 3;0x2001 is routed to partition 2, etc. After hash calculation is performed on the time type index columns in a hash partition mode, the time indexes of the writing flow can be balanced to different partitions, and the problem of partition hot spots of the time indexes written into the partitions can be effectively avoided. Thus, whether the equivalent query (such as Select … Where id=xxx) or the Insert (such as Insert … Values.) and other writing operations are performed, different partitions can be balanced through hash routing, so that obvious hot spots can not be generated, and load balancing is realized.
Based on the working principle of the hash partition, in combination with fig. 1 and fig. 4, in step 103 of this embodiment, a hash calculation is performed on the default time range, so as to obtain a hash result range of the default time range. Alternatively, a consistent hash calculation may be performed on the default time range to obtain a consistent hash result range for the default time range, such as [ -9223372036854775808, 9223372036854775807]. Further, in conjunction with fig. 1 and 4, in step 104, the initial index partition may be divided into a plurality of sub-partitions according to a hash result range of a default time range. The plural means 2 or more than 2. The number of sub-partitions is illustrated as N in fig. 4. N is more than or equal to 2 and is an integer. Multiple child partitions may be deployed on multiple physical machines. Wherein the number of physical machines deploying the sub-partitions is less than or equal to the number of sub-partitions. Preferably, the data deploying the child partition is equal to the number of physical machines.
Based on the plurality of sub-partitions corresponding to the initial index partition, when the time index of the time sequence data table is written in the index table, hash calculation can be carried out on the time index of the time sequence data table, and a hash result of the time index of the time sequence index table is obtained; and storing the time index of the time sequence data table in a plurality of sub-partitions according to the hash result of the time index of the time sequence data table. When the time index of the time sequence data table is stored, the time index can be uniformly written into a plurality of sub-partitions by utilizing the hash principle, the probability of occurrence of partition hot spots is reduced, the occurrence of partition hot spots can be avoided, and the load balancing effect is achieved.
Because the plurality of sub-partitions are deployed on the plurality of physical machines, the time index can be uniformly written into the plurality of physical machines, and the load balance of the writing flow of the physical machines is realized.
In the embodiment of the present application, the specific number of the plurality of sub-partitions into which the initial index partition is divided is not limited. In some embodiments, the number of sub-partitions, N, may be preset. N is more than or equal to 2 and is an integer. The number N of the sub-partitions may be a default setting parameter, or may be set autonomously by a user. For embodiments set autonomously by the user, the user may send a child partition number setting request to the database kernel through his terminal. The request includes the number of sub-partitions N. The database kernel may receive the child partition number setting request, and obtain the child partition number N from the request, and store the child partition number N. Based on the preset number N of sub-partitions, when the initial index partition is divided into a plurality of sub-partitions, the hash result range of the default time range may be equally divided into N hash result subsets. For example, the hash result range of the default time range may be equally divided into N hash result subsets in order of the hash result of the default time range from small to large. Further, as shown in fig. 4, the initial index partition may be divided into N sub-partitions corresponding to the N hash result subsets. The hash result range corresponding to each sub-partition is the hash result subset corresponding to the sub-partition. N sub-partitions may be deployed on N physical machines.
Because the common range query of the time index cannot be subjected to partition clipping in the Hash partition scene, only partition clipping of equivalent queries can be supported, and the range query of the time index loses the effect of query optimization. Therefore, when the time range query is performed on the two-stage partition, global scanning is required to be performed in a plurality of sub-partitions by using the hash result range of the time range to be queried, and the query efficiency is low.
In some embodiments of the present application, to improve the efficiency of subsequent queries, partition clipping is supported during queries. Specifically, when the data amount of a plurality of sub-partitions corresponding to the initially created index partition (i.e., the initial index partition) reaches a set data amount threshold, the initial index partition may be partitioned to obtain a plurality of index partitions.
The plurality of sub-partitions corresponding to the initial index partition are created in the initial stage, and the time indexes of the time sequence data table can be uniformly stored into the plurality of sub-partitions corresponding to the initial index partition according to the hash results of the time indexes of the time sequence data table.
As shown in fig. 6, in a case where the data amount of the plurality of sub-partitions corresponding to the initial index partition is greater than or equal to a set data amount threshold, the initial index partition may be split into a plurality of index partitions. In the embodiment of the present application, the trigger timing of index partition splitting is not limited. In some embodiments, when the data volume of a plurality of sub-partitions corresponding to the initial index partition is greater than or equal to a set data volume threshold, the index partition splitting may be directly triggered, but in this way, during a write flow peak period, the disadvantage of frequently triggering the database kernel to perform partition splitting may occur, which may easily cause service performance jitter of the database system.
To address this issue, embodiments of the present application support a split trigger occasion that automatically selects a partition. For example, a user may determine a valley period of service traffic from service traffic of a database; and sets the splitting opportunity of the index partition as the valley period. Correspondingly, aiming at the database kernel, a partition splitting time setting request can be obtained, and the partition splitting time set by a user is obtained from the partition splitting time setting request; when the partition splitting time arrives, splitting the initial index partition of which the data volume of the corresponding multiple sub-partitions is larger than or equal to the set data volume threshold value into multiple index partitions. For another example, the database kernel may also monitor the service traffic of the database and determine the valley period of the service traffic according to the service traffic of the database; and sets the splitting timing of the index partition to the valley period, etc. In this way, when the partition splitting timing arrives, the initial index partition whose data amount is greater than or equal to the set data amount threshold value of the corresponding plurality of sub-partitions is split into the plurality of index partitions.
The embodiment autonomously sets the splitting time of the index partition, can avoid frequently triggering the database kernel to split the partition in the writing flow peak period, and is beneficial to reducing the service performance jitter of the database.
In the embodiment of the present application, the specific implementation of splitting the initial index partition into a plurality of index partitions is not limited. In some embodiments, the initial index partition may be split into Q index partitions according to the data amounts of the plurality of sub-partitions corresponding to the initial index partition and a preset splitting number Q, where the data amount corresponding to each index partition is 1/Q of the data amount of the plurality of sub-partitions corresponding to the initial index partition. Wherein Q is more than or equal to 2 and is an integer.
In other embodiments, as shown in fig. 6, the default time range corresponding to the initial index partition may be divided into a plurality of time subsets according to a predetermined time interval. Wherein the number of time subsets is determined by the size of the time interval. Further, the initial index partition may be split into multiple time subsets; and splitting the initial index partition into a plurality of index partitions corresponding to the plurality of time subsets. The number of index partitions corresponds one-to-one to the subset of times. Fig. 6 illustrates a number M of time subsets at 1 year intervals, but is not limited thereto.
Further, as shown in fig. 6, for any index partition Pi among the plurality of index partitions into which the initial index partition is split, a hash calculation may be performed on a time range of the index partition Pi (i.e., a time subset corresponding to the index partition a), to obtain a hash result range of the time range corresponding to the index partition Pi. Where i=1, 2, …, M. M is the number of index partitions into which the initial index partition splits. Further, the index partition Pi may be divided into a plurality of sub-partitions according to the hash result range corresponding to the index partition Pi. In this way, the time indexes corresponding to the index partitions Pi can be uniformly written into the sub-partitions corresponding to the index partitions Pi, so as to realize load balancing.
The above embodiments are described by way of example only with the initial index partition split into a plurality of index partitions P1-PM. When the number of lines in the index table increases, the data volume of the sub-partition corresponding to the index partition Pi may also reach the set data volume threshold, and the index partition Pi may be split into a plurality of index partitions again, so that the index partition may be automatically split into a plurality of index partitions according to the above manner whenever the data volume of the sub-partition corresponding to the index partition reaches the set data volume threshold. The newly split index partition can also be used for carrying out secondary partition in a hash partition mode for realizing load balancing. Accordingly, the index processing method shown in fig. 1 is applicable to any index partition to be created.
In the embodiment of the present application, for convenience of description and distinction, an index partition to be created is defined as a first index partition. The first index partition may be the initial index partition, or may be any index partition that splits the index partition split at other stages again. Regardless of the stage of index partition to be created, the second-level partition may be performed according to the index partition processing method shown in fig. 7 described below.
Fig. 7 is a flowchart of another index processing method according to an embodiment of the present application. As shown in fig. 7, the index processing method may include:
701. acquiring a target time range of a first index partition to be created; the first index partition is a partition of an index table of the time series data table.
702. And creating a first index partition corresponding to the target time range.
703. And carrying out hash calculation on the target time range to obtain a hash result range of the target time range.
704. The first index partition is divided into a plurality of sub-partitions according to the hash result range of the target time range, so that the time index is stored in the plurality of sub-partitions according to the hash result of the time index of the time sequence data table.
In this embodiment, the first index partition to be created may be the initial index partition, and the description of the related content of fig. 1 may be referred to for the implementation of determining the target time range of the initial index partition, which is not repeated herein. Of course, the first index partition to be created may also be the index partition to be split out of the index partitions to be split. For example, the first index partition may be an index partition split from the initial index partition; or the index partition to be split out for the index partition to be split out of other stages. In the embodiment of the present application, for convenience of description and distinction, an index partition existing before the first index partition is created is defined as a second index partition. Accordingly, the first index partition may be an index partition to be split out of the index partitions to be split out of the second index partition.
The data quantity of a plurality of sub-partitions corresponding to the index partition to be split in the second index partition is larger than or equal to a set data quantity threshold. In this embodiment of the present application, the second index partition, in which the data amount of the corresponding plurality of sub-partitions in the second index partition is greater than or equal to the set data amount threshold, is referred to as a target index partition. Accordingly, the step 701 may be implemented as: aiming at a target index partition in the second index partition, a time range corresponding to the target index partition can be acquired; and dividing the time range corresponding to the target index partition into a plurality of time subsets according to the preset time interval, and taking the time ranges as the target time ranges of the plurality of first index partitions to be created.
In the embodiment of the present application, the preset time interval may be preset, and the specific implementation manner of setting the time interval is not limited. In some embodiments, the user may set the time interval based on actual query requirements. For example, when a user performs a time-frame query, the user often performs a data query in units of a certain time frame, and then the preset time interval may be set as the time frame of the data query. For example, the user performs a data query in a time range of day, week, or month, and the time interval may be set to day, week, or month, or the like.
Accordingly, the user may send a time interval setting request to the database kernel through his terminal. The time interval setting request is used to set a time interval for index partition splitting. For the database kernel, a time interval setting request may be acquired, and from the time interval setting request, a time interval set by the request may be acquired as a preset time interval.
In other embodiments, the database kernel may also autonomously determine the time interval based on statistical analysis. Specifically, the data amount of the target index partition at each unit time may be acquired. The data volume of the target index partition in each unit time is the data volume of a plurality of sub-partitions corresponding to the target index partition in each unit time. The unit time can be flexibly set according to actual requirements. For example, the unit time may be every minute, hour, day, month or month, or even longer, etc.
Further, the preset time interval may be determined according to the data amount of the target index partition at each unit time. Alternatively, a time interval in which the difference between the amounts of data among the split-out plurality of third index partitions can be made smaller than or equal to the set difference size may be determined as a preset time interval according to the amount of data of the target index partition at each unit time.
Based on the preset time interval, the time range corresponding to the target index partition may be divided into a plurality of time subsets. The number of the time subsets is specifically determined by the size of the time range and the size of the time interval corresponding to the target index partition. Accordingly, in conjunction with fig. 7 and 8, step 702 may be implemented as: the target index partition is split into a plurality of index partitions (defined as a third index partition) corresponding to the plurality of time subsets. The plurality of third index partitions are a plurality of first index partitions to be created. In fig. 8, only the target index partition is shown as the initial index partition P0, and the plurality of third index partitions are shown as a plurality of index partitions P1-PM split from the initial index partition P0, but the present invention is not limited thereto.
After the first index partition is obtained, in step 703, a hash calculation may be performed on the target time range to obtain a hash result range of the target time range. Further, in step 704, the first index partition may be divided into a plurality of sub-partitions according to the hash result range of the target time range.
For the embodiment of the plurality of third index partitions that are split for the target index partition by the first index partition, as shown in fig. 8, the step 703 may be implemented as follows: and carrying out hash calculation on the target time range corresponding to each third index partition to obtain a hash result range of the target time range corresponding to each third index partition. Accordingly, step 704 may be implemented as: each third index partition may be divided into a plurality of sub-partitions according to a hash result range of the third index partition.
In the embodiment of the application, on the basis of partitioning the time index table in a time range, a hash mode is introduced to perform secondary partition, and hash calculation can be performed on a target time range corresponding to the index partition obtained in the time range partition mode; and dividing the index partition into a plurality of sub-partitions according to the hash result range of the target time range. In this way, when the time index of the time series data table is written in the time series index table, the time index of the time series data table can be stored in the plurality of sub-partitions according to the hash result of the time index of the time series data table. When the time index of the time sequence data table is stored, the time index can be uniformly written into a plurality of sub-partitions by utilizing the hash principle, the probability of occurrence of partition hot spots is reduced, the occurrence of partition hot spots can be avoided, and the load balancing effect is achieved.
Specifically, the embodiment of the application also provides a further index processing method, which is mainly used for writing the time sequence index. As shown in fig. 9, the index processing method mainly includes:
901. and acquiring a time index to be stored from the time sequence data table.
902. Determining a target index partition corresponding to the time index to be stored from at least one index partition according to the time index to be stored; wherein at least one index partition is partitioned according to a time range.
903. And carrying out hash calculation on the time index to be stored to obtain a hash result of the time index to be stored.
904. Determining a target sub-partition from a plurality of sub-partitions corresponding to the target index partition according to a hash result of the time index to be stored; the plurality of sub-partitions are obtained by dividing the hash result range of the time range corresponding to the target index partition.
905. And storing the time index to be stored in the target sub-partition.
The partitioning process of the plurality of sub-partitions may be referred to in the foregoing embodiments, and will not be described herein. Fig. 9 provides an embodiment of a process for storing any time index in a time series data table. Because the time indexes of the time sequence data table are different, when the time indexes of the time sequence data table are written in the time sequence index table, the time indexes of the time sequence data table can be stored in a plurality of sub-partitions corresponding to the target index partition according to the hash result of the time indexes of the time sequence data table. When the time index of the time sequence data table is stored, the time index can be uniformly written into a plurality of sub-partitions by utilizing the hash principle, the probability of occurrence of partition hot spots is reduced, the occurrence of partition hot spots can be avoided, and the load balancing effect is achieved.
For example, as shown in fig. 10, even when hot spot writing occurs in a certain time range, the hot spot writing traffic can be distributed to a plurality of sub-partitions corresponding to the time range by using a hash method. For example, in fig. 10, when the index partition PM has a hot spot, hash calculation may be performed on the time index of the hot spot writing, and the time index of the hot spot writing is distributed to a plurality of sub-partitions corresponding to the index partition PM according to the hash result, so that the probability of the occurrence of the hot spot in the sub-partition is greatly reduced, and even the occurrence of the hot spot in the sub-partition can be avoided.
For the embodiment of the above-mentioned third index partition in which the first index partition is split into the target index partition, since the index partition is split into multiple index partitions with different time ranges, when querying the range of the global secondary index, the sub-partitions corresponding to the unrelated index partitions can be filtered out by partition clipping, and only the sub-partitions corresponding to the index partitions related to the query range are queried, so that the index query speed can be improved.
Specifically, when indexing a query, a query request may be obtained; and obtaining the time range to be queried from the query request. Further, according to the time range to be queried and the time ranges corresponding to the plurality of index partitions existing currently, determining that the time range comprises the index partitions of the time range to be queried; further, the query may be performed in a sub-partition corresponding to an index partition whose time range includes the time range to be queried, so as to obtain a time index corresponding to the time range to be queried. The time range to be queried and the time ranges corresponding to the plurality of index partitions currently existing can be filtered to index partitions with time ranges irrelevant to the time range to be queried, and the query is only performed in the sub-partitions corresponding to the index partitions containing the time range to be queried, and the global query is not required to be performed on all the index partitions, so that the index query can be accelerated.
The index partitions currently exist are index partitions existing when the query request is received, and specifically include which index partitions are determined by the time when the query request is received and the splitting time of the index partitions.
Because the time interval of index partition splitting can be set by a user according to the query requirement, the time range of the split index partitions also meets the query requirement, and when the time-related range query is performed based on the time index, the partition cutting effect meets the query requirement.
The index processing method provided by the embodiment of the invention utilizes the hash principle to uniformly write the time index into the plurality of sub-partitions, reduces the probability of occurrence of partition hot spots, and even can avoid occurrence of partition hot spots, thereby achieving the load balancing effect. However, a single-value hot spot problem may occur in the actual application process, that is, the write traffic of a single point in time increases sharply. The hash result of the time corresponding to the single-value hot spot is still routed to a certain sub-partition, so that the single-value hot spot appears in the sub-partition, namely the problem of the single-value hot spot cannot be solved by the two-stage partition mode based on the time range and the hash algorithm.
For example, as shown in fig. 11, 0x1234 is the hash result of the single-value hot spot, and it is assumed that 0x1234 accounts for 95% of the data volume of partition 1, and the single-value hot spot 0x1234 is always located in a certain partition, such as partition 1-2 into which partition 1 is split in fig. 11, regardless of how partition 1 is split subsequently. That is, the single-value hot spot cannot be split further, so that the read-write pressure of the partition where the single-value hot spot is located cannot be balanced.
In order to solve the problem of single-value hot spots, in some embodiments of the present application, a vector partition key composed of a time index column and a main key column is introduced, and hash space division is performed on sub-partitions based on the vector partition key. The Primary key (Primary key) can uniquely identify an entity, so that the entity integrity of the database is ensured, the correctness and rationality of data in the data are ensured, and the value is non-null and unique. The primary key may be a single primary key or a joint primary key. The combined main key is composed of 2 or more fields.
In this embodiment, the secondary partition of the time-series global index table may be partitioned according to the vector partition key formed by the time index series and the primary key series. Specifically, hash computation can be performed on the vector partition key to obtain a hash result of the vector partition key. For example, as shown in fig. 12, the time index column is time_col and the primary key column is id, and the vector partition key composed of the time index column and the primary key column is (time_col, id). Based on the vector partition key, the database kernel may consider the vector partition key as a point in a two-dimensional space (defined as an original space) coordinate system composed of x-axis=time_col and y-axis=id, and then the values of the columns of the point become another hash value vector (hash (time_col), hash (id)) after hash calculation, which is equivalent to the original point being mapped to another point in a new two-dimensional space (defined as a hash space) coordinate system composed of x '=hash (time_col) axis and y' =hash (id) axis. For embodiments in which the primary key is a joint primary key, both the original spatial coordinate system and the hash spatial coordinate system are more dimensional coordinate systems, with the specific dimensions being determined by the number of joint primary keys. For example, if the joint primary key is a primary key composed of 2 fields, the original space coordinate system and the hash space coordinate system are three-dimensional space coordinate systems.
Based on the vector partition key, in this embodiment, the primary key of the time series data table may also be acquired. Wherein the primary key of the time series data table may be specified by the user at the time of creation of the time series data table. Specifically, the user may send a list creation request to the database kernel through his terminal, which may specify a primary key of the time series data list. For the database kernel, the primary key of the time series data table can be obtained from the table establishment request. Further, a default hash range of the primary key may also be obtained. The default hash range of the primary key is determined by the data type of the primary key. For example, the primary key is user identification (id), the return value of hash function hash (id) is Long-form (Long) data, the value range is [ -9223372036854775808, 9223372036854775807], and the default hash range of the primary key id is [ -9223372036854775808, 9223372036854775807].
Further, the step 704 may be implemented as dividing the first index partition into a plurality of sub-partitions according to the hash result range of the target time range: dividing a hash result range of the target time range into a plurality of hash result subsets; and constructing a plurality of hash spaces by utilizing the plurality of hash result subsets and the default range of the primary key to serve as a plurality of sub-partitions. Specifically, as shown in the left diagram of fig. 13, for any one hash result subset, a multidimensional space (e.g., a two-dimensional space shown in fig. 13) formed by the hash result subset and a default range of the primary key may be used as one hash space. Each of the SP 1-SPNs shown in fig. 13 is a hash space.
Wherein the number of subsets of hash results is determined by the number N of sub-partitions. The setting manner of the number N of sub-partitions can be referred to the relevant content of the above embodiment, and will not be described herein. Alternatively, as shown in fig. 13, the hash result range of the target time range may be divided into N hash result subsets on average; n hash spaces, namely the hash spaces SP1-SPN in FIG. 13, are constructed by utilizing N hash result subsets and a default hash range of a primary key. The N hash spaces are N sub-partitions divided into by the first index partition.
In order to solve the problem of single-value hot spots, a target time index of the hot spots can be determined based on a hash space constructed by the hash result of the vector partition key. The target time index of the occurrence of the hot spot means that the data amount of the target time index is greater than or equal to a set data amount threshold, or the number of lines of the target time index is greater than or equal to a set number of lines threshold. In the embodiments of the present application, the specific implementation manner of determining the target time index of occurrence of the hot spot is not limited. In some embodiments, a preset time point at which a hot spot occurs may be obtained as a target time index. The preset time point of occurrence of the hot spot can be determined by a user according to prior knowledge of past historical hot spot data. Of course, the database kernel can also predict the time point of occurrence of the hot spot according to the historical hot spot data, and the time point is used as the target time index of occurrence of the hot spot. Specifically, according to the historical time sequence data, a time point when the number of the generated data records is greater than the set line number threshold value is obtained and used as a time point when the hot spot occurs. The time point of the occurrence of the hot spot is the target time index.
Further, for the target time index with the hot spot, the target hash space where the target time index is located may be determined from the plurality of hash spaces (i.e., the plurality of sub-partitions) divided from the first index partition according to the hash result of the target time index. For example, as shown in fig. 13, the target hash space corresponding to the target time index is the hash space SP1. Further, the target hash space may be divided into a plurality of hash subspaces according to the primary key. In this way, when the target time index is written, a plurality of vector partition keys composed of the target time index and the target primary key values can be stored in the plurality of hash subspaces according to the hash results of the plurality of target primary key values corresponding to the target time index. The primary key value refers to a specific value of the primary key listed in the time sequence data table, and the target primary key value corresponding to the target time index is the primary key value corresponding to the target time index in the time sequence data table.
The plurality of hash subspaces are deployed across a plurality of physical machines. The number of physical machines is less than or equal to the number of hash subspaces. Preferably, the number of physical machines is equal to the number of hash subspaces. Therefore, when a plurality of vector partition keys formed by the target time index and the target main key values are stored in a plurality of hash subspaces according to hash results of the target main key values corresponding to the target time index, the vector partition keys can be distributed to a plurality of physical machines, and single-value hot spot pressures of original target sub-partitions (namely the original target hash spaces) are distributed to the hash subspaces, so that single-value hot spot pressures of a single machine can be relieved, and the problem of read-write balance of the single-value time hot spots is solved.
In the embodiment of the present application, the specific implementation of dividing the target hash space into a plurality of hash subspaces according to the primary key is not limited. In some embodiments, as shown in the right diagram of fig. 13, the default hash range of the primary key may be divided equally into S hash result subsets corresponding to the primary key; s is the number of preset hash subspaces, S is an integer, and S is more than or equal to 2. Further, the target hash space can be divided into a plurality of hash subspaces according to the dividing points determined by the S hash result subsets corresponding to the primary keys; each hash result subset corresponds to a hash subspace. For example, as shown in the right diagram of fig. 13, s=3, the default hash range of the primary key may be divided equally into 3 hash result subsets; and according to the division points determined by the 3 hash result subsets, the hash space corresponding to the target hash space is divided into 3 hash subspaces, namely, the hash subspaces SP1-1, SP1-2 and SP1-3 in average.
In other embodiments, a target hash corresponding to the hash result of the target time index may be obtained from the target hash space; further, the target hash and other hashes of the target hash space except the target hash can be divided into different hash subspaces to obtain a plurality of hash subspaces corresponding to the target hash space.
In some embodiments, the target hash alone may be used as one hash subspace to carry the load. Specifically, the target hash is singly divided into a hash subspace; and dividing other hashes except the target hash in the target hash space into at least one hash subspace by taking the target hash as a dividing line so as to obtain a plurality of hash subspaces. For example, as shown in the left diagram of fig. 14, the target single column corresponding to the hash result "88" of the target time index in which the hot spot occurs may be divided into a separate hash subspace; and dividing the hash with the time index hash result greater than 88 in the target hash space into a hash subspace; and dividing the hash with the time index less than 88 in the target hash space into another hash subspace.
If the target hash is used as a hash subspace alone to bear the load, the problem of read-write hot spots still occurs, and the hash space division can be further carried out on the target hash according to the primary key. Specifically, the default hash range of the primary key can be equally divided into K hash result subsets corresponding to the primary key; k is the number of preset subcolumns, K is an integer, and K is more than or equal to 2. Fig. 14 illustrates k=3, but is not limited thereto.
Further, as shown in the right diagram of fig. 14, the target hash may be divided into K sub-columns according to the dividing points determined by the K hash result subsets corresponding to the primary key; and divides K subcolumns into K hash subspaces, sp_hot1, sp_hot2, and sp_hot3 as shown in fig. 14. Wherein the K hash subspaces are deployed on a plurality of physical machines. The number of physical machines is less than or equal to K. Preferably, the number of physical machines is equal to K. When the vector partition keys formed by the target time index and the target main key values are stored in the K hash subspaces according to the hash results of the target main key values corresponding to the target time index, the vector partition keys can be distributed to a plurality of physical machines, the single-value hot spot pressure of the original target partition (namely the original target hash space) is distributed to the K hash subspaces, the single-value hot spot pressure of a single machine can be relieved, and the read-write balance problem of the single-value time hot spot is solved.
Further, the target hash may be used as a dividing line, and other hashes in the target hash space except for the target hash may be divided into at least one hash subspace, so as to obtain a plurality of hash subspaces. The K hash subspaces corresponding to the target hash and the hash subspaces corresponding to other hashes except the target hash in the target hash form a plurality of hash subspaces corresponding to the target hash. For a specific implementation manner of dividing the hashes except for the target hash into at least one hash subspace in the target hash space by using the target hash as the dividing line, reference may be made to the relevant content of the foregoing embodiment, which is not repeated herein.
Based on the multiple hash subspaces divided by the target hash space, multiple target primary key values corresponding to the target time index can be obtained from the time sequence data table; hash calculation is respectively carried out on a plurality of vector partition keys formed by the target time index and a plurality of target primary key values, so as to obtain hash results of the plurality of vector partition keys; and then, storing the plurality of vector partition keys in the plurality of hash subspaces according to the hash results of the plurality of vector partition keys. Specifically, a target hash subspace corresponding to each vector partition key can be determined according to the hash result of the target primary key value in the hash results of the plurality of vector partition keys; and a plurality of vector partition keys are stored in the corresponding target hash subspaces respectively. The plurality of hash subspaces are deployed on the plurality of physical machines, so that the read-write pressure of the single-value time hot spot can be shared to the plurality of physical machines, the single-value hot spot pressure of the target hash space is dispersed to the plurality of hash subspaces, the single-value hot spot pressure of a single machine can be relieved, and the read-write balance problem of the single-value time hot spot is solved.
It should be noted that, the execution subjects of each step of the method provided in the above embodiment may be the same device, or the method may also be executed by different devices. For example, the execution subject of steps 701 and 702 may be device a; for another example, the execution body of step 701 may be device a, and the execution body of step 702 may be device B; etc.
In addition, in some of the above embodiments and the flows described in the drawings, a plurality of operations appearing in a specific order are included, but it should be clearly understood that the operations may be performed out of the order in which they appear herein or performed in parallel, the sequence numbers of the operations such as 701, 702, etc. are merely used to distinguish between the various operations, and the sequence numbers themselves do not represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the index processing method provided in the foregoing embodiments.
Fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 15, the electronic device includes: a memory 15a and a processor 15b. Wherein the memory 15a is used for storing a computer program.
The processor 15b is coupled to the memory 15a for executing the computer program for performing the steps in the index processing method provided by the foregoing embodiments. For the specific implementation of each step, reference may be made to the related description of the foregoing embodiments, which is not repeated herein.
In some alternative embodiments, as shown in fig. 15, the electronic device may further include: optional components such as a communication component 15c, a power component 15d, a display component 15e, and an audio component 15 f. Only a part of the components are schematically shown in fig. 15, which does not mean that the electronic device must contain all the components shown in fig. 15, nor that the electronic device can only contain the components shown in fig. 15.
In addition, the components within the dashed box in fig. 15 are optional components, not necessarily optional components, depending on the product form of the electronic device. The electronic device of the embodiment can be implemented as terminal devices such as a desktop computer, a notebook computer, a mobile phone or an internet of things device; and can also be a traditional server, a cloud server or a server cluster and other various server devices.
In embodiments of the present application, the memory is used to store a computer program and may be configured to store various other data to support operations on the device on which it resides. Wherein the processor may execute a computer program stored in the memory to implement the corresponding control logic. The Memory may be implemented by any type or combination of volatile or non-volatile Memory devices, such as Static Random-Access Memory (SRAM), electrically erasable programmable Read-Only Memory (Electrically Erasable Programmable Read Only Memory, EEPROM), erasable programmable Read-Only Memory (Electrical Programmable Read Only Memory, EPROM), programmable Read-Only Memory (Programmable Read Only Memory, PROM), read Only Memory (ROM), magnetic Memory, flash Memory, magnetic or optical disk.
In the embodiments of the present application, the processor may be any hardware processing device that may execute the above-described method logic. Alternatively, the processor may be a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU) or a micro control unit (Microcontroller Unit, MCU); programmable devices such as Field programmable gate arrays (Field-Programmable Gate Array, FPGA), programmable array logic devices (Programmable Array Logic, PAL), general array logic devices (General Array Logic, GAL), complex programmable logic devices (Complex Programmable Logic Device, CPLD), and the like; or an advanced reduced instruction set (Reduced Instruction Set Compute, RISC) processor (Advanced RISC Machines, ARM) or System on Chip (SoC), etc., but is not limited thereto.
In embodiments of the present application, the communication component is configured to facilitate wired or wireless communication between the device in which it resides and other devices. The device in which the communication component is located may access a wireless network based on a communication standard, such as wireless fidelity (Wireless Fidelity, wiFi), 2G or 3G,4G,5G or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component may also be implemented based on near field communication (Near Field Communication, NFC) technology, radio frequency identification (Radio Frequency Identification, RFID) technology, infrared data association (Infrared Data Association, irDA) technology, ultra Wide Band (UWB) technology, bluetooth (BT) technology, or other technologies.
In embodiments of the present application, the display assembly may include a liquid crystal display (Liquid Crystal Display, LCD) and a Touch Panel (TP). If the display assembly includes a touch panel, the display assembly may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or sliding action, but also the duration and pressure associated with the touch or sliding operation.
In embodiments of the present application, the power supply assembly is configured to provide power to the various components of the device in which it is located. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the devices in which the power components are located.
In embodiments of the present application, the audio component may be configured to output and/or input audio signals. For example, the audio component includes a Microphone (MIC) configured to receive external audio signals when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may be further stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals. For example, for a device with language interaction functionality, voice interaction with a user, etc., may be accomplished through an audio component.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
It should be further noted that, the descriptions of "first" and "second" herein are used to distinguish between different messages, devices, modules, etc., and do not represent a sequence, nor do they limit that "first" and "second" are different types.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, CD-ROM (Compact Disc Read-Only Memory), optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs, etc.), input/output interfaces, network interfaces, and memory.
The Memory may include volatile Memory, random-Access Memory (RAM), and/or nonvolatile Memory in a computer-readable medium, such as read-only Memory (ROM) or flash RAM. Memory is an example of computer-readable media.
The storage medium of the computer is a readable storage medium, which may also be referred to as a readable medium. Readable storage media, including both permanent and non-permanent, removable and non-removable media, may be implemented in any method or technology for information storage. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-Change Memory (PRAM), static Random Access Memory (SRAM), dynamic random access Memory (Dynamic Random Access Memory, DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash Memory or other Memory technology, compact disc read only Memory (CD-ROM), digital versatile disks (Digital Video Disc, DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by the computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (18)

1. An index processing method, comprising:
acquiring a target time range of a first index partition to be created; the first index partition is a partition of an index table of the time sequence data table;
Creating a first index partition corresponding to the target time range;
carrying out hash calculation on the target time range to obtain a hash result range of the target time range;
dividing the first index partition into a plurality of sub-partitions according to the hash result range of the target time range, so as to store the time index in the plurality of sub-partitions according to the hash result of the time index of the time sequence data table.
2. The method of claim 1, wherein the partitioning the first index partition into a plurality of sub-partitions according to the hash result range of the target time range comprises:
acquiring the number N of preset sub-partitions, wherein N is an integer, and N is more than or equal to 2;
equally dividing the hash result range of the target time range into N hash result subsets;
and dividing the first index partition into N sub-partitions corresponding to the N hash result subsets.
3. The method of claim 1, wherein the obtaining the target time range for the first index partition to be created comprises:
responding to an index creation request aiming at the time sequence data table, and acquiring a default time range as the target time range;
The creating the first index partition corresponding to the time range includes:
and creating an initial index partition corresponding to the default time range as the first index partition.
4. The method of claim 1, wherein at least one second index partition exists prior to acquiring the time frame of the first index partition to be created; the second index partition is a partition for dividing an index table of the time sequence data table according to a time range, and the second index partition corresponds to a plurality of sub-partitions; the method further comprises the steps of:
storing the time index to a plurality of sub-partitions corresponding to the second index partition according to the hash result of the time index of the time sequence data table;
the obtaining the target time range of the first index partition to be created includes:
aiming at a target index partition in the at least one second index partition, acquiring a time range corresponding to the target index partition; the target index partition is a second index partition of which the data volume of the corresponding sub-partitions is larger than or equal to a set data volume threshold value;
dividing the time range corresponding to the target index partition into a plurality of time subsets according to a preset time interval, and respectively serving as target time ranges of a plurality of first index partitions to be created;
The creating a first index partition corresponding to the target time range includes:
splitting the target index partition into a plurality of third index partitions respectively corresponding to the plurality of time subsets, and taking the third index partitions as the plurality of first index partitions.
5. The method as recited in claim 4, further comprising:
acquiring a time interval setting request; acquiring the preset time interval from the time interval setting request;
or,
acquiring the data quantity of the target index partition in each unit time; and determining the preset time interval according to the data quantity of the target index partition in each unit time.
6. The method as recited in claim 1, further comprising:
acquiring a main key of the time sequence data table, and acquiring a default hash range of the main key;
the dividing the first index partition into a plurality of sub-partitions according to the hash result range of the target time range includes:
dividing the hash result range of the target time range into a plurality of hash result subsets;
and constructing a plurality of hash spaces by utilizing the plurality of hash result subsets and the default hash range of the main key, and taking the hash spaces as the plurality of sub-partitions.
7. The method as recited in claim 6, further comprising:
determining a target time index of occurrence of the hot spot;
determining a target hash space corresponding to the target time index from the plurality of hash spaces according to the hash result of the target time index;
dividing the target hash space into a plurality of hash subspaces according to the primary key, and storing a plurality of vector partition keys formed by the target time index and the plurality of target primary key values in the plurality of hash subspaces according to hash results of the plurality of target primary key values corresponding to the target time index; the plurality of hash subspaces are deployed at a plurality of physical machines.
8. The method of claim 7, wherein the partitioning the target hash space into a plurality of hash subspaces according to the primary key comprises:
the default hash range of the main key is divided into S hash result subsets corresponding to the main key in average; s is the number of preset hash subspaces, S is an integer, and S is more than or equal to 2;
dividing the target hash space into a plurality of hash subspaces according to dividing points determined by S hash result subsets corresponding to the primary key; each hash result subset corresponds to a hash subspace.
9. The method of claim 7, wherein the partitioning the target hash space into a plurality of hash subspaces according to the primary key comprises:
acquiring a target hash corresponding to a hash result of the target time index from the target hash space;
dividing the target hash and other hashes of the target hash space except the target hash into different hash subspaces to obtain the plurality of hash subspaces.
10. The method of claim 9, wherein the partitioning the target hash from other hashes in the target hash space than the target hash into different hash subspaces to obtain the plurality of hash subspaces comprises:
dividing the target hash sheet into a hash subspace; dividing other hashes except the target hashes in the target hash space into at least one hash subspace by taking the target hashes as a dividing line so as to obtain a plurality of hash subspaces;
or,
the default hash range of the main key is divided into K hash result subsets corresponding to the main key in average; k is the number of preset subcolumns, K is an integer, and K is more than or equal to 2;
Dividing the target hash into K sub-columns according to dividing points determined by K hash result subsets corresponding to the primary key, and dividing the K sub-columns into K hash subspaces; and dividing other hashes except the target hash in the target hash space into at least one hash subspace by taking the target hash as a dividing line so as to obtain the plurality of hash subspaces.
11. The method of claim 7, wherein determining the target time index for occurrence of the hotspot comprises:
acquiring a preset time point of occurrence of a hot spot as the target time index;
or,
and predicting a time point of occurrence of the hot spot according to the historical hot spot data, and taking the time point as the target time index.
12. The method according to any one of claims 1-11, further comprising:
performing hash calculation on the time index of the time sequence data table to obtain a hash result of the time index;
for any time index in the time indexes, determining a target sub-partition corresponding to the hash result of the any time index from the plurality of sub-partitions according to the hash result of the any time index and the hash result ranges corresponding to the plurality of sub-partitions;
And storing the arbitrary time index in the target sub-partition.
13. The method according to claim 4 or 5, further comprising:
acquiring a time range to be queried from a query request;
determining an index partition of which the time range contains the time range to be queried according to the time range to be queried and the time ranges corresponding to the plurality of index partitions existing currently; the index partitions refer to the existing index partitions when the query request is received;
and inquiring in the sub-partition corresponding to the index partition of which the time range contains the time range to be inquired so as to acquire the time index corresponding to the time range to be inquired.
14. The method according to any one of claims 7-11, further comprising:
acquiring a plurality of target primary key values corresponding to the target time index from the time sequence data table;
respectively carrying out hash calculation on a plurality of vector partition keys formed by the target time index and the target primary key values so as to obtain hash results of the vector partition keys;
and storing the plurality of vector partition keys in the plurality of hash subspaces according to hash results of the plurality of vector partition keys.
15. An index processing method, comprising:
acquiring a time index to be stored from a time sequence data table;
determining a target index partition corresponding to the time index to be stored from at least one index partition according to the time index to be stored; the at least one index partition is partitioned according to a time range;
performing hash calculation on the time index to be stored to obtain a hash result of the time index to be stored;
determining at least one target sub-partition from a plurality of sub-partitions corresponding to the target index partition according to the hash result of the time index to be stored; the plurality of sub-partitions are obtained by dividing the hash result range of the time range corresponding to the target index partition;
and storing the time index to be stored in the at least one target sub-partition.
16. The method of claim 15, wherein the time index corresponding to the target sub-partition has a target time index for hot spots; the target sub-partition is divided into a plurality of hash subspaces according to a main key of the time sequence data table; the plurality of hash subspaces are deployed on a plurality of physical machines; the time index to be stored is the target time index;
The method further comprises the steps of:
acquiring a plurality of target primary key values corresponding to the time index to be stored from the time sequence data table;
performing hash calculation on the target primary key values to obtain hash results of the target primary key values so as to obtain hash results of the vector partition keys; a vector partition key consists of the time index to be stored and a target primary key value;
the storing the time index to be stored in the target sub-partition includes:
and storing the vector partition keys in a plurality of hash subspaces corresponding to the target sub-partition according to the hash results of the vector partition keys.
17. An electronic device, comprising: a memory and a processor; wherein the memory is used for storing a computer program;
the processor is coupled to the memory for executing the computer program for performing the steps in the method of any of claims 1-16.
18. A computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the method of any of claims 1-16.
CN202311547108.8A 2023-11-20 2023-11-20 Index processing method, device and storage medium Active CN117271529B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311547108.8A CN117271529B (en) 2023-11-20 2023-11-20 Index processing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311547108.8A CN117271529B (en) 2023-11-20 2023-11-20 Index processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN117271529A true CN117271529A (en) 2023-12-22
CN117271529B CN117271529B (en) 2024-03-29

Family

ID=89209028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311547108.8A Active CN117271529B (en) 2023-11-20 2023-11-20 Index processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN117271529B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090089334A1 (en) * 2007-09-27 2009-04-02 Microsoft Corporation Lazy updates to indexes in a database
CN102063490A (en) * 2010-12-20 2011-05-18 大唐移动通信设备有限公司 Database partition method and device
US8005992B1 (en) * 2006-10-13 2011-08-23 Cadence Design Systems, Inc. Scalable storage and retrieval of multiple asynchronous signals
US20150199413A1 (en) * 2014-01-13 2015-07-16 International Business Machines Corporation Transforming timeseries and non-relational data to relational for complex and analytical query processing
CN107423368A (en) * 2017-06-29 2017-12-01 中国测绘科学研究院 A kind of space-time data indexing means in non-relational database
CN111767268A (en) * 2020-06-23 2020-10-13 平安普惠企业管理有限公司 Database table partitioning method and device, electronic equipment and storage medium
CN112765271A (en) * 2020-12-31 2021-05-07 杭州趣链科技有限公司 Block chain transaction index storage method and device, computer equipment and medium
CN113918807A (en) * 2021-09-24 2022-01-11 咪咕文化科技有限公司 Data recommendation method and device, computing equipment and computer-readable storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8005992B1 (en) * 2006-10-13 2011-08-23 Cadence Design Systems, Inc. Scalable storage and retrieval of multiple asynchronous signals
US20090089334A1 (en) * 2007-09-27 2009-04-02 Microsoft Corporation Lazy updates to indexes in a database
CN102063490A (en) * 2010-12-20 2011-05-18 大唐移动通信设备有限公司 Database partition method and device
US20150199413A1 (en) * 2014-01-13 2015-07-16 International Business Machines Corporation Transforming timeseries and non-relational data to relational for complex and analytical query processing
CN107423368A (en) * 2017-06-29 2017-12-01 中国测绘科学研究院 A kind of space-time data indexing means in non-relational database
CN111767268A (en) * 2020-06-23 2020-10-13 平安普惠企业管理有限公司 Database table partitioning method and device, electronic equipment and storage medium
CN112765271A (en) * 2020-12-31 2021-05-07 杭州趣链科技有限公司 Block chain transaction index storage method and device, computer equipment and medium
WO2022143540A1 (en) * 2020-12-31 2022-07-07 杭州趣链科技有限公司 Block chain index storage method and apparatus, computer device and medium
CN113918807A (en) * 2021-09-24 2022-01-11 咪咕文化科技有限公司 Data recommendation method and device, computing equipment and computer-readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEI HUANG等: ""Peak and Valley Periods Partitioning Based on Improved K-medoids Algorithm"", 《2020 IEEE SUSTAINABLE POWER AND ENERGY CONFERENCE (ISPEC)》, 18 February 2021 (2021-02-18), pages 1 - 4 *
杨佐希等: ""基于时序分区的时态索引与查询"", 《软件学报》, 30 November 2020 (2020-11-30), pages 3519 - 3539 *
蒋勇;: "ORACLE数据库分区技术及其应用", 科技信息, no. 29, pages 1 *

Also Published As

Publication number Publication date
CN117271529B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
JP6535031B2 (en) Data query method and apparatus
CN105354151B (en) Cache management method and equipment
CN106528787B (en) query method and device based on multidimensional analysis of mass data
CN115168338B (en) Data processing method, electronic device and storage medium
CN110837499B (en) Data access processing method, device, electronic equipment and storage medium
CN113448969B (en) Data processing method, device and storage medium
CN114116777A (en) A data processing method, device, equipment and storage medium
CN105468644B (en) Method and equipment for querying in database
CN108536759B (en) Sample playback data access method and device
CN110019274B (en) Database system and method and device for querying database
CN117271529B (en) Index processing method, device and storage medium
CN110019544B (en) Data query method and system
US12182201B2 (en) Graph data storage method, system and electronic device
CN111125157B (en) Query data processing method and device, storage medium and processor
CN110019296B (en) Database query script generation method and device, storage medium and processor
CN114661666B (en) Data searching method, device, equipment and storage medium
CN115809311B (en) Knowledge graph data processing method and device and computer equipment
CN120104037A (en) Partition identification method, storage system, electronic device and storage medium
CN110866003B (en) Index value number estimation method and device and electronic equipment
CN111143711A (en) Object searching method and system
CN119336786B (en) SQL query method and device, electronic equipment and storage medium
CN108287853B (en) Data relation analysis method and system
US12380009B2 (en) Method and apparatus for data processing for software product
CN116069784A (en) Data processing method, device and medium for ERP system
CN116962496A (en) Search service framework, search service request processing method, service isolation method and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant