[go: up one dir, main page]

CN119088800A - Data indexing and query optimization system and method based on database - Google Patents

Data indexing and query optimization system and method based on database Download PDF

Info

Publication number
CN119088800A
CN119088800A CN202411068003.9A CN202411068003A CN119088800A CN 119088800 A CN119088800 A CN 119088800A CN 202411068003 A CN202411068003 A CN 202411068003A CN 119088800 A CN119088800 A CN 119088800A
Authority
CN
China
Prior art keywords
data
query
index
database
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411068003.9A
Other languages
Chinese (zh)
Inventor
张晖
郝运凯
李明
李正蛟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Database Technology Co Ltd
Original Assignee
Shandong Inspur Database Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Database Technology Co Ltd filed Critical Shandong Inspur Database Technology Co Ltd
Priority to CN202411068003.9A priority Critical patent/CN119088800A/en
Publication of CN119088800A publication Critical patent/CN119088800A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a database-based data index and query optimization system and a method, which belong to the technical field of databases, and the technical problem to be solved by the invention is how to improve the data index efficiency and the query performance of a database system; the data storage module is used for storing data records in a database, the data records are stored in a column mode, the data compression rate and the query performance are improved, the disk I/O operation is reduced, the index building module is used for building an index according to the data records in the data storage module, the index adopts a B+ tree design, the number of times of the I/O operation when data are transmitted between the disk and the memory is reduced, and the query optimization module is used for receiving a query request and optimizing a query path according to the index built by the index building module.

Description

Database-based data index and query optimization system and method
Technical Field
The invention relates to the technical field of databases, in particular to a database-based data index and query optimization system and method.
Background
A database is a collection of data that is stored on a computer's storage device for a long period of time, organized according to certain rules, and sharable by various users or applications. It is a centralized data store, typically consisting of one or more data tables, each containing a plurality of rows and columns, for storing particular types of data. The database is mainly used for effectively organizing, storing and managing a large amount of data and providing the functions of structured storage, inquiry, modification, sequencing, statistics and the like of the data.
The database system is a more ideal data processing system developed to adapt to the data processing requirement, and is also a software system for providing data for the actual operable storage, maintenance and application system. Database systems are typically composed of four parts, database (DB), hardware, software (including operating systems, database management systems DBMSs, and applications), and personnel (including system analysts, database designers, application programmers, end users, and database administrators DBA). Among these, the database management system (DBMS) is the core software of the database system.
In recent years, with the large increase of data volume in society, the conventional database system often encounters problems of low database index efficiency and long query response time when processing large-scale data, and the performance of the database system is limited.
Therefore, how to improve the data index efficiency and the query performance of the database system is a technical problem to be solved.
Disclosure of Invention
The technical task of the invention is to provide a database-based data indexing and query optimizing system and method, which are used for solving the problem of how to improve the data indexing efficiency and the query performance of a database system.
The technical task of the invention is realized in the following way, a database-based data indexing and query optimizing system comprises a query optimizing module, an index constructing module and a data storage module;
The data storage module is used for storing data records in the database, wherein the data records are stored in a column, so that the data compression rate and the query performance are improved, and the disk I/O operation is reduced;
The index construction module is used for constructing an index according to the data record in the data storage module, the index adopts a B+ tree design, and the number of I/O operations when data are transmitted between the disk and the memory is reduced;
The query optimization module is used for receiving the query request and optimizing the query path according to the index constructed by the index construction module.
Preferably, the data storage module distributes data to a plurality of physical disks through RAID, and when different data is read and written, different disk blocks are used, so that the overall performance of data reading and writing is improved;
during actual reading and writing, the data storage module adopts a buffer pool formed by an internal database, so that physical reading and writing operations on a disk are reduced.
Preferably, the index construction module adopts a dynamic index structure based on data access frequency and data distribution, and dynamically adjusts the size of index nodes according to the data access frequency, wherein the adjustment of the size of the index nodes is based on a preset performance threshold value;
the index construction module analyzes the distribution of the data on the storage medium and optimizes the index structure by adopting a distribution perception algorithm based on cluster analysis and data inclination detection.
Preferably, the query optimization module comprises a query plan generator and a cost evaluator;
Wherein the query plan generator is configured to generate an index-based query execution plan;
The cost estimator is used for estimating the query time of different paths by adopting a machine learning model (cost model).
More preferably, the index-based query execution plan includes a multi-stage query execution flow and an intermediate result caching strategy;
The multi-stage query execution flow comprises data retrieval, data filtering, data projection, sequencing, connection and aggregation operations, wherein the data retrieval is based on index to retrieve data from a storage engine, the data filtering adopts a WHERE clause to filter the data, so that the data quantity of subsequent operations is reduced, and the data projection type SELECTs a required column according to the SELECT clause;
the intermediate result caching strategy comprises a selection strategy, a caching failure strategy and a caching replacement strategy, wherein the selection strategy is used for selecting the temporal locality and the spatial locality of an intermediate result generated according to a query execution flow, the caching failure strategy is used for setting a data cache and appointing caching failure by adopting a time standard, and the caching replacement strategy is used for removing data, the use frequency of which is lower than a set frequency threshold value within a set time range, from a caching area by adopting an LRU algorithm.
More preferably, the reference indexes input by the machine learning model comprise CPU processing time, I/O operands, data transmission quantity and data transmission mode;
The weight is characterized in that corresponding weight is given to the influence degree of the query time according to each reference index, linear relation description is adopted, the formula is y=ax 1+bx2+cx3+dx4, x 1,x2,x3,x4 is the numerical value of CPU processing time, I/O operand, data transmission quantity and transmission mode, and a, b, c and d are the weights of each reference index.
More preferably, the query plan generator predicts the distribution of query results using statistical information and a machine learning algorithm, and is capable of dynamically adjusting the query plan based on the predicted results, the dynamically adjusting the content of the query plan including adjusting the parallelism of query execution and resource allocation, and the method specifically comprises the following steps:
firstly, acquiring load resource information of the residual CPU core number, the residual memory capacity and the residual broadband capacity of a database system;
then, collecting query statistical information containing data access modes, data amounts and index use conditions, and controlling parallelism and memory use by using configuration parameters of maximum parallelism (max_parallel_depth) and working memory (work_mem);
meanwhile, a partitioning technology is used for a data table with a size exceeding a set threshold, and each query operation is executed on different partitions in parallel.
A data index and query optimization method based on a database comprises the following steps:
S1, receiving a query request and analyzing the content of the query request, wherein the content of the query request comprises query conditions, ordering requirements and data aggregation operation;
S2, constructing or updating an index by using an index construction module, and adapting to the change of the data record in the data storage module; the index adjustment strategy of the index construction module is based on data access frequency, data distribution and system load setting, and specifically comprises the steps of identifying data needing to be frequently accessed according to past query logs, optimizing related indexes, and adjusting the related indexes by combining the change condition of the data distribution after updating the data;
S3, generating a query execution plan by utilizing a query optimization module, wherein the content of the query execution plan comprises the steps of evaluating and selecting an optimal query path by using a cost evaluator and predicting query result distribution by using a machine learning model;
s4, executing the query and returning a result, and collecting statistical information of query execution for subsequent optimization, wherein the statistical information comprises query execution time, resource consumption and user satisfaction.
Preferably, constructing or updating the index includes incremental updating and full reconstruction;
The incremental updating adopts a partial updating mode, namely when the data change amount does not exceed a set threshold value, the incremental updating only updates the existing index so as to reflect the insertion, deletion and updating of the data record;
when the scale of the data change exceeds a set threshold, reconstructing the index to optimize performance, namely reconstructing the index for all the data, and establishing a corresponding mapping relation.
More preferably, the query optimization module works as follows:
s301, an initial query plan and a solution space saved for a query operation in the past are used as external input to be sent to a query optimization module;
S302, the query optimization module performs continuous optimization based on the established machine learning model and generates a corresponding query plan;
s303, the query plan is directly used for query operation of the database, time information of the query operation is used for feedback machine learning model calculation, and the steps are repeated until a group of optimal results are found.
The database-based data index and query optimization system and method of the invention have the following advantages:
The invention improves the efficiency of data index and the performance of query through the data storage module, the index construction module and the query optimization module;
the data record in the database is stored through the data storage module, the data record is stored in a column type, the data compression rate and the query performance storage are improved, and the disk I/O operation is reduced;
And thirdly, constructing an index according to the data record, wherein the index adopts a B+ tree design, and the number of I/O operations when data is transmitted between a disk and a memory is reduced.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a topology diagram of a database system household;
FIG. 2 is a block diagram of a database-based data indexing and query optimization system;
FIG. 3 is a flow chart diagram of the operation of the query optimization module;
FIG. 4 is a flow chart diagram of a database-based data indexing and query optimization method.
Detailed Description
The database-based data indexing and query optimization system and method of the present invention are described in detail below with reference to the drawings and detailed description.
Example 1:
as shown in fig. 1 and 2, the present embodiment provides a database-based data indexing and query optimization system, which includes a query optimization module, an index construction module, and a data storage module;
The data storage module is used for storing data records in the database, wherein the data records are stored in a column, so that the data compression rate and the query performance are improved, and the disk I/O operation is reduced;
The index construction module is used for constructing an index according to the data record in the data storage module, the index adopts a B+ tree design, and the number of I/O operations when data are transmitted between the disk and the memory is reduced;
The query optimization module is used for receiving the query request and optimizing the query path according to the index constructed by the index construction module.
The data storage module in the embodiment distributes data to a plurality of physical disks through RAID, and when different data is read and written, different disk blocks are used, so that the overall performance of data reading and writing is improved;
during actual reading and writing, the data storage module adopts a buffer pool formed by an internal database, so that physical reading and writing operations on a disk are reduced.
The index construction module in the embodiment adopts a dynamic index structure based on data access frequency and data distribution, and dynamically adjusts the size of index nodes according to the data access frequency, wherein the adjustment of the size of the index nodes is based on a preset performance threshold value;
the index construction module analyzes the distribution of the data on the storage medium and optimizes the index structure by adopting a distribution perception algorithm based on cluster analysis and data inclination detection.
The query optimization module in this embodiment includes a query plan generator and a cost evaluator;
Wherein the query plan generator is configured to generate an index-based query execution plan;
The cost estimator is used for estimating the query time of different paths by adopting a machine learning model (cost model).
In this embodiment, the index-based query execution plan includes a multi-stage query execution flow and an intermediate result caching policy;
The multi-stage query execution flow comprises data retrieval, data filtering, data projection, sequencing, connection and aggregation operations, wherein the data retrieval is based on index to retrieve data from a storage engine, the data filtering adopts a WHERE clause to filter the data, so that the data quantity of subsequent operations is reduced, and the data projection type SELECTs a required column according to the SELECT clause;
the intermediate result caching strategy comprises a selection strategy, a caching failure strategy and a caching replacement strategy, wherein the selection strategy is used for selecting the temporal locality and the spatial locality of an intermediate result generated according to a query execution flow, the caching failure strategy is used for setting a data cache and appointing caching failure by adopting a time standard, and the caching replacement strategy is used for removing data, the use frequency of which is lower than a set frequency threshold value within a set time range, from a caching area by adopting an LRU algorithm.
The reference index input by the machine learning model in the embodiment comprises CPU processing time, I/O operand, data transmission quantity and data transmission mode, wherein different weights are given to each factor at first;
The weight is characterized in that corresponding weight is given to the influence degree of the query time according to each reference index, linear relation description is adopted, the formula is y=ax 1+bx2+cx3+dx4, x 1,x2,x3,x4 is the numerical value of CPU processing time, I/O operand, data transmission quantity and transmission mode, and a, b, c and d are the weights of each reference index.
The query plan generator in this embodiment predicts the distribution of query results by using statistical information and a machine learning algorithm, and can dynamically adjust the query plan according to the predicted results, wherein the dynamic adjustment of the content of the query plan includes adjustment of parallelism of query execution and resource allocation, and specifically comprises the following steps:
firstly, acquiring load resource information of the residual CPU core number, the residual memory capacity and the residual broadband capacity of a database system;
then, collecting query statistical information containing data access modes, data amounts and index use conditions, and controlling parallelism and memory use by using configuration parameters of maximum parallelism (max_parallel_depth) and working memory (work_mem);
meanwhile, a partitioning technology is used for a data table with a size exceeding a set threshold, and each query operation is executed on different partitions in parallel.
Example 2:
As shown in fig. 4, this embodiment provides a data index and query optimization method based on a database, which specifically includes the following steps:
S1, receiving a query request and analyzing the content of the query request, wherein the content of the query request comprises query conditions, ordering requirements and data aggregation operation;
S2, constructing or updating an index by using an index construction module, and adapting to the change of the data record in the data storage module; the index adjustment strategy of the index construction module is based on data access frequency, data distribution and system load setting, and specifically comprises the steps of identifying data needing to be frequently accessed according to past query logs, optimizing related indexes, and adjusting the related indexes by combining the change condition of the data distribution after updating the data;
S3, generating a query execution plan by utilizing a query optimization module, wherein the content of the query execution plan comprises the steps of evaluating and selecting an optimal query path by using a cost evaluator and predicting query result distribution by using a machine learning model;
s4, executing the query and returning a result, and collecting statistical information of query execution for subsequent optimization, wherein the statistical information comprises query execution time, resource consumption and user satisfaction.
In this embodiment, constructing or updating the index includes incremental updating and full reconstruction;
The incremental updating adopts a partial updating mode, namely when the data change amount does not exceed a set threshold value, the incremental updating only updates the existing index so as to reflect the insertion, deletion and updating of the data record;
when the scale of the data change exceeds a set threshold, reconstructing the index to optimize performance, namely reconstructing the index for all the data, and establishing a corresponding mapping relation.
As shown in fig. 3, the working process of the query optimization module in this embodiment is specifically as follows:
s301, an initial query plan and a solution space saved for a query operation in the past are used as external input to be sent to a query optimization module;
S302, the query optimization module performs continuous optimization based on the established machine learning model and generates a corresponding query plan;
s303, the query plan is directly used for query operation of the database, time information of the query operation is used for feedback machine learning model calculation, and the steps are repeated until a group of optimal results are found.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present invention.

Claims (10)

1.一种基于数据库的数据索引与查询优化系统,其特征在于,该系统包括查询优化模块、索引构建模块及数据存储模块;1. A data indexing and query optimization system based on a database, characterized in that the system comprises a query optimization module, an index building module and a data storage module; 其中,数据存储模块用于存储数据库中的数据记录;数据记录以列式存储,提高数据压缩率和查询性能存储,减少磁盘I/O操作;Among them, the data storage module is used to store data records in the database; data records are stored in columnar format to improve data compression rate and query performance storage, and reduce disk I/O operations; 索引构建模块用于根据数据存储模块中的数据记录构建索引,索引采用B+树设计,减少数据在磁盘和内存之间传输时I/O操作的次数;The index building module is used to build indexes based on the data records in the data storage module. The index adopts B+ tree design to reduce the number of I/O operations when data is transferred between disk and memory; 查询优化模块用于接收查询请求并根据索引构建模块构建的索引优化查询路径。The query optimization module is used to receive query requests and optimize the query path according to the index built by the index building module. 2.根据权利要求1所述的基于数据库的数据索引与查询优化系统,其特征在于,所述数据存储模块通过RAID将数据分布到多个物理磁盘上,读写不同的数据时,使用不同的磁盘块,提高数据读写的整体性能;2. The database-based data indexing and query optimization system according to claim 1 is characterized in that the data storage module distributes data to multiple physical disks through RAID, and uses different disk blocks when reading and writing different data, thereby improving the overall performance of data reading and writing; 实际读写时,数据存储模块采用内部数据库构成的缓冲池,减少对磁盘的物理读写操作。During actual reading and writing, the data storage module uses a buffer pool composed of an internal database to reduce physical read and write operations on the disk. 3.根据权利要求1所述的基于数据库的数据索引与查询优化系统,其特征在于,所述索引构建模块采用基于数据访问频率和数据分布的动态索引结构,并根据数据访问频率动态调整索引节点的大小;其中,索引节点大小的调整基于预设的性能阈值;3. The database-based data indexing and query optimization system according to claim 1, characterized in that the index building module adopts a dynamic index structure based on data access frequency and data distribution, and dynamically adjusts the size of the index node according to the data access frequency; wherein the adjustment of the index node size is based on a preset performance threshold; 索引构建模块采用基于聚类分析和数据倾斜检测的分布感知算法分析数据在存储介质上的分布并优化索引结构。The index building module uses a distribution-aware algorithm based on cluster analysis and data skew detection to analyze the distribution of data on the storage medium and optimize the index structure. 4.根据权利要求1所述的基于数据库的数据索引与查询优化系统,其特征在于,所述查询优化模块包括查询计划生成器和成本评估器;4. The database-based data indexing and query optimization system according to claim 1, characterized in that the query optimization module includes a query plan generator and a cost evaluator; 其中,查询计划生成器用于生成基于索引的查询执行计划;Among them, the query plan generator is used to generate an index-based query execution plan; 成本评估器用于采用机器学习模型的方式评估不同路径的查询时间。The cost evaluator is used to evaluate the query time of different paths using a machine learning model. 5.根据权利要求4所述的基于数据库的数据索引与查询优化系统,其特征在于,基于索引的查询执行计划包括多阶段查询执行流程和中间结果缓存策略;5. The database-based data indexing and query optimization system according to claim 4, characterized in that the index-based query execution plan includes a multi-stage query execution process and an intermediate result caching strategy; 其中,多阶段查询执行流程包括数据检索、数据过滤、数据投影以及排序、连接和聚合操作;数据检索是基于索引从存储引擎检索数据;数据过滤采用WHERE子句过滤数据,减少后续操作的数据量;数据投影式根据SELECT子句选择所需要的列;The multi-stage query execution process includes data retrieval, data filtering, data projection, sorting, joining and aggregation operations; data retrieval is to retrieve data from the storage engine based on the index; data filtering uses the WHERE clause to filter data to reduce the amount of data for subsequent operations; data projection selects the required columns according to the SELECT clause; 中间结果缓存策略包括选择策略、缓存失效策略、缓存替换策略;选择策略是根据查询执行流程产生的中间结果的时间局部性和空间局部性作为选择策略;缓存失效策略是在设置数据缓存同时采用时间标准指定缓存失效;缓存替换策略采用LRU算法将设定时间范围内使用频率低于设定次数阈值的数据剔除缓存区域。The intermediate result cache strategy includes selection strategy, cache invalidation strategy, and cache replacement strategy; the selection strategy is based on the temporal locality and spatial locality of the intermediate results generated by the query execution process; the cache invalidation strategy is to specify cache invalidation by using time standards while setting data cache; the cache replacement strategy uses the LRU algorithm to remove data from the cache area with a usage frequency lower than the set number threshold within the set time range. 6.根据权利要求4所述的基于数据库的数据索引与查询优化系统,其特征在于,所述机器学习模型输入的参考指标包括CPU处理时间、I/O操作数、数据传输量以及数据传输方式;初始时为每个因素赋以不同的权重;6. The database-based data indexing and query optimization system according to claim 4, characterized in that the reference indicators input into the machine learning model include CPU processing time, number of I/O operations, data transmission volume, and data transmission mode; different weights are initially assigned to each factor; 其中,权重是根据各个参考指标对于查询时间影响的程度赋以相应的权重,采用线性关系描述,公式为y=ax1+bx2+cx3+dx4;其中,x1,x2,x3,x4分别为CPU处理时间、I/O操作数、数据传输量、传输方式的数值,a,b,c,d分别为各项参考指标的权重。Among them, the weight is assigned according to the degree of influence of each reference indicator on the query time, and is described by a linear relationship. The formula is y= ax1 + bx2 + cx3 + dx4 ; among them, x1 , x2 , x3 , x4 are the values of CPU processing time, I/O operation number, data transmission volume, and transmission mode respectively, and a, b, c, d are the weights of each reference indicator respectively. 7.根据权利要求4所述的基于数据库的数据索引与查询优化系统,其特征在于,所述查询计划生成器利用统计信息和机器学习算法预测查询结果的分布,并且能够根据预测结果动态调整查询计划,动态调整查询计划的内容包括调整查询执行的并行度和资源分配;具体如下:7. According to claim 4, the data indexing and query optimization system based on database is characterized in that the query plan generator uses statistical information and machine learning algorithms to predict the distribution of query results, and can dynamically adjust the query plan according to the prediction results, and the content of dynamically adjusting the query plan includes adjusting the parallelism and resource allocation of query execution; specifically as follows: 首先,获取数据库系统的剩余CPU核心数、剩余内存容量以及剩余宽带容量的负载资源信息;First, obtain the load resource information of the remaining number of CPU cores, remaining memory capacity, and remaining broadband capacity of the database system; 然后,收集包含数据访问模式、数据量以及索引使用情况的查询统计信息,并使用最大并行度和工作内存的配置参数,控制并行度和内存使用;Then, query statistics including data access patterns, data volume, and index usage are collected, and configuration parameters for maximum parallelism and working memory are used to control parallelism and memory usage. 同时,针对大小超设定阈值的数据表使用分区技术,并在不同的分区上并行执行各个查询操作。At the same time, partitioning technology is used for data tables whose size exceeds the set threshold, and each query operation is executed in parallel on different partitions. 8.一种基于数据库的数据索引与查询优化方法,其特征在于,该方法具体如下:8. A method for data indexing and query optimization based on a database, characterized in that the method is specifically as follows: S1、接收查询请求,并解析查询请求的内容;其中,查询请求的内容包括查询条件、排序要求和数据聚合操作;S1. Receive a query request and parse the content of the query request; wherein the content of the query request includes query conditions, sorting requirements, and data aggregation operations; S2、利用索引构建模块构建或更新索引,适应数据存储模块中数据记录的变化;同时索引构建模块根据数据访问模式的变化动态调整索引结构,动态调整索引结构的内容包括索引的分裂、合并或重新组织;索引构建模块的索引调整策略是基于数据访问频率、数据分布和系统负载设定,具体为:根据过往的查询日志,识别需要频繁访问的数据,优化相关索引,并结合更新数据后数据分布的变化情况,调整相关索引;S2. Use the index building module to build or update the index to adapt to the changes in the data records in the data storage module; at the same time, the index building module dynamically adjusts the index structure according to the changes in the data access mode, and the content of the dynamic adjustment of the index structure includes splitting, merging or reorganizing the index; the index adjustment strategy of the index building module is based on the data access frequency, data distribution and system load setting, specifically: according to the past query logs, identify the data that needs to be accessed frequently, optimize the relevant indexes, and adjust the relevant indexes in combination with the changes in the data distribution after the data is updated; S3、利用查询优化模块生成查询执行计划,查询执行计划的内容包括使用成本评估器评估并选择最优查询路径以及使用机器学习模型预测查询结果分布;S3. Generate a query execution plan using a query optimization module. The query execution plan includes using a cost evaluator to evaluate and select an optimal query path and using a machine learning model to predict the query result distribution. S4、执行查询并返回结果,同时收集查询执行的统计信息以供后续优化使用;其中,统计信息包括查询执行时间、资源消耗和用户满意度。S4. Execute the query and return the result, and collect the statistical information of the query execution for subsequent optimization; the statistical information includes the query execution time, resource consumption and user satisfaction. 9.根据权利要求8所述的基于数据库的数据索引与查询优化方法,其特征在于,构建或更新索引包括增量更新和全量重建;9. The method for data indexing and query optimization based on a database according to claim 8, wherein building or updating an index includes incremental updating and full reconstruction; 其中,增量更新采用部分更新的方式,即在数据变化量未超过设定阈值时,仅针对已有索引进行更新,以反映数据记录的插入、删除和更新;Among them, incremental update adopts the partial update method, that is, when the data change amount does not exceed the set threshold, only the existing index is updated to reflect the insertion, deletion and update of data records; 当数据发生变化的规模超过设定阈值时,重建索引以优化性能,即针对全部的数据重新构建索引,并建立对应的映射关系。When the scale of data changes exceeds the set threshold, the index is rebuilt to optimize performance, that is, the index is rebuilt for all data and the corresponding mapping relationship is established. 10.根据权利要求8或9所述的基于数据库的数据索引与查询优化方法,其特征在于,查询优化模块的工作过程具体如下:10. The method for data indexing and query optimization based on a database according to claim 8 or 9, wherein the working process of the query optimization module is as follows: S301、初始的查询计划和过往针对查询操作保存的解空间作为外界的输入发送到查询优化模块;S301, the initial query plan and the solution space saved for the query operation in the past are sent to the query optimization module as external input; S302、查询优化模块基于建立的机器学习模型进行不断的调优,并生成对应的查询计划;S302, the query optimization module continuously optimizes based on the established machine learning model and generates a corresponding query plan; S303、查询计划直接用于数据库的查询操作,将查询操作的时间信息用于反馈机器学习模型计算,如此反复直到找到一组最优的结果。S303. The query plan is directly used for the query operation of the database, and the time information of the query operation is used to feed back the machine learning model calculation, and this process is repeated until a set of optimal results is found.
CN202411068003.9A 2024-08-06 2024-08-06 Data indexing and query optimization system and method based on database Pending CN119088800A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411068003.9A CN119088800A (en) 2024-08-06 2024-08-06 Data indexing and query optimization system and method based on database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411068003.9A CN119088800A (en) 2024-08-06 2024-08-06 Data indexing and query optimization system and method based on database

Publications (1)

Publication Number Publication Date
CN119088800A true CN119088800A (en) 2024-12-06

Family

ID=93665627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411068003.9A Pending CN119088800A (en) 2024-08-06 2024-08-06 Data indexing and query optimization system and method based on database

Country Status (1)

Country Link
CN (1) CN119088800A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119474106A (en) * 2025-01-14 2025-02-18 浙江智臾科技有限公司 A method for asynchronous vector index construction and vector retrieval
CN119862206A (en) * 2024-12-18 2025-04-22 浪潮云信息技术股份公司 Dynamic SQL mapping optimization method, system, equipment and medium
CN120372786A (en) * 2025-06-27 2025-07-25 苏交科集团股份有限公司 Channel model dynamic association method based on multi-source data index

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119862206A (en) * 2024-12-18 2025-04-22 浪潮云信息技术股份公司 Dynamic SQL mapping optimization method, system, equipment and medium
CN119474106A (en) * 2025-01-14 2025-02-18 浙江智臾科技有限公司 A method for asynchronous vector index construction and vector retrieval
CN120372786A (en) * 2025-06-27 2025-07-25 苏交科集团股份有限公司 Channel model dynamic association method based on multi-source data index

Similar Documents

Publication Publication Date Title
US7979399B2 (en) Database journaling in a multi-node environment
CN119088800A (en) Data indexing and query optimization system and method based on database
US10331572B2 (en) Selective data mirroring for in-memory databases
US8090710B2 (en) Index maintenance in a multi-node database
US8732163B2 (en) Query optimization with memory I/O awareness
US7979400B2 (en) Database journaling in a multi-node environment
US9740741B2 (en) Aggregate query-caching in databases architectures with a differential buffer and a main store
US8195642B2 (en) Partial indexes for multi-node database
JP5088668B2 (en) Computer load estimation system, computer load estimation method, computer load estimation program
CN110362566B (en) Data placement in a hybrid data layout of a hierarchical HTAP database
CN117951119B (en) Database performance optimization method based on cloud computing
CN111475507B (en) Key value data indexing method for workload adaptive single-layer LSMT
US20240330289A1 (en) Methods and systems for transforming distributed database structure for reduced compute load
CN109918450A (en) Based on the distributed parallel database and storage method under analysis classes scene
KR20150089544A (en) Apparatus of managing data and method of managing data for supporting mixed workload
Heidari et al. Doblix: A dual-objective learned index for log-structured merge trees
Hsiao et al. Reducing mapreduce abstraction costs for text-centric applications
US12481630B1 (en) Optimizing file storage in data lake tables
Martha Efficient Data Processing with Apache Spark
US20250321801A1 (en) Database system performance of a storage rebalancing process
US12386831B2 (en) Query execution via scheduling segment chunks for parallelized processing based on requested number of rows
US12380103B1 (en) Distributed limit processing in a database system
US20090287639A1 (en) Embedding Densities in a Data Structure
CN120492453A (en) Metadata management method and device and electronic equipment
Wang et al. Automatic Storage Structure Selection for hybrid Workload

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination