CN119088800A - Data indexing and query optimization system and method based on database - Google Patents
Data indexing and query optimization system and method based on database Download PDFInfo
- Publication number
- CN119088800A CN119088800A CN202411068003.9A CN202411068003A CN119088800A CN 119088800 A CN119088800 A CN 119088800A CN 202411068003 A CN202411068003 A CN 202411068003A CN 119088800 A CN119088800 A CN 119088800A
- Authority
- CN
- China
- Prior art keywords
- data
- query
- index
- database
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/221—Column-oriented storage; Management thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a database-based data index and query optimization system and a method, which belong to the technical field of databases, and the technical problem to be solved by the invention is how to improve the data index efficiency and the query performance of a database system; the data storage module is used for storing data records in a database, the data records are stored in a column mode, the data compression rate and the query performance are improved, the disk I/O operation is reduced, the index building module is used for building an index according to the data records in the data storage module, the index adopts a B+ tree design, the number of times of the I/O operation when data are transmitted between the disk and the memory is reduced, and the query optimization module is used for receiving a query request and optimizing a query path according to the index built by the index building module.
Description
Technical Field
The invention relates to the technical field of databases, in particular to a database-based data index and query optimization system and method.
Background
A database is a collection of data that is stored on a computer's storage device for a long period of time, organized according to certain rules, and sharable by various users or applications. It is a centralized data store, typically consisting of one or more data tables, each containing a plurality of rows and columns, for storing particular types of data. The database is mainly used for effectively organizing, storing and managing a large amount of data and providing the functions of structured storage, inquiry, modification, sequencing, statistics and the like of the data.
The database system is a more ideal data processing system developed to adapt to the data processing requirement, and is also a software system for providing data for the actual operable storage, maintenance and application system. Database systems are typically composed of four parts, database (DB), hardware, software (including operating systems, database management systems DBMSs, and applications), and personnel (including system analysts, database designers, application programmers, end users, and database administrators DBA). Among these, the database management system (DBMS) is the core software of the database system.
In recent years, with the large increase of data volume in society, the conventional database system often encounters problems of low database index efficiency and long query response time when processing large-scale data, and the performance of the database system is limited.
Therefore, how to improve the data index efficiency and the query performance of the database system is a technical problem to be solved.
Disclosure of Invention
The technical task of the invention is to provide a database-based data indexing and query optimizing system and method, which are used for solving the problem of how to improve the data indexing efficiency and the query performance of a database system.
The technical task of the invention is realized in the following way, a database-based data indexing and query optimizing system comprises a query optimizing module, an index constructing module and a data storage module;
The data storage module is used for storing data records in the database, wherein the data records are stored in a column, so that the data compression rate and the query performance are improved, and the disk I/O operation is reduced;
The index construction module is used for constructing an index according to the data record in the data storage module, the index adopts a B+ tree design, and the number of I/O operations when data are transmitted between the disk and the memory is reduced;
The query optimization module is used for receiving the query request and optimizing the query path according to the index constructed by the index construction module.
Preferably, the data storage module distributes data to a plurality of physical disks through RAID, and when different data is read and written, different disk blocks are used, so that the overall performance of data reading and writing is improved;
during actual reading and writing, the data storage module adopts a buffer pool formed by an internal database, so that physical reading and writing operations on a disk are reduced.
Preferably, the index construction module adopts a dynamic index structure based on data access frequency and data distribution, and dynamically adjusts the size of index nodes according to the data access frequency, wherein the adjustment of the size of the index nodes is based on a preset performance threshold value;
the index construction module analyzes the distribution of the data on the storage medium and optimizes the index structure by adopting a distribution perception algorithm based on cluster analysis and data inclination detection.
Preferably, the query optimization module comprises a query plan generator and a cost evaluator;
Wherein the query plan generator is configured to generate an index-based query execution plan;
The cost estimator is used for estimating the query time of different paths by adopting a machine learning model (cost model).
More preferably, the index-based query execution plan includes a multi-stage query execution flow and an intermediate result caching strategy;
The multi-stage query execution flow comprises data retrieval, data filtering, data projection, sequencing, connection and aggregation operations, wherein the data retrieval is based on index to retrieve data from a storage engine, the data filtering adopts a WHERE clause to filter the data, so that the data quantity of subsequent operations is reduced, and the data projection type SELECTs a required column according to the SELECT clause;
the intermediate result caching strategy comprises a selection strategy, a caching failure strategy and a caching replacement strategy, wherein the selection strategy is used for selecting the temporal locality and the spatial locality of an intermediate result generated according to a query execution flow, the caching failure strategy is used for setting a data cache and appointing caching failure by adopting a time standard, and the caching replacement strategy is used for removing data, the use frequency of which is lower than a set frequency threshold value within a set time range, from a caching area by adopting an LRU algorithm.
More preferably, the reference indexes input by the machine learning model comprise CPU processing time, I/O operands, data transmission quantity and data transmission mode;
The weight is characterized in that corresponding weight is given to the influence degree of the query time according to each reference index, linear relation description is adopted, the formula is y=ax 1+bx2+cx3+dx4, x 1,x2,x3,x4 is the numerical value of CPU processing time, I/O operand, data transmission quantity and transmission mode, and a, b, c and d are the weights of each reference index.
More preferably, the query plan generator predicts the distribution of query results using statistical information and a machine learning algorithm, and is capable of dynamically adjusting the query plan based on the predicted results, the dynamically adjusting the content of the query plan including adjusting the parallelism of query execution and resource allocation, and the method specifically comprises the following steps:
firstly, acquiring load resource information of the residual CPU core number, the residual memory capacity and the residual broadband capacity of a database system;
then, collecting query statistical information containing data access modes, data amounts and index use conditions, and controlling parallelism and memory use by using configuration parameters of maximum parallelism (max_parallel_depth) and working memory (work_mem);
meanwhile, a partitioning technology is used for a data table with a size exceeding a set threshold, and each query operation is executed on different partitions in parallel.
A data index and query optimization method based on a database comprises the following steps:
S1, receiving a query request and analyzing the content of the query request, wherein the content of the query request comprises query conditions, ordering requirements and data aggregation operation;
S2, constructing or updating an index by using an index construction module, and adapting to the change of the data record in the data storage module; the index adjustment strategy of the index construction module is based on data access frequency, data distribution and system load setting, and specifically comprises the steps of identifying data needing to be frequently accessed according to past query logs, optimizing related indexes, and adjusting the related indexes by combining the change condition of the data distribution after updating the data;
S3, generating a query execution plan by utilizing a query optimization module, wherein the content of the query execution plan comprises the steps of evaluating and selecting an optimal query path by using a cost evaluator and predicting query result distribution by using a machine learning model;
s4, executing the query and returning a result, and collecting statistical information of query execution for subsequent optimization, wherein the statistical information comprises query execution time, resource consumption and user satisfaction.
Preferably, constructing or updating the index includes incremental updating and full reconstruction;
The incremental updating adopts a partial updating mode, namely when the data change amount does not exceed a set threshold value, the incremental updating only updates the existing index so as to reflect the insertion, deletion and updating of the data record;
when the scale of the data change exceeds a set threshold, reconstructing the index to optimize performance, namely reconstructing the index for all the data, and establishing a corresponding mapping relation.
More preferably, the query optimization module works as follows:
s301, an initial query plan and a solution space saved for a query operation in the past are used as external input to be sent to a query optimization module;
S302, the query optimization module performs continuous optimization based on the established machine learning model and generates a corresponding query plan;
s303, the query plan is directly used for query operation of the database, time information of the query operation is used for feedback machine learning model calculation, and the steps are repeated until a group of optimal results are found.
The database-based data index and query optimization system and method of the invention have the following advantages:
The invention improves the efficiency of data index and the performance of query through the data storage module, the index construction module and the query optimization module;
the data record in the database is stored through the data storage module, the data record is stored in a column type, the data compression rate and the query performance storage are improved, and the disk I/O operation is reduced;
And thirdly, constructing an index according to the data record, wherein the index adopts a B+ tree design, and the number of I/O operations when data is transmitted between a disk and a memory is reduced.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a topology diagram of a database system household;
FIG. 2 is a block diagram of a database-based data indexing and query optimization system;
FIG. 3 is a flow chart diagram of the operation of the query optimization module;
FIG. 4 is a flow chart diagram of a database-based data indexing and query optimization method.
Detailed Description
The database-based data indexing and query optimization system and method of the present invention are described in detail below with reference to the drawings and detailed description.
Example 1:
as shown in fig. 1 and 2, the present embodiment provides a database-based data indexing and query optimization system, which includes a query optimization module, an index construction module, and a data storage module;
The data storage module is used for storing data records in the database, wherein the data records are stored in a column, so that the data compression rate and the query performance are improved, and the disk I/O operation is reduced;
The index construction module is used for constructing an index according to the data record in the data storage module, the index adopts a B+ tree design, and the number of I/O operations when data are transmitted between the disk and the memory is reduced;
The query optimization module is used for receiving the query request and optimizing the query path according to the index constructed by the index construction module.
The data storage module in the embodiment distributes data to a plurality of physical disks through RAID, and when different data is read and written, different disk blocks are used, so that the overall performance of data reading and writing is improved;
during actual reading and writing, the data storage module adopts a buffer pool formed by an internal database, so that physical reading and writing operations on a disk are reduced.
The index construction module in the embodiment adopts a dynamic index structure based on data access frequency and data distribution, and dynamically adjusts the size of index nodes according to the data access frequency, wherein the adjustment of the size of the index nodes is based on a preset performance threshold value;
the index construction module analyzes the distribution of the data on the storage medium and optimizes the index structure by adopting a distribution perception algorithm based on cluster analysis and data inclination detection.
The query optimization module in this embodiment includes a query plan generator and a cost evaluator;
Wherein the query plan generator is configured to generate an index-based query execution plan;
The cost estimator is used for estimating the query time of different paths by adopting a machine learning model (cost model).
In this embodiment, the index-based query execution plan includes a multi-stage query execution flow and an intermediate result caching policy;
The multi-stage query execution flow comprises data retrieval, data filtering, data projection, sequencing, connection and aggregation operations, wherein the data retrieval is based on index to retrieve data from a storage engine, the data filtering adopts a WHERE clause to filter the data, so that the data quantity of subsequent operations is reduced, and the data projection type SELECTs a required column according to the SELECT clause;
the intermediate result caching strategy comprises a selection strategy, a caching failure strategy and a caching replacement strategy, wherein the selection strategy is used for selecting the temporal locality and the spatial locality of an intermediate result generated according to a query execution flow, the caching failure strategy is used for setting a data cache and appointing caching failure by adopting a time standard, and the caching replacement strategy is used for removing data, the use frequency of which is lower than a set frequency threshold value within a set time range, from a caching area by adopting an LRU algorithm.
The reference index input by the machine learning model in the embodiment comprises CPU processing time, I/O operand, data transmission quantity and data transmission mode, wherein different weights are given to each factor at first;
The weight is characterized in that corresponding weight is given to the influence degree of the query time according to each reference index, linear relation description is adopted, the formula is y=ax 1+bx2+cx3+dx4, x 1,x2,x3,x4 is the numerical value of CPU processing time, I/O operand, data transmission quantity and transmission mode, and a, b, c and d are the weights of each reference index.
The query plan generator in this embodiment predicts the distribution of query results by using statistical information and a machine learning algorithm, and can dynamically adjust the query plan according to the predicted results, wherein the dynamic adjustment of the content of the query plan includes adjustment of parallelism of query execution and resource allocation, and specifically comprises the following steps:
firstly, acquiring load resource information of the residual CPU core number, the residual memory capacity and the residual broadband capacity of a database system;
then, collecting query statistical information containing data access modes, data amounts and index use conditions, and controlling parallelism and memory use by using configuration parameters of maximum parallelism (max_parallel_depth) and working memory (work_mem);
meanwhile, a partitioning technology is used for a data table with a size exceeding a set threshold, and each query operation is executed on different partitions in parallel.
Example 2:
As shown in fig. 4, this embodiment provides a data index and query optimization method based on a database, which specifically includes the following steps:
S1, receiving a query request and analyzing the content of the query request, wherein the content of the query request comprises query conditions, ordering requirements and data aggregation operation;
S2, constructing or updating an index by using an index construction module, and adapting to the change of the data record in the data storage module; the index adjustment strategy of the index construction module is based on data access frequency, data distribution and system load setting, and specifically comprises the steps of identifying data needing to be frequently accessed according to past query logs, optimizing related indexes, and adjusting the related indexes by combining the change condition of the data distribution after updating the data;
S3, generating a query execution plan by utilizing a query optimization module, wherein the content of the query execution plan comprises the steps of evaluating and selecting an optimal query path by using a cost evaluator and predicting query result distribution by using a machine learning model;
s4, executing the query and returning a result, and collecting statistical information of query execution for subsequent optimization, wherein the statistical information comprises query execution time, resource consumption and user satisfaction.
In this embodiment, constructing or updating the index includes incremental updating and full reconstruction;
The incremental updating adopts a partial updating mode, namely when the data change amount does not exceed a set threshold value, the incremental updating only updates the existing index so as to reflect the insertion, deletion and updating of the data record;
when the scale of the data change exceeds a set threshold, reconstructing the index to optimize performance, namely reconstructing the index for all the data, and establishing a corresponding mapping relation.
As shown in fig. 3, the working process of the query optimization module in this embodiment is specifically as follows:
s301, an initial query plan and a solution space saved for a query operation in the past are used as external input to be sent to a query optimization module;
S302, the query optimization module performs continuous optimization based on the established machine learning model and generates a corresponding query plan;
s303, the query plan is directly used for query operation of the database, time information of the query operation is used for feedback machine learning model calculation, and the steps are repeated until a group of optimal results are found.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411068003.9A CN119088800A (en) | 2024-08-06 | 2024-08-06 | Data indexing and query optimization system and method based on database |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411068003.9A CN119088800A (en) | 2024-08-06 | 2024-08-06 | Data indexing and query optimization system and method based on database |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN119088800A true CN119088800A (en) | 2024-12-06 |
Family
ID=93665627
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411068003.9A Pending CN119088800A (en) | 2024-08-06 | 2024-08-06 | Data indexing and query optimization system and method based on database |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119088800A (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119474106A (en) * | 2025-01-14 | 2025-02-18 | 浙江智臾科技有限公司 | A method for asynchronous vector index construction and vector retrieval |
| CN119862206A (en) * | 2024-12-18 | 2025-04-22 | 浪潮云信息技术股份公司 | Dynamic SQL mapping optimization method, system, equipment and medium |
| CN120372786A (en) * | 2025-06-27 | 2025-07-25 | 苏交科集团股份有限公司 | Channel model dynamic association method based on multi-source data index |
-
2024
- 2024-08-06 CN CN202411068003.9A patent/CN119088800A/en active Pending
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119862206A (en) * | 2024-12-18 | 2025-04-22 | 浪潮云信息技术股份公司 | Dynamic SQL mapping optimization method, system, equipment and medium |
| CN119474106A (en) * | 2025-01-14 | 2025-02-18 | 浙江智臾科技有限公司 | A method for asynchronous vector index construction and vector retrieval |
| CN120372786A (en) * | 2025-06-27 | 2025-07-25 | 苏交科集团股份有限公司 | Channel model dynamic association method based on multi-source data index |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7979399B2 (en) | Database journaling in a multi-node environment | |
| CN119088800A (en) | Data indexing and query optimization system and method based on database | |
| US10331572B2 (en) | Selective data mirroring for in-memory databases | |
| US8090710B2 (en) | Index maintenance in a multi-node database | |
| US8732163B2 (en) | Query optimization with memory I/O awareness | |
| US7979400B2 (en) | Database journaling in a multi-node environment | |
| US9740741B2 (en) | Aggregate query-caching in databases architectures with a differential buffer and a main store | |
| US8195642B2 (en) | Partial indexes for multi-node database | |
| JP5088668B2 (en) | Computer load estimation system, computer load estimation method, computer load estimation program | |
| CN110362566B (en) | Data placement in a hybrid data layout of a hierarchical HTAP database | |
| CN117951119B (en) | Database performance optimization method based on cloud computing | |
| CN111475507B (en) | Key value data indexing method for workload adaptive single-layer LSMT | |
| US20240330289A1 (en) | Methods and systems for transforming distributed database structure for reduced compute load | |
| CN109918450A (en) | Based on the distributed parallel database and storage method under analysis classes scene | |
| KR20150089544A (en) | Apparatus of managing data and method of managing data for supporting mixed workload | |
| Heidari et al. | Doblix: A dual-objective learned index for log-structured merge trees | |
| Hsiao et al. | Reducing mapreduce abstraction costs for text-centric applications | |
| US12481630B1 (en) | Optimizing file storage in data lake tables | |
| Martha | Efficient Data Processing with Apache Spark | |
| US20250321801A1 (en) | Database system performance of a storage rebalancing process | |
| US12386831B2 (en) | Query execution via scheduling segment chunks for parallelized processing based on requested number of rows | |
| US12380103B1 (en) | Distributed limit processing in a database system | |
| US20090287639A1 (en) | Embedding Densities in a Data Structure | |
| CN120492453A (en) | Metadata management method and device and electronic equipment | |
| Wang et al. | Automatic Storage Structure Selection for hybrid Workload |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination |