
CN119440972A - Database performance testing method and related device - Google Patents


Info

Publication number
CN119440972A
CN119440972A (application number CN202310956807.1A)
Authority
CN
China
Prior art keywords
transaction
file
performance test
template
templates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310956807.1A
Other languages
Chinese (zh)
Inventor
练炳诚
蔡宝清
李春花
周可
原嘉
邢家树
蔡晓帆
杨杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Huazhong University of Science and Technology
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology, Shenzhen Tencent Computer Systems Co Ltd filed Critical Huazhong University of Science and Technology
Priority to CN202310956807.1A priority Critical patent/CN119440972A/en
Publication of CN119440972A publication Critical patent/CN119440972A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/30: Monitoring
    • G06F 11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3409: Recording or statistical evaluation of computer activity for performance assessment
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/445: Program loading or initiating
    • G06F 9/44505: Configuring for program initiating, e.g. using registry, configuration files
    • G06F 9/4451: User profiles; Roaming

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of computer technology and provides a database performance testing method and related device for improving the efficiency of database performance testing. The method includes: extracting, based on a set target field, the file structure information of each storage file associated with the target field from the obtained transaction templates; obtaining, based on the extracted file structure information, the file generation order among the storage files and the value distribution of each attribute contained in the storage files; constructing virtual synthesized files based on the file structure information and value distributions corresponding to the storage files, in combination with the file generation order; and performing a performance test based on the constructed virtual synthesized files to obtain a performance test result.

Description

Database performance test method and related device
Technical Field
The application relates to the technical field of computers, and provides a database performance testing method and a related device.
Background
With the rapid development of computer technology, performance tests in cloud computing scenarios cannot be performed in the actual database environment without jeopardizing service stability. How to collect service characteristics and generate a similar synthesized load for database testing is therefore a problem to be solved.
In the related art, the collected historical load of an original database is analyzed to obtain the transaction template corresponding to each transaction it contains, each transaction template being obtained by parameterizing the constants in the corresponding transaction. Feature extraction is then performed on the original database to obtain library table structure information, from which a synthetic database is built. Each transaction template is then instantiated into a synthetic transaction, and the synthetic transactions are executed against the synthetic database to obtain a performance test result.
However, the amount of data stored in the original database may be huge; if the synthetic database is generated directly from the original database, the volume of data to synthesize is large and data synthesis efficiency is low.
Disclosure of Invention
The embodiment of the application provides a database performance test method and a related device, which are used for improving the data synthesis efficiency, so that the test efficiency of the database performance test is improved.
In a first aspect, an embodiment of the present application provides a database performance testing method, including:
Based on at least one constant contained in each historical transaction in a designated time period in an original database, carrying out parameterization processing on each historical transaction respectively to obtain a corresponding transaction template;
based on a set target field, extracting respective file structure information of each storage file associated with the target field from each obtained transaction template;
Based on the obtained file structure information, obtaining a file generation sequence among the storage files and obtaining numerical distribution conditions of each attribute contained in each storage file;
and constructing virtual synthesized files corresponding to the storage files respectively based on the file structure information and the numerical distribution condition corresponding to the storage files respectively and combining the file generation sequence, and performing performance test based on the constructed virtual synthesized files to obtain a performance test result.
In a second aspect, an embodiment of the present application provides a database performance testing apparatus, including:
the parameterization unit is used for parameterizing each historical transaction based on at least one constant contained in each historical transaction in a specified time period in the original database to obtain a corresponding transaction template;
the structure extraction unit is used for extracting respective file structure information of each storage file associated with the target field from each obtained transaction template based on the set target field;
the distribution extraction unit is used for obtaining a file generation order among the storage files based on the obtained file structure information, and obtaining the numerical distribution of each attribute contained in each storage file;
and the testing unit is used for constructing virtual synthesized files corresponding to the storage files based on the file structure information and numerical distribution corresponding to each storage file, in combination with the file generation order, and performing a performance test based on the constructed virtual synthesized files to obtain a performance test result.
As a possible implementation manner, the performance test is performed based on each constructed virtual composite file, and when a performance test result is obtained, the test unit is specifically configured to:
if the performance test is a performance simulation test, performing the performance test by adopting each constructed virtual synthesis file based on the performance constraint parameter set of each historical transaction in the original database to obtain a performance test result;
and if the performance test is a performance stress test, performing multiple rounds of performance tests based on each constructed virtual synthesized file to obtain a performance test result.
As a possible implementation manner, the performance test is performed by using each constructed virtual synthesis file based on the performance constraint parameter set of each historical transaction in the original database, and when a performance test result is obtained, the test unit is specifically configured to:
instantiating each transaction template based on its corresponding historical execution results to obtain a corresponding instantiated transaction, and, based on the performance constraint parameter set of each historical transaction in the original database, combining each obtained instantiated transaction with each virtual synthesized file to obtain a performance test result; or
And based on the performance constraint parameter set of each historical transaction in the original database, combining each historical transaction with each constructed virtual synthesized file to obtain a performance test result.
As a possible implementation manner, when performing multiple rounds of performance tests based on each constructed virtual synthesized file to obtain a performance test result, the test unit is specifically configured to:
sorting the transaction templates based on the number of historical transactions corresponding to each transaction template, extracting target templates from the transaction templates according to the sorting result and a set ratio, and performing multiple rounds of performance tests with the instantiated transactions obtained by instantiating the target templates, in combination with the virtual synthesized files, to obtain a performance test result; or
Based on the target sub-load configured in the configuration file, combining the corresponding relation between the target sub-load and the target template, obtaining each target template corresponding to the target sub-load from each transaction template, and performing multiple rounds of performance test according to each instantiation transaction obtained through each target template instantiation and combining each virtual synthesis file to obtain a performance test result.
As a possible implementation manner, when performing multiple rounds of performance tests based on each constructed virtual synthesized file to obtain a performance test result, the test unit is specifically configured to:
performing a first round of performance test based on each constructed virtual synthesized file to obtain a first round of reference performance test result;
Starting from the second round of performance test, performing the current round of performance test based on each constructed virtual synthesized file to obtain the current performance test result of the current round, and performing data recovery on each virtual synthesized file after the current round of performance test when determining that data offset occurs based on the reference performance test result and the current performance test result.
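One possible reading of this per-round offset check, sketched in Python; the use of TPS as the compared metric and the 10% relative tolerance are assumptions for illustration, not values taken from the application:

```python
# Sketch of the per-round data-offset check. The choice of TPS as the
# compared metric and the 10% relative tolerance are assumptions.
OFFSET_TOLERANCE = 0.10

def data_offset_detected(baseline_tps: float, current_tps: float,
                         tol: float = OFFSET_TOLERANCE) -> bool:
    """Return True when the current round deviates from the first-round
    reference result by more than the tolerance, i.e. the virtual
    synthesized files should be restored before the next round."""
    return abs(current_tps - baseline_tps) / baseline_tps > tol

baseline = 5000.0  # first-round reference performance test result
ok = data_offset_detected(baseline, 5200.0)   # → False, within tolerance
drift = data_offset_detected(baseline, 4300.0)  # → True, restore files
```

The check runs after every round from the second onward, so the synthesized data never drifts far from the state the baseline was measured against.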
As a possible implementation, the test unit is further configured to:
Based on the execution starting time corresponding to each transaction template, sequencing each transaction template to obtain a transaction template sequence;
Dividing each transaction template in the transaction template sequence according to a set time interval to obtain a plurality of groups of transaction templates, wherein each group of transaction templates comprises at least one transaction template;
And clustering the multiple groups of transaction templates based on the category characteristics corresponding to the multiple groups of transaction templates, and obtaining at least one group of merged transaction templates.
As a possible implementation manner, when the plurality of groups of transaction templates are clustered based on the category features corresponding to the plurality of groups of transaction templates, and at least one group of merged transaction templates is obtained, the test unit is specifically configured to:
At least one merging process is carried out on the plurality of groups of transaction templates, and the following operations are executed in each merging process:
acquiring the current groups of transaction templates, determining, based on the category features corresponding to the current groups, two groups of transaction templates that meet a preset merging condition, and merging the two groups to obtain a new group of transaction templates;
and obtaining the category feature corresponding to the new group based on the transaction templates contained in the two merged groups.
As a possible implementation manner, when determining, based on the category features corresponding to the current groups of transaction templates, that two groups of transaction templates meeting a preset merging condition exist among the current groups, the test unit is specifically configured to:
if the category characteristics are template similarity between two groups of transaction templates, determining that two groups of transaction templates which meet a preset merging condition exist in the current transaction templates when the template similarity between the two groups of transaction templates is smaller than a preset similarity threshold;
if the category characteristic is an evaluation value of the number of the executed sentences between the two groups of transaction templates, determining that two groups of transaction templates which meet preset merging conditions exist in the current transaction templates when the difference value of the evaluation value of the number of the executed sentences between the two groups of transaction templates is smaller than a preset evaluation value threshold, wherein the evaluation value of the number of the executed sentences is used for representing the number of the executed SQL sentences in unit time.
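A minimal sketch of one possible reading of this merging loop, using the executed-statement-count feature; the group representation, the averaging of the merged feature, and the threshold value are all assumptions for illustration:

```python
# Sketch of iterative merging of transaction-template groups. The group
# representation, the use of "statements executed per unit time" as the
# category feature, averaging on merge, and the threshold are assumed.
groups = [
    {"templates": ["T1"], "stmts_per_sec": 100.0},
    {"templates": ["T2"], "stmts_per_sec": 104.0},
    {"templates": ["T3"], "stmts_per_sec": 300.0},
]
THRESHOLD = 10.0  # preset evaluation-value threshold (assumed)

def merge_pass(current):
    """Merge the first adjacent pair whose feature difference is below
    the threshold; return the groups unchanged when no pair qualifies."""
    for i in range(len(current) - 1):
        a, b = current[i], current[i + 1]
        if abs(a["stmts_per_sec"] - b["stmts_per_sec"]) < THRESHOLD:
            merged = {
                "templates": a["templates"] + b["templates"],
                # recompute the category feature from the merged members
                "stmts_per_sec": (a["stmts_per_sec"] + b["stmts_per_sec"]) / 2,
            }
            return current[:i] + [merged] + current[i + 2:]
    return current

while True:
    nxt = merge_pass(groups)
    if nxt == groups:  # no pair satisfies the merge condition any more
        break
    groups = nxt
```

Here T1 and T2 (features 100.0 and 104.0) merge into one group while T3 stays separate, mirroring the "merge until no pair satisfies the condition" behavior described above.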
As a possible implementation manner, when extracting, based on the set target field, the file structure information of each storage file associated with the target field from the obtained transaction templates, the structure extraction unit is specifically configured to:
Extracting file names contained in each transaction template from the obtained transaction templates based on the set target fields;
And respectively initiating a file structure query request to the original database based on the obtained file names to obtain the file structure information of the corresponding storage file.
As a possible implementation manner, each piece of file structure information includes the attributes contained in the corresponding storage file, and obtaining the file generation order among the storage files based on the obtained file structure information includes:
Determining file dependency relations among all the storage files based on all the attributes contained in the obtained file structure information;
and obtaining the file generation sequence among the storage files based on the file dependency relationship.
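These two steps can be read as a topological ordering of the storage files by their dependency relations: a file can only be generated after every file it references. A sketch using Python's standard-library `graphlib`, over a hypothetical table set (the table names and foreign-key relations are assumptions):

```python
from graphlib import TopologicalSorter

# Hypothetical file dependency relations: each storage file (table)
# maps to the set of files whose primary keys it references via
# foreign keys, as derived from the extracted file structure info.
dependencies = {
    "order_item": {"order", "product"},
    "order": {"customer"},
    "product": set(),
    "customer": set(),
}

# The file generation order is any topological order of this graph:
# referenced files come before the files that reference them.
generation_order = list(TopologicalSorter(dependencies).static_order())
```

With these relations, `customer` and `product` precede `order`, which precedes `order_item`, so foreign-key values always have an already-generated file to draw from.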
As a possible implementation manner, when the numerical distribution of each attribute included in each storage file is obtained based on the obtained file structure information, the distribution extraction unit is specifically configured to:
extracting at least one joint column from each transaction template based on the set query field, wherein each joint column comprises at least one attribute in one storage file queried by the corresponding transaction template;
based on the extracted at least one joint column, combining the original database to obtain a numerical range corresponding to each attribute contained in the at least one joint column;
based on the at least one joint column, respectively carrying out sampling inquiry on the original database according to a set sampling proportion to obtain a corresponding sampling result, and based on the corresponding sampling result, obtaining corresponding sampling statistical information;
and obtaining the numerical distribution condition of each attribute contained in each storage file based on the sampling statistical information corresponding to each at least one joint column and each obtained numerical range.
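The sampling-query step above can be sketched with an in-memory SQLite database; the `account` table, its columns, and the 20% sampling proportion are hypothetical stand-ins for a storage file and its joint-column attributes:

```python
import collections
import sqlite3

# Sketch of sampling-based distribution extraction. Table name, columns
# and sampling ratio are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [(i, "east" if i % 3 else "west") for i in range(1, 101)],
)

sample_ratio = 0.2  # set sampling proportion (assumed)
total = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
rows = conn.execute(
    "SELECT id, region FROM account ORDER BY RANDOM() LIMIT ?",
    (int(total * sample_ratio),),
).fetchall()

# Sampling statistics: per-attribute value range and frequency counts,
# from which a value distribution can then be derived.
id_range = (min(r[0] for r in rows), max(r[0] for r in rows))
region_freq = collections.Counter(r[1] for r in rows)
```

Only the sample is scanned, not the full table, which is what keeps this step cheap relative to synthesizing a full copy of the original database.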
As a possible implementation manner, when the virtual composite file corresponding to each storage file is constructed based on the file structure information and the numerical distribution situation corresponding to each storage file and in combination with the file generation sequence, the test unit is specifically configured to:
and sequentially constructing virtual synthesized files corresponding to the storage files according to the file generation sequence, wherein in each construction process, the following operations are respectively executed:
Determining each attribute contained in one virtual synthesized file based on file structure information of a storage file corresponding to the virtual synthesized file;
and generating a value set of each attribute contained in the virtual synthesized file based on the attribute type of each attribute contained in the virtual synthesized file and combining a data generation strategy corresponding to the attribute type, wherein each data generation strategy is generated based on the numerical value distribution condition of the corresponding attribute.
As a possible implementation manner, when generating the value set of each attribute contained in the virtual synthesized file based on the attribute type of each attribute and the data generation policy corresponding to that attribute type, the test unit is specifically configured to:
for each attribute contained in the virtual composite file, the following operations are respectively executed:
if an attribute is a primary key, using non-repeating values as the value set of the attribute;
if an attribute is a foreign key, determining, from the storage files and based on the file generation order, the predecessor file of the storage file corresponding to the virtual synthesized file, and generating the value set of the attribute based on the value distribution of the attribute in that predecessor file;
if an attribute is neither a primary key nor a foreign key, generating the value set of the attribute based on the value distribution of the attribute in the storage file corresponding to the virtual synthesized file.
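The three branches above can be sketched as one dispatch function; the interface, the table names, and the uniform draw standing in for the fitted distribution are assumptions:

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility

def generate_value_set(attr_kind, n, parent_values=None, value_range=None):
    """Sketch of per-attribute value synthesis (interface assumed):
    primary keys get non-repeating values, foreign keys are drawn from
    the key column of the already-generated predecessor file, and other
    attributes are drawn from the extracted value range."""
    if attr_kind == "primary_key":
        return list(range(1, n + 1))  # non-repeating values
    if attr_kind == "foreign_key":
        # referential integrity: only values present in the predecessor
        return [rng.choice(parent_values) for _ in range(n)]
    lo, hi = value_range  # uniform draw stands in for the distribution
    return [rng.randint(lo, hi) for _ in range(n)]

# "customer" is generated before "order", per the file generation order.
customer_ids = generate_value_set("primary_key", 5)
order_customer = generate_value_set("foreign_key", 8,
                                    parent_values=customer_ids)
order_amount = generate_value_set("other", 8, value_range=(10, 500))
```

Because foreign-key values are drawn only from the predecessor file's key column, the synthesized files stay mutually consistent without ever consulting the original data again.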
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method described above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium comprising a computer program for causing an electronic device to perform the steps of the above-described method when the computer program is run on the electronic device.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium, from which computer readable storage medium a processor of an electronic device reads and executes the computer program, causing the electronic device to perform the steps of the method described above.
In the implementation of the application, after the transaction template corresponding to each historical transaction within a specified time period in the original database is obtained, the file structure information of each storage file associated with a set target field is extracted from the obtained transaction templates. Based on the obtained file structure information, the file generation order among the storage files and the value distribution of each attribute contained in each storage file are obtained. Virtual synthesized files are then constructed based on the file structure information and value distributions corresponding to the storage files, in combination with the file generation order, and a performance test is performed based on the constructed virtual synthesized files to obtain a performance test result.
In this way, because the load over a period of time exhibits locality in its access to library tables and attribute columns, analyzing and extracting the storage files accessed by the historical transactions, their file structure information and value distributions, and the file generation order among them constrains data synthesis to only the necessary data. This speeds up data generation and improves data synthesis efficiency, and therefore test efficiency; at the same time, it ensures a degree of similarity between the synthesized data and the original data, improving the data simulation effect and hence the accuracy of the performance test.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 2 is a flowchart of a database performance testing method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a transaction template provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a library table access analysis provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of file structure information and data distribution provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of joint column extraction provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a DAG provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a transaction instance distribution scenario provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of a system architecture provided in an embodiment of the present application;
FIG. 10 is a flowchart of another database performance testing method provided in an embodiment of the present application;
FIG. 11 is a schematic diagram of a database performance testing apparatus provided in an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.
First, some concepts involved in the embodiments of the present application are explained.
SQL parameterization: performing a parameterization operation on constants such as numbers and character strings appearing in an SQL statement, so that an SQL statement containing actual data is expressed as a parameterized SQL template.
Transaction template: the parameterized transaction obtained by parameterizing all SQL statements of a transaction.
Joint column: a tuple of the column names appearing in the WHERE clause of an SQL statement. The joint column reflects the correlation of attribute columns from the query perspective.
Value List: in SQL parameterization, the parameterized parts of all SQL statements contained in a transaction form a value tuple, and the value tuples extracted from all transactions sharing the same transaction template form a Value List.
TPS (Transactions Per Second): the number of transactions executed per second.
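For the joint column concept above, a minimal regex-based sketch (illustrative only: a real implementation would use an SQL parser, and the operator set recognized here is an assumption):

```python
import re

def extract_joint_column(sql: str) -> tuple:
    """Collect the column names referenced by comparison predicates in
    the WHERE clause of a simple SQL statement, forming its joint
    column. Regex-based and illustrative only."""
    where = re.search(r"\bWHERE\b(.*)", sql, re.IGNORECASE | re.DOTALL)
    if not where:
        return ()
    # identifier followed by a comparison operator (assumed operator set)
    cols = re.findall(
        r"([A-Za-z_]\w*)\s*(?:<=|>=|=|<|>|LIKE|IN)\s*",
        where.group(1),
        re.IGNORECASE,
    )
    return tuple(cols)

joint = extract_joint_column(
    "SELECT balance FROM account WHERE id = ? AND region = ?"
)  # → ("id", "region")
```

The resulting tuple, here `("id", "region")`, captures which attribute columns a template queries together, which is what later drives the joint sampling of their value distributions.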
Cloud technology (Cloud technology) refers to a hosting technology for integrating hardware, software, network and other series resources in a wide area network or a local area network to realize calculation, storage, processing and sharing of data.
Cloud technology is a general term for the network, information, integration, management-platform and application technologies applied on the basis of the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and portal websites, require large amounts of computing and storage resources. With the continued development of the internet industry, each item may in the future carry its own identification mark that must be transmitted to a background system for logical processing; data at different levels will be processed separately, and all kinds of industry data require strong backend system support, which can only be realized through cloud computing.
The Database (Database), which can be considered as an electronic filing cabinet, is a place for storing electronic files, and users can perform operations such as adding, inquiring, updating, deleting and the like on the data in the files. A "database" is a collection of data stored together in a manner that can be shared with multiple users, with as little redundancy as possible, independent of the application.
A database management system (Database Management System, DBMS) is computer software designed for managing databases, and generally provides basic functions such as storage, retrieval, security and backup. Database management systems may be categorized by the database model they support, such as relational or Extensible Markup Language (XML); by the type of computer they run on, such as server clusters or mobile phones; by the query language used, such as Structured Query Language (SQL) or XQuery; or by their performance focus, such as maximum scale or maximum operating speed. Regardless of the classification used, some DBMSs span categories, for example by supporting multiple query languages simultaneously.
Cloud computing refers to the delivery and usage mode of IT infrastructure: obtaining the required resources over the network in an on-demand, easily scalable manner. In a broad sense, cloud computing refers to the delivery and usage mode of services: obtaining the required services over the network in an on-demand, easily scalable manner. Such services may be IT, software or internet related, or other services. Cloud computing is a product of the fusion of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization and load balancing.
With the development of the internet, real-time data flow and diversification of connected devices, and the promotion of demands of search services, social networks, mobile commerce, open collaboration and the like, cloud computing is rapidly developed. Unlike the previous parallel distributed computing, the generation of cloud computing will promote the revolutionary transformation of the whole internet mode and enterprise management mode in concept.
In a cloud computing scenario, performance tests cannot be performed in the actual database environment without jeopardizing service stability. How to collect service characteristics and generate a similar synthesized load for database testing is therefore a problem to be solved urgently.
In the related art, the collected historical load of an original database is analyzed to obtain the transaction template corresponding to each transaction it contains, each transaction template being obtained by parameterizing the constants in the corresponding transaction. Feature extraction is then performed on the original database to obtain library table structure information, from which a synthetic database is built. Each transaction template is then instantiated into a synthetic transaction, and the synthetic transactions are executed against the synthetic database to obtain a performance test result.
However, the amount of data stored in the original database may be huge; if the synthetic database is generated directly from the original database, the volume of data to synthesize is large and data synthesis efficiency is low.
Where the collection and processing of data in this application involves personal information, the informed or separate consent of the personal information subject is obtained in accordance with the applicable national laws and regulations, and subsequent data use and processing are carried out within the scope authorized by those laws and regulations and by the personal information subject.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. The application scenario at least includes the terminal device 110 and the server 120. The number of the terminal devices 110 may be one or more, and the number of the servers 120 may be one or more, and the number of the terminal devices 110 and the servers 120 is not particularly limited in the present application.
In the embodiment of the present application, the terminal device 110 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, an internet of things device, a smart home appliance, a vehicle-mounted terminal, etc., but is not limited thereto.
The server 120 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data, and artificial intelligence platforms.
The terminal device 110 and the server 120 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein.
The database performance testing method mentioned in the embodiment of the present application may be applied to the terminal device 110 or the server 120, which is not limited herein. Hereinafter, application to the server 120 is taken as an example.
Referring to fig. 2, which is a flowchart of a database performance testing method provided in an embodiment of the present application, the method may be applied to a server. The specific flow is as follows:
S201, based on at least one constant contained in each historical transaction within a designated time period in an original database, parameterizing each historical transaction to obtain a corresponding transaction template.
In the embodiment of the application, each historical transaction is formed by one or more SQL statements. The constants contained in each SQL statement, such as numbers and character strings, are converted into parameters, so that an SQL statement containing actual data is expressed as a parameterized SQL template.
Referring to fig. 3, which is a schematic diagram of a historical transaction provided in an embodiment of the present application, the historical transaction includes three SQL statements. Among them, SQL statement 1 obtains the identification (id) and name (name) of each record whose age (age) is less than 30, SQL statement 2 obtains the records in table T2 whose id value is 1024, and SQL statement 3 sets the attribute ball to 0 for the record in table T3 whose id value is 1024. Clearly, the constant contained in SQL statement 1 is the value of age, the constant contained in SQL statement 2 is the value of id, and the constants contained in SQL statement 3 are the values of ball and id. The server replaces the value of age in SQL statement 1 with parameter P1,1, replaces the value of id in SQL statement 2 with parameter P2,1, and replaces the values of ball and id in SQL statement 3 with parameters P3,1 and P3,2 respectively, obtaining the transaction template corresponding to the historical transaction.
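The parameterization step can be sketched as a small regex pass. This is an illustrative simplification rather than a full SQL lexer; the `P<statement>,<constant>` parameter naming follows the figure's convention:

```python
import re

def parameterize(sql: str, stmt_no: int) -> str:
    """Replace string and numeric constants in one SQL statement with
    positional parameters P<stmt_no>,<k>, yielding a reusable template."""
    counter = 0

    def repl(match):
        nonlocal counter
        counter += 1
        return f"P{stmt_no},{counter}"

    # Match quoted string literals first, then standalone numbers
    # (word boundaries keep digits inside identifiers like T1 intact).
    pattern = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")
    return pattern.sub(repl, sql)

template = parameterize("SELECT id, name FROM T1 WHERE age < 30", 1)
# → "SELECT id, name FROM T1 WHERE age < P1,1"
```

A production implementation would use a real SQL tokenizer so that constants inside comments or escaped strings are handled correctly.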
Furthermore, the server may apply a hash algorithm to each transaction template to obtain a template fingerprint for each. The hash algorithm is not limited in the embodiment of the present application. For example, still referring to fig. 3, the server takes the parameterized SQL statements of the transaction template (with the substituted template parameters removed) as the input of a hash function to obtain the template fingerprint 0x7DA30EB5 of the transaction template.
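A sketch of the fingerprinting step follows. The method does not prescribe a hash algorithm; MD5 and the 8-hex-digit truncation here are illustrative assumptions chosen to match the 0x7DA30EB5-style fingerprint in the figure:

```python
import hashlib

def template_fingerprint(statements):
    """Hash the parameterized SQL statements of one transaction template
    into a short hex fingerprint. MD5 and truncation to 8 hex digits are
    illustrative choices, not prescribed by the method."""
    h = hashlib.md5()
    for stmt in statements:
        h.update(stmt.encode("utf-8"))
    return "0x" + h.hexdigest()[:8].upper()
```

Because the input is the parameterized template rather than the raw SQL, transactions that differ only in constant values map to the same fingerprint.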
S202, based on a set target field, extracting the file structure information of each storage file associated with the target field from the obtained transaction templates.
The storage files may be data tables stored in the database. For data tables, the set target field is the From field, and the storage files associated with the target field are the storage files pointed to by the From field.
Specifically, in the embodiment of the present application, when S202 is executed, library table access analysis and table structure extraction may be performed, which specifically includes the following steps:
The server extracts, based on the set target field, the file names contained in each of the obtained transaction templates;
The server initiates a file structure query request to the original database for each obtained file name, to obtain the file structure information of the corresponding storage file.
In the embodiment of the application, for each transaction template, the server may parse each SQL statement contained in the transaction template, extract the file name in the corresponding From clause, and take the extracted file names as the file names contained in the transaction template. Since table aliases may be used in SQL, an extracted table alias may first be converted to the full table name, which is then used as the file name.
For example, referring to fig. 4, among the three SQL statements included in the transaction template, the file name in the From clause of SQL statement 1 is T1 and the file name in the From clause of SQL statement 2 is T2, so the server extracts, based on the From field, that the file names contained in the transaction template include T1 and T2.
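A minimal sketch of the file-name extraction, in which a regex over FROM/UPDATE clauses stands in for full SQL parsing (alias resolution, as noted above, is omitted):

```python
import re

def extract_table_names(statements):
    """Collect table (file) names from the FROM/UPDATE clauses of a
    template's SQL statements. A regex sketch; a real implementation
    would parse the SQL and resolve aliases to full table names."""
    names = []
    for stmt in statements:
        for m in re.finditer(r"\b(?:FROM|UPDATE)\s+(\w+)", stmt, re.IGNORECASE):
            if m.group(1) not in names:
                names.append(m.group(1))
    return names

tables = extract_table_names([
    "SELECT id, name FROM T1 WHERE age < P1,1",
    "SELECT * FROM T2 WHERE id = P2,1",
])
# → ['T1', 'T2']
```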
In the embodiment of the application, the file structure information includes, but is not limited to, the attributes contained in the corresponding storage file, the primary key, the data type of each attribute, and the like. The data types of attributes include, but are not limited to, characters, numeric values, and the like.
The file structure query request may take the form SHOW CREATE TABLE 'xxx', where xxx is a file name (i.e., a table name). After obtaining the file names, the server may issue SHOW CREATE TABLE 'T1' and SHOW CREATE TABLE 'T2' to the original database to obtain the table-creation statements corresponding to T1 and T2, and then derive the file structure information of T1 and T2 from those statements.
Referring to fig. 5, the file structure information of T1 describes that T1 contains the three attributes name, ball and triangle; that the primary key is name; that name is a character field of up to 20 characters; that none of name, ball and triangle may be null; that if no value is set for name, a value is filled in automatically; and that if no value is set for ball or triangle, the default value 0 is used.
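The table-creation statement returned by SHOW CREATE TABLE can be reduced to file structure information along these lines. This is a simplified parser that handles only one column definition per line, and the DDL text is a hypothetical example:

```python
import re

def parse_create_table(ddl: str):
    """Extract attributes, their types, and the primary key from a
    table-creation statement (simplified: one column definition per line)."""
    info = {"attributes": {}, "primary_key": None}
    for line in ddl.splitlines():
        line = line.strip().rstrip(",")
        m = re.match(r"PRIMARY KEY \(`?(\w+)`?\)", line, re.IGNORECASE)
        if m:
            info["primary_key"] = m.group(1)
            continue
        m = re.match(r"`?(\w+)`?\s+(\w+(?:\(\d+\))?)", line)
        if m and not line.upper().startswith(("CREATE", ")")):
            info["attributes"][m.group(1)] = m.group(2)
    return info

ddl = """CREATE TABLE T1 (
  name varchar(20) NOT NULL,
  ball int NOT NULL DEFAULT 0,
  PRIMARY KEY (name)
)"""
info = parse_create_table(ddl)
# → {'attributes': {'name': 'varchar(20)', 'ball': 'int'}, 'primary_key': 'name'}
```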
Further, after obtaining the transaction templates, the server may also perform joint-column extraction. Specifically, the server performs joint-column extraction on each obtained transaction template to obtain at least one corresponding joint column.
A joint column may contain one or more attributes, and the attributes in a joint column characterize the data filtering range. Specifically, a joint column may be a column tuple of a WHERE clause (the one or more attributes in a WHERE clause may be referred to as the column tuple of that WHERE clause).
For each transaction template, the server may parse each SQL statement contained in the transaction template, extract the column tuple of the corresponding WHERE clause, and take the extracted column tuples as the joint columns corresponding to the transaction template.
It should be noted that, because attributes with the same name may exist in different storage files, in the embodiment of the present application an attribute is denoted by its table name and attribute name for ease of distinction.
For example, referring to fig. 6, the attributes in the WHERE clause of SQL statement 1 are t1.age and t1.sex, the attribute in the WHERE clause of SQL statement 2 is t2.id, and the attributes in the WHERE clause of SQL statement 3 are t3.id and t3.ball. The server parses SQL statement 1 and extracts the column tuple (t1.age, t1.sex) from its WHERE clause, parses SQL statement 2 and extracts the column tuple (t2.id), and parses SQL statement 3 and extracts the column tuple (t3.id, t3.ball); the three extracted column tuples serve as the three joint columns corresponding to the transaction template.
In some implementations, the server may also perform sub-load extraction. Specifically, the server sorts the transaction templates by their execution start times to obtain a transaction template sequence, divides the transaction template sequence into a plurality of groups of transaction templates according to a set time interval, and then clusters the groups based on the category features of each group to obtain at least one merged group of transaction templates.
As one possible case, if only one thread exists, the server may directly sort the transaction templates by execution start time to obtain a transaction template sequence, divide the sequence into a plurality of groups according to the set time interval, and cluster the groups based on their category features to obtain at least one merged group of transaction templates.
As another possible case, if there are multiple threads, the server may first obtain a thread model for each thread. Specifically, the server divides the historical transactions by thread according to a thread identifier (e.g., ID) field in the audit log, obtaining one or more historical transactions for each thread. The historical transactions of a thread form a transaction sequence, ordered by the processing start time of each historical transaction. Each thread together with its corresponding transaction sequence may be referred to as that thread's thread model.
After obtaining the transaction sequence of each thread, for each thread the server may obtain the thread's transaction template sequence based on the transaction templates corresponding to the historical transactions contained in that thread's transaction subsequence.
For each thread, the server may divide the transaction template sequence corresponding to the thread into a plurality of groups according to the set time interval, each group containing at least one transaction template, and then cluster the groups based on their category features to obtain at least one merged group of transaction templates.
As a possible implementation, clustering the plurality of groups of transaction templates based on their category features to obtain at least one merged group of transaction templates may be performed, but is not limited to, as follows:
the server performs at least one merging pass over the groups of transaction templates, and in each merging pass performs the following operations:
obtaining the current groups of transaction templates; determining, based on the category features of the current groups, two groups that meet a preset merging condition; and merging the two groups to obtain a new group of transaction templates;
obtaining the category features of the new group based on the transaction templates contained in the two merged groups.
In the embodiment of the application, to improve merging efficiency and accuracy, each merging pass may consider only adjacent pairs: for every two adjacent groups of transaction templates, it is determined whether they meet the preset merging condition, and adjacent groups that do are merged.
The order among the groups of transaction templates is determined by the execution start times of the transaction templates they contain.
For example, referring to fig. 6, W represents the total workload corresponding to one thread, and the transaction template sequence corresponding to the total workload is A, A, B, B, B, C, where letters such as A, B and C denote different transaction templates, meaning the templates are executed in that order. Assume the set time interval is 25 seconds. The server first divides the transaction template sequence according to the set time interval into four groups SW1, SW2, SW3 and SW4: SW1 contains the sequence A, A, B, B, B, C with a QPS of 110; SW2 contains A, B, B, C, C with a QPS of 100; SW3 contains C, D, C, A, B, C with a QPS of 40; and SW4 contains C, D, A, B, B, C with a QPS of 150; each group covers a 25-second interval.
Then the server clusters the four groups. Suppose the current groups are SW1, SW2, SW3 and SW4, and that, based on their category features, SW1 and SW2 are determined to meet the preset merging condition. SW1 and SW2 are merged into a new group SW1&SW2, whose QPS is 105, whose time interval is 50 seconds, and whose transaction template sequence is A, A, B, B, B, C, A, B, B, C, C.
After the first merging pass, the current groups are SW1&SW2, SW3 and SW4. Suppose that, based on their category features, SW1&SW2 and SW3 are determined to meet the preset merging condition; they are merged into a new group SW1&SW2&SW3. If, after the second merging pass, no two groups meet the preset merging condition, merging stops.
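The merging passes over SW1..SW4 can be sketched as follows. The `similar` and `qps_close` predicates are caller-supplied stand-ins for the preset merging conditions (template similarity and QPS difference), and the group dictionary layout is an illustrative assumption:

```python
def merge_groups(groups, similar, qps_close):
    """Repeatedly merge adjacent groups of transaction templates that meet
    the merge condition. Each group is a dict with 'templates' (sequence),
    'qps' and 'interval' (seconds); `similar` and `qps_close` are
    caller-supplied predicates."""
    changed = True
    while changed:
        changed = False
        for i in range(len(groups) - 1):
            a, b = groups[i], groups[i + 1]
            if similar(a["templates"], b["templates"]) and qps_close(a["qps"], b["qps"]):
                merged = {
                    "templates": a["templates"] + b["templates"],
                    # QPS of the merged group: time-weighted average.
                    "qps": (a["qps"] * a["interval"] + b["qps"] * b["interval"])
                           / (a["interval"] + b["interval"]),
                    "interval": a["interval"] + b["interval"],
                }
                groups = groups[:i] + [merged] + groups[i + 2:]
                changed = True
                break
    return groups

merged = merge_groups(
    [{"templates": list("AABBBC"), "qps": 110, "interval": 25},
     {"templates": list("ABBCC"), "qps": 100, "interval": 25}],
    similar=lambda a, b: True,               # placeholder predicates
    qps_close=lambda p, q: abs(p - q) < 20,
)
# → one group with QPS 105 over 50 seconds, matching the SW1&SW2 example
```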
As one possible implementation, two groups of transaction templates are determined to meet the preset merging condition when at least one of the following conditions is met:
Condition 1: the template similarity between the two groups of transaction templates is smaller than a preset similarity threshold.
The template similarity between two groups of transaction templates may be calculated according to, but is not limited to, the Dynamic Time Warping (DTW) algorithm. The preset similarity threshold may be, but is not limited to, 0.2.
As a possible implementation, the server obtains an initial similarity between the two groups of transaction templates based on the template fingerprints contained in the two groups, and normalizes the initial similarity by the number of transaction templates contained in the two groups to obtain the template similarity between them.
For two groups of transaction templates, their template fingerprint sequences are denoted x and y respectively, each fingerprint sequence consisting of the template fingerprints of the transaction templates in that group, and their lengths are denoted m and n respectively.
Illustratively, the initial similarity may be calculated using equation (1):
dtw[i][j] = min(dtw[i-1][j-1], dtw[i][j-1], dtw[i-1][j]) + d[i][j]    Equation (1)
Here dtw[m][n] represents the initial similarity between the two groups of transaction templates; d[i][j] is 0 if x_i equals y_j and 1 otherwise; min denotes the minimum and max the maximum. The template similarity Temp_diff is obtained by dividing dtw[m][n] by max(m, n) for normalization.
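Equation (1) with the normalization by max(m, n) can be implemented directly:

```python
def temp_diff(x, y):
    """Template similarity per equation (1): DTW over two template-fingerprint
    sequences, normalized by the longer sequence length. Smaller values mean
    more similar sequences."""
    m, n = len(x), len(y)
    INF = float("inf")
    dtw = [[INF] * (n + 1) for _ in range(m + 1)]
    dtw[0][0] = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = 0 if x[i - 1] == y[j - 1] else 1  # per-step mismatch cost d[i][j]
            dtw[i][j] = min(dtw[i - 1][j - 1], dtw[i][j - 1], dtw[i - 1][j]) + d
    return dtw[m][n] / max(m, n)

temp_diff(list("AABBBC"), list("ABBCC"))  # → 0.0 (DTW warping absorbs repeats)
```

Note that DTW is more forgiving than plain edit distance: repeated templates (A,A vs A) can be aligned at no cost, which suits workload sequences sampled at slightly different rates.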
Condition 2: the difference between the execution amount evaluation values of the two groups of transaction templates is smaller than a preset evaluation value threshold. The execution amount evaluation value characterizes the number of SQL statements executed per unit time.
The execution amount evaluation value may be, but is not limited to, Queries Per Second (QPS), in which case the preset evaluation value threshold may also be called a preset QPS threshold. QPS characterizes the number of SQL statements executed per unit time (e.g., per second). The QPS of a group of transaction templates may be calculated by dividing the total number of SQL statements contained in the transaction templates of the group by the total duration of processing those transaction templates. The preset QPS threshold may be, but is not limited to, 20. For example, the QPS of W = the total number of SQL statements executed by the transaction templates in W's transaction template sequence (10000) / the total processing duration of that sequence (100 s) = 100.
That is: if the category feature is the template similarity between two groups, the server determines that two groups meeting the preset merging condition exist among the current groups when the template similarity between those two groups is smaller than the preset similarity threshold; if the category feature is the execution amount evaluation value, the server makes that determination when the difference between the execution amount evaluation values of the two groups is smaller than the preset evaluation value threshold; and if the category feature includes both the template similarity and the execution amount evaluation value, the server determines that the two groups meet the preset merging condition when the template similarity between them is smaller than the preset similarity threshold and the difference between their execution amount evaluation values is smaller than the preset evaluation value threshold.
S203, based on the obtained file structure information, obtaining a file generation order among the storage files and the value distribution of each attribute contained in each storage file.
In some embodiments, when performing library-table dependency analysis, each piece of file structure information includes the attributes contained in the corresponding storage file, and obtaining the file generation order among the storage files based on the obtained file structure information specifically includes the following steps:
the server determines the file dependency relationships among the storage files based on the attributes contained in the obtained file structure information;
the server obtains the file generation order among the storage files based on the file dependency relationships.
In the embodiment of the application, the server may build, through foreign-key dependency analysis, a directed acyclic graph (DAG) representing the storage files, and topologically sort the DAG to obtain the file generation order among the storage files.
For example, assume table T4 contains the three attributes name, ball and triangle with primary key name, and table T2 contains the attributes id and name with primary key id and foreign key name. In the DAG, T4 is a predecessor of T2, so the file generation order obtained from the DAG is: first generate T4, then generate T2.
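The topological sort of the dependency DAG can be sketched with Kahn's algorithm. The `deps` mapping (table → tables its foreign keys reference) is an assumed input format:

```python
from collections import defaultdict, deque

def file_generation_order(deps):
    """Topologically sort the file-dependency DAG (Kahn's algorithm) so
    that every table is generated after all of its predecessors."""
    indeg = defaultdict(int)
    children = defaultdict(list)
    tables = set(deps)
    for t, parents in deps.items():
        tables.update(parents)
        for p in parents:
            children[p].append(t)
            indeg[t] += 1
    # Start from tables with no dependencies (sorted for determinism).
    queue = deque(sorted(t for t in tables if indeg[t] == 0))
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return order

file_generation_order({"T2": ["T4"], "T4": []})  # → ['T4', 'T2']
```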
In some embodiments, when performing data distribution sampling, obtaining the value distribution of each attribute contained in each storage file based on the obtained file structure information includes:
extracting at least one joint column from each transaction template based on a set query field, each joint column containing at least one attribute of a storage file queried by the corresponding transaction template;
obtaining, based on the extracted joint columns in combination with the original database, the value range of each attribute contained in the joint columns;
sampling the original database according to a set sampling ratio for each joint column to obtain a corresponding sampling result, and obtaining the corresponding sampling statistics based on that sampling result;
obtaining the value distribution of each attribute contained in each storage file based on the sampling statistics of the joint columns and the obtained value ranges.
When obtaining, based on the extracted joint columns in combination with the original database, the value ranges of the attributes contained in the joint columns, the server may, for each joint column, issue an aggregate query statement to the original database to obtain the value range of each attribute contained in that joint column.
The value range of an attribute may include any one or more evaluation indexes such as the maximum value, the minimum value, and the cardinality of the attribute. The cardinality is the number of distinct values of the attribute: the lower the cardinality, the more repeated elements the attribute column contains; the higher the cardinality, the fewer repeated elements it contains.
For example, referring still to fig. 5, domian Info is the overall distribution information of the attribute column of T1, each row in Domian Info represents a corresponding numerical range of an attribute, and each class in Domian Info represents a different evaluation index, where attr is the attribute column name, NULL is the number of NULL value tuples, dom is the value range of the attribute column, card is the cardinal number of the attribute column, and len is the average length of the attribute column. Taking the name attribute as an example, the number of NULL value tuples of the name is 0, the value range is within 2000 (because the data is too long and is shown by..in the figure), the radix is 200, and the average length is (4, 10).
As a possible implementation, the sampling statistics may be obtained by, but are not limited to, a histogram, kernel density estimation, a statistical learning method, a machine learning algorithm, and the like.
Taking a histogram as an example, referring to fig. 5, for the joint column (ball, triage) in T1, the server generates random query SQL for T1 according to the set sampling ratio (sampling 1% of the tuples in the table) to sample the original database and obtain the sampling result of the joint column (ball, triage). Based on this sampling result the server obtains the sampling statistics of the joint column and presents them as a two-dimensional histogram. The histogram comprises 5 high-frequency items and 10 buckets, each high-frequency item being a value combination of ball and triage, and the prob row records the frequency of each item or bucket. In the figure, the serial numbers 1-10 are bucket numbers: after the high-frequency items are removed, the two-dimensional space is uniformly divided into 10 buckets of equal width, i.e., an equal-width histogram.
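The histogram construction can be sketched as follows. Splitting buckets along only the first attribute is a simplification of the equal-width division of the two-dimensional space described above, and the parameter names are illustrative:

```python
from collections import Counter

def joint_histogram(samples, n_hot=5, n_buckets=10):
    """Build sampling statistics for a joint column from sampled value
    tuples: keep the n_hot most frequent combinations as high-frequency
    items, then spread the remaining tuples over n_buckets equal-width
    buckets along the first attribute (a 1-D simplification of the
    2-D equal-width split)."""
    total = len(samples)
    counts = Counter(samples)
    hot = {v: c / total for v, c in counts.most_common(n_hot)}
    rest = [s for s in samples if s not in hot]
    buckets = [0] * n_buckets
    if rest:
        lo = min(s[0] for s in rest)
        hi = max(s[0] for s in rest)
        width = (hi - lo) / n_buckets or 1   # guard against a zero-width range
        for s in rest:
            idx = min(int((s[0] - lo) / width), n_buckets - 1)
            buckets[idx] += 1
    probs = [c / total for c in buckets]     # the "prob" row of the figure
    return hot, probs

hot, probs = joint_histogram(
    [(1, 1)] * 50 + [(2, 2)] * 30 + [(i, 0) for i in range(10, 30)],
    n_hot=2,
)
```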
After obtaining the sampling statistics of each joint column and the value ranges, the server obtains the value distribution of each attribute contained in each storage file based on them; that is, the value distribution of an attribute in a storage file contains both the value range of the attribute and its sampling statistics.
S204, constructing a virtual synthetic file corresponding to each storage file based on the file structure information and value distribution corresponding to each storage file, in combination with the file generation order, and performing a performance test based on the constructed virtual synthetic files to obtain a performance test result.
Specifically, in the embodiment of the present application, constructing the virtual synthetic file corresponding to each storage file based on the file structure information and value distribution of each storage file, in combination with the file generation order, may include the following steps:
the server constructs the virtual synthetic files corresponding to the storage files in turn according to the file generation order, performing the following operations in each construction:
the server determines the attributes contained in a virtual synthetic file based on the file structure information of the storage file corresponding to that virtual synthetic file;
the server generates a value set for each attribute contained in the virtual synthetic file based on the attribute type of each attribute and the data generation strategy corresponding to that attribute type, each data generation strategy being derived from the value distribution of the corresponding attribute.
For example, the server may perform a breadth-first search on the DAG, recording the file nodes of each layer and which file nodes share a layer, thereby generating the file generation order.
Referring to fig. 7, assume the storage files comprise tables 1 to 8, where table 2 and table 5 depend on table 1, table 6 depends on table 5, table 7 depends on table 6, table 3 does not depend on tables 1, 2, 5, 6 or 7, table 4 depends on table 3, and table 8 depends on table 4. Therefore, in the file generation order, table 1, table 5, table 6 and table 7 are generated in sequence, table 2 is generated after table 1, and table 3, table 4 and table 8 are generated in sequence.
Assume the server constructs the virtual synthetic files corresponding to the storage files in the order T1, T2, T3. First, the server constructs the virtual synthetic file of T1: from the file structure information of T1 it determines that the virtual synthetic file of T1 contains the attributes id, name and age, and it assigns values to the id, name and age columns based on the attribute type of each of them in T1 and the data generation strategy corresponding to that type, generating the value sets of id, name and age.
Next, the server constructs the virtual synthetic file of T2: from the file structure information of T2 it determines the attributes contained in the virtual synthetic file of T2 (e.g., id), assigns values to each column based on its attribute type in T2 and the corresponding data generation strategy, and generates the value set of each attribute.
Finally, the server constructs the virtual synthetic file of T3: from the file structure information of T3 it determines the attributes contained in the virtual synthetic file of T3, and assigns values to the id and ball columns based on the attribute types of id and ball in T3 and the data generation strategies corresponding to those types, generating the value sets of id and ball.
In some embodiments, generating the value set of each attribute contained in a virtual synthetic file based on the attribute type of each attribute, in combination with the data generation strategy corresponding to the attribute type, specifically includes the following steps:
the server performs the following operations for each attribute contained in a virtual synthetic file:
if the attribute is a primary key, non-repeating values are used as the value set of the attribute;
if the attribute is a foreign key, a predecessor file of the storage file corresponding to the virtual synthetic file is determined from the storage files based on the file generation order, and the value set of the attribute is generated based on the value distribution of the attribute in the predecessor file;
if the attribute is neither a primary key nor a foreign key, the value set of the attribute is generated based on the value distribution of the attribute in the storage file corresponding to the virtual synthetic file.
In some embodiments, for an attribute that is neither a primary key nor a foreign key, if the attribute belongs to a joint column, the server may generate the value set of the attribute according to the sampling statistics in the value distribution of the attribute in the storage file corresponding to the virtual synthetic file.
If an attribute belongs to an ordinary column (neither a primary key, a foreign key, nor a joint column), the server may generate the value set of the attribute according to the value range in the value distribution of the attribute in the storage file corresponding to the virtual synthetic file.
In the embodiment of the application, for a primary key column, the generation module maintains a self-incrementing ID counter and outputs a primary key ID for each tuple; for a foreign key column, since the generation strategy ensures that the table holding the depended-on primary key has already been generated, key values are filled in by random selection from the corresponding primary key column of the depended-on table; for a joint column, data points are randomly sampled from the multi-dimensional histogram of the joint column as its fill data; and for an ordinary column, values are randomly generated based on the value domain and cardinality information recorded in the Domain Info table.
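The four per-column strategies can be sketched in one dispatch function; the `ctx` structure and the kind labels are illustrative assumptions, not prescribed by the method:

```python
import random

def generate_column(kind, n, ctx):
    """Generate a value set of n entries for one attribute by role:
    primary keys from a self-incrementing counter, foreign keys sampled
    from the already-generated predecessor's primary-key column, joint
    columns sampled from their histogram, ordinary columns drawn
    uniformly from the recorded value domain."""
    if kind == "primary":
        start = ctx.get("next_id", 1)          # self-incrementing ID counter
        ctx["next_id"] = start + n
        return list(range(start, start + n))
    if kind == "foreign":
        return [random.choice(ctx["parent_keys"]) for _ in range(n)]
    if kind == "joint":
        values, weights = ctx["histogram"]     # e.g. sampled 2-D points + probs
        return random.choices(values, weights=weights, k=n)
    lo, hi = ctx["domain"]                     # ordinary column: value domain
    return [random.randint(lo, hi) for _ in range(n)]
```

A fuller implementation would also enforce the cardinality constraint on ordinary columns rather than drawing independently.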
In some embodiments, when the value set of a common column is generated, different generation strategies can be designed for different value types. Specifically, for an attribute of numeric type, values can be randomly generated within the value range while guaranteeing the cardinality constraint. For an attribute of date type, a random number is converted into a Unix timestamp, the Unix timestamp is converted into date-format data, and the generated date-format data is used as the value. For an attribute column of string type, if the attribute column belongs to a joint column, a prefix tree is used to generate strings consisting of '0' and '1' under the cardinality constraint and the average length constraint; for example, if the cardinality of the attribute column is 5 and the average length is 7, the first four bits of each generated string are 0000, and the last three bits are 000, 001, 010, 011 and 101. If the attribute column does not belong to a joint column, a length is randomly selected within the length range, and a string of that length is randomly generated from the 26-letter alphabet.
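The prefix-style binary string generation for joint columns can be sketched as follows. This is an assumption-laden illustration: it produces `cardinality` distinct '0'/'1' strings of the given average length by sharing an all-zero prefix and enumerating just enough suffix bits, matching the example above in spirit (the exact suffix values the patent generates may differ).

```python
import math

def gen_binary_strings(cardinality, avg_length):
    """Generate `cardinality` distinct '0'/'1' strings of length `avg_length`:
    a shared all-zero prefix followed by enumerated suffix bits, in the spirit
    of the prefix-tree scheme above (illustrative reconstruction)."""
    # Minimum number of suffix bits needed to distinguish `cardinality` strings.
    suffix_bits = max(1, math.ceil(math.log2(cardinality)))
    prefix = "0" * (avg_length - suffix_bits)
    return [prefix + format(i, f"0{suffix_bits}b") for i in range(cardinality)]
```

With cardinality 5 and length 7, this yields five distinct strings all starting with "0000", satisfying both the cardinality and the average-length constraint.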
In some embodiments, the performance test may be a performance simulation test or a performance pressure test. Specifically, performing the performance test based on each constructed virtual synthesized file to obtain a performance test result includes the following steps:
If the performance test is a performance simulation test, the server performs the performance test with each constructed virtual synthesized file based on the performance constraint parameter set of each historical transaction in the original database, to obtain a performance test result;
If the performance test is a performance pressure test, the server performs multiple rounds of performance tests based on each constructed virtual synthesized file, to obtain a performance test result.
The performance simulation test scene and the performance pressure test scene are described below.
The purpose of the performance simulation is to reproduce a performance curve highly similar to that of the original load. In the performance simulation test scenario, the load used for the simulation test (i.e. the transaction sequence used for testing) may be the original load sequence (i.e. the transaction sequence formed by the historical transactions), or may be the instantiated transactions obtained by instantiating each transaction template.
Specifically, as one possible implementation, the server may instantiate each transaction template based on the historical execution result corresponding to that template to obtain the corresponding instantiated transactions, and obtain the performance test result based on the obtained instantiated transactions and the constructed virtual synthesized files.
The historical execution result may include one or both of the actual number of returned rows and the number of affected rows after the corresponding historical transaction was executed. During instantiation, the server may search the sampling statistical information of the joint column (such as a two-dimensional histogram of the joint column) according to the actual number of returned or affected rows recorded in the audit log after the SQL statement was executed, select a tuple value that satisfies the set row-number range, and fill it into the position corresponding to the joint column in the transaction template. A tuple satisfies the set row-number range if, for example, its count in the histogram is within 5% of the number of returned or affected rows. For example, if the actual number of returned rows of the SQL statement is 10 and the joint column contained in the transaction template is (ball, triage), the server searches the histogram of (ball, triage) and selects the value (1, 3).
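The histogram lookup for instantiation can be sketched as a simple search over (value tuple → row count) entries, keeping any tuple whose count falls within 5% of the target row count. The function name and the dictionary-based histogram representation are hypothetical.

```python
def pick_tuple(histogram, target_rows, tolerance=0.05):
    """Search a joint-column histogram (value-tuple -> row count) for a
    value whose count is within `tolerance` of the target returned/affected
    row count. Illustrative sketch of the lookup described above."""
    for values, count in histogram.items():
        if abs(count - target_rows) <= tolerance * target_rows:
            return values
    return None  # no tuple matches the set row-number range
```

For the example above, a target of 10 returned rows would select the (ball, triage) value whose histogram count is 10.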
After the load for performing the simulation test is obtained, the performance test result is obtained by combining the obtained instantiated transactions and the virtual synthesized files, based on the performance constraint parameter set of the historical transactions in the original database.
In some embodiments, referring to table 1, the performance metrics include, but are not limited to, one or more of server central processing unit (Central Processing Unit, CPU) usage, server resident set size (Resident Set Size, RSS) memory usage, server incoming traffic, server outgoing traffic, disk read volume, disk write volume, number of executions, number of slow queries, InnoDB cache hit rate, etc.
TABLE 1 Performance evaluation index
In some embodiments, the performance constraint parameter set may comprise one or more performance constraint parameters, including but not limited to one or more of the number of transmission control protocol (Transmission Control Protocol, TCP) connections of the original load, the QPS, etc.
For example, the server may execute the load according to the number of TCP connections and the QPS constraints of the original load. Specifically, the server parses the thread numbers and thread start times in the audit log, sorts the threads, calculates the intervals at which different threads created connections, and, when performing the simulation test, establishes TCP connections on the test instance in the same thread creation order and with the same time intervals. Then, for each thread in each audit log, the server calculates the QPS of that thread and uses a rate limiter to constrain the execution rate of the transactions in each test instance to be the same as the QPS of the original load.
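A minimal rate limiter of the kind described above could look like the following; the class name and the fixed-interval pacing scheme are illustrative, not the patent's.

```python
import time

class RateLimiter:
    """Pace transaction execution to a target QPS, mimicking the per-thread
    QPS constraint described above (illustrative sketch)."""

    def __init__(self, qps):
        self.interval = 1.0 / qps        # minimum spacing between executions
        self.next_slot = time.monotonic()

    def acquire(self):
        """Block until the next execution slot is reached."""
        now = time.monotonic()
        if now < self.next_slot:
            time.sleep(self.next_slot - now)
        self.next_slot = max(self.next_slot, now) + self.interval
```

Calling `acquire()` before submitting each transaction keeps the replayed load's QPS at the configured value regardless of how fast the test instance could otherwise execute.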
In the simulation test, the server may also obtain the load simulation similarity based on the performance evaluation index set included in the performance test result.
By way of example, the load simulation similarity can be obtained by calculating the time-series similarity of each performance evaluation index under the original load and the synthetic load. Each performance evaluation index can be assessed along two dimensions: shape similarity and value deviation. The shape similarity can be represented by a normalized cross-correlation coefficient, and the value deviation can be the Euclidean distance after alignment.
The shape similarity may be calculated by, but is not limited to, the Shape-based Distance (SBD) algorithm, which is based on sequence morphology.
Illustratively, the shape similarity may be calculated using equations (2) to (5):
where x and y denote the time series of a performance evaluation index under the original load and the synthetic load respectively, both of length m; in formula (2), CC_k(x, y) is the inner product of the two vectors obtained by holding y fixed and sliding x by k steps; in formula (3), CC_w is the cross-correlation coefficient, where the subscript w denotes the shift and ranges from 1 to 2m-1; in formula (4), NCC_w is the normalized cross-correlation coefficient; and in formula (5), the result is the shape similarity.
The time series similarity score can be calculated by using formula (6):
where α and β are coefficients that trade off the value deviation and the shape similarity; exemplary values of α and β are 0.2 and 0.8, respectively.
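The normalized cross-correlation underlying the shape similarity of formulas (2) to (5) can be sketched as follows. This is a plain reconstruction of the standard SBD-style computation — slide one series over the other, normalize the inner products, and take the maximum — and is not claimed to match the patent's exact formulas.

```python
import math

def shape_similarity(x, y):
    """Maximum normalized cross-correlation between two equal-length series,
    in the spirit of formulas (2)-(5) (illustrative reconstruction)."""
    m = len(x)
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        return 0.0
    best = -1.0
    for w in range(1, 2 * m):      # shifts w = 1 .. 2m-1, as in formula (3)
        k = w - m                  # signed shift of x relative to y
        cc = sum(x[i + k] * y[i] for i in range(m) if 0 <= i + k < m)
        best = max(best, cc / norm)
    return best
```

Identical series score 1.0 (maximum reached at zero shift); dissimilar shapes score lower, which is the behavior the time-series similarity score then weights against the value deviation.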
In the performance pressure test scenario, the load used for the test may be obtained by instantiating the TOP transaction templates among the transaction templates, or by instantiating, according to a configured sub-load, the transaction templates corresponding to that sub-load.
Specifically, as one possible implementation, the transaction templates are sorted based on the number of historical transactions corresponding to each transaction template; target templates are extracted from the transaction templates according to the sorting result and a set selection proportion; and each target template is instantiated to obtain the instantiated transactions.
That is, in the embodiment of the present application, the TOP transaction templates are instantiated. The motivation for instantiating the TOP templates comes from observations of online instances: most transactions share the same transaction templates, i.e. the transaction template distribution has a long-tail characteristic, so instantiating the TOP templates can accelerate the generation process while keeping the generated load highly similar to the original load. The distribution characteristics of the number of transaction template instances are shown in fig. 8. In fig. 8, ND represents the number distribution, CFD represents the cumulative frequency distribution, and CND represents the cumulative number distribution. Fig. 8 includes 9 groups of distribution characteristic diagrams: the 3 groups in the first row correspond to ND, the 3 groups in the second row correspond to CFD, and the 3 groups in the third row correspond to CND; the ordinate in each group of diagrams represents the number of transaction template instances. The 3 groups in the first column show the distribution characteristics when all transaction templates are instantiated, the 3 groups in the second column when 100 transaction templates are instantiated, and the 3 groups in the third column when 20 transaction templates are instantiated.
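The TOP-template selection exploiting the long-tail distribution can be sketched as a sort-and-truncate over template frequencies; the names below are illustrative.

```python
def select_top_templates(template_counts, ratio):
    """Rank transaction templates by historical instance count and keep the
    top `ratio` fraction, exploiting the long-tail distribution noted above
    (illustrative sketch)."""
    ranked = sorted(template_counts, key=template_counts.get, reverse=True)
    k = max(1, int(len(ranked) * ratio))
    return ranked[:k]
```

Because a few templates account for most transactions, instantiating only this head of the ranking preserves load similarity while shrinking generation cost.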
As another possible implementation, based on the target sub-load configured in the configuration file, in combination with the correspondence between the target sub-load and the target templates, each target template corresponding to the target sub-load is obtained from the transaction templates, and each target template is instantiated to obtain the instantiated transactions.
The correspondence between the target sub-load and the target templates can be determined according to the correspondence between the at least one merged group of transaction templates and each sub-load.
For example, SW1, SW2 and SW3 correspond to one group of transaction templates, and SW4 corresponds to another group. Assuming the target sub-load configured in the configuration file is SW4, each target template corresponding to SW4 is obtained from the transaction templates, and each target template is instantiated to obtain the instantiated transactions.
It should be noted that, in the embodiment of the present application, the transaction template instantiation may also be performed using other generative models, including but not limited to rule-based SQL generation, reinforcement learning-based SQL generation, and the like.
In the pressure test scenario, considering that each round of testing causes a certain amount of data offset, and that a larger offset affects the pressure test results, data recovery needs to be performed at an appropriate time. Specifically:
Performing a first round of performance test based on each constructed virtual synthesized file to obtain a first round of reference performance test result;
Starting from the second round of performance test, performing the current round of performance test based on each constructed virtual synthesized file to obtain the current performance test result of the current round, and performing data recovery on each virtual synthesized file after the current round of performance test when determining that data offset occurs based on the reference performance test result and the current performance test result.
For example, the performance test result may include the number of returned or affected rows of the SELECT statements and the DML statements; of course, the performance test result may also include only the number of returned or affected rows of the SELECT statements, or only that of the DML statements. In the process of determining whether data offset has occurred based on the reference performance test result and the current performance test result, the server may count the number of returned or affected rows of the SELECT statements and the DML statements (including UPDATE, INSERT and DELETE) in the current round, then compare it with the number of returned or affected rows in the reference performance test result obtained in the first round of the pressure test, and determine that data offset has occurred when the row-count difference is greater than a preset row-count threshold.
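The row-count comparison for offset detection can be sketched as follows, with per-statement row counts held in dictionaries keyed by statement; the representation is hypothetical.

```python
def data_drift_detected(baseline_rows, current_rows, row_threshold):
    """Compare per-statement returned/affected row counts of the current
    round against the first-round baseline; report data offset when any
    difference exceeds the threshold (illustrative sketch)."""
    return any(
        abs(current_rows[stmt] - baseline_rows[stmt]) > row_threshold
        for stmt in baseline_rows
    )
```

When this check fires after a round, the virtual synthesized files are restored before the next round so that later pressure-test rounds run against comparable data.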
In some embodiments, data recovery may be performed using at least one of:
The first way is binlog flashback. Specifically, the server parses the binary log file (binlog), generates the corresponding reverse SQL statements, and executes them. This is a local recovery method, suitable for recovering a load with a small number of DML statements or a table with a large amount of data.
The second way is tablespace import (import tablespace). Specifically, the server replaces each virtual synthesized file after the pressure test with the initial database table file (each virtual synthesized file before the pressure test) and rebuilds the mapping. This is a full recovery method, suitable for recovering a load with a large number of DML statements or a table with a small amount of data.
In some embodiments, the number of DML statements and the storage files with data changes in each round of the pressure test may be recorded. If the amount of data stored in a changed storage file exceeds a set data amount threshold (for example, the number of data rows exceeds 10 million) or the number of DML statements is lower than the set DML statement threshold, local recovery is performed in the first way; otherwise, full recovery is performed directly in the second way.
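The threshold rule for choosing between local and full recovery can be sketched directly; the threshold values below are the examples given above and are otherwise arbitrary.

```python
def choose_recovery(changed_table_rows, dml_count,
                    row_limit=10_000_000, dml_limit=1000):
    """Pick between local binlog-flashback recovery and full
    tablespace-import recovery using the rule described above
    (illustrative sketch; `dml_limit` is an assumed example threshold)."""
    if changed_table_rows > row_limit or dml_count < dml_limit:
        return "binlog_flashback"    # local recovery: few DMLs / large table
    return "import_tablespace"       # full recovery: many DMLs / small table
```

The intuition is cost-based: replaying a few reverse statements is cheap, while re-importing a huge tablespace is expensive, and vice versa.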
In some embodiments, for scenarios where real data may be used directly, relying on the instance cloning capability, a cloning task may be initiated from the cold standby node at the load start time to clone the backup onto the test environment machine.
In some embodiments, a test instance is created prior to the performance test. Specifically, the configuration specification of the target instance (such as the number of CPU cores, the memory size, the disk capacity, and the database version) may be queried in the cloud service management platform to create test instance machines of the same specification. The created test instance is then used to perform the performance test based on each constructed virtual synthesized file to obtain the performance test result.
In some embodiments, SQL precompilation may also be performed. Specifically, each transaction template required for the performance test is precompiled to obtain each transaction template in a precompilable form; for example, the parameter positions in the transaction template are replaced with '?'.
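A naive sketch of this parameter-replacement step might use regular expressions to substitute literals with placeholders; a production implementation would use a real SQL parser, and the regexes below are deliberate simplifications.

```python
import re

def precompile(sql):
    """Replace literal parameters in a transaction-template SQL statement
    with '?' placeholders so it can be prepared once and executed many
    times. A naive regex sketch, not a full SQL parser."""
    sql = re.sub(r"'[^']*'", "?", sql)          # quoted string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)  # integer/decimal literals
    return sql
```

The resulting parameterized statement can then be prepared on the test instance and re-executed with different fill values during the test.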
In some embodiments, performance configuration is performed prior to performance testing. Specifically, in the configuration file, one or more of the information such as the QPS, the number of thread connections, the load multiplying power, the target sub-load and the like expected during the performance test is configured, so that the performance test is performed according to the specified performance configuration information in the performance test process. The configuration file may be determined from instance configuration information specifying the database instance.
In some embodiments, cache warm-up is required because a newly applied test instance machine has not executed any SQL. Specifically, the server may randomly pick transaction templates to instantiate and execute, where the number of instantiated transactions may be 20% of the number of transactions executed in the formal test.
Fig. 9 is a schematic diagram of a system architecture according to an embodiment of the application.
The system comprises an online environment, a service environment and a test environment. The online environment stores a designated database instance and an audit log in the database instance within a designated time period. Instance configuration information can be configured through an online environment, and then in a test environment, test instances with the same configuration are applied through a test module.
The service environment comprises a data analysis template and a load analysis template. The data analysis template is used for querying and analyzing the library table structure, data distribution, access conditions and library table dependencies of the database instance according to each historical transaction in the specified time period. The load analysis template is used for pulling the audit log of the database instance in the specified time period to perform load analysis (transaction parameterization, joint column extraction, thread model construction and sub-load extraction); the thread model refers to dividing the transaction sequence by thread according to the thread ID field in the audit log, with all transactions under the same thread ID ordered by start time.
The test environment comprises a data generation module, a load generation module and a test module. The data generation module is used for constraint generation and cloning of the instance, and the load generation module is used for original transaction playback and transaction template instantiation. The constraint generation is to construct virtual synthesized files corresponding to each storage file based on file structure information and numerical distribution conditions corresponding to each storage file respectively and combining file generation sequences. The test module is used for creating test cases, SQL precompilation, cache preheating and execution performance configuration.
The test environment also includes a load simulation scenario and a multi-round pressure test scenario. In the load simulation scenario, performance constraint execution and similarity calculation are performed; in the multi-round pressure test scenario, data offset detection and data recovery are performed. Performance constraint execution refers to performing the performance test with each constructed virtual synthesized file based on the performance constraint parameter set of each historical transaction in the original database to obtain the performance test result, and the similarity calculation is used for calculating the load simulation similarity.
Referring to fig. 10, a flowchart of a database performance test provided in an embodiment of the application is shown, the flowchart includes the following steps:
First, a specified database instance and a specified time period are acquired. The designated database instance may be identified with an ID.
The method comprises the following steps: obtaining the instance configuration information of the specified database instance, and configuring the execution performance of the test instance according to the instance configuration information; pulling the audit log of the database instance in the specified time period; for the pulled audit log, parameterizing each historical transaction in the audit log to obtain each transaction template; extracting the corresponding joint columns from each transaction template, constructing the thread model and extracting the sub-loads; then judging whether to play back the transactions — if so, transaction template instantiation is performed, otherwise the original transactions are used — and then performing SQL precompilation and cache warm-up; extracting the file structure information of each storage file and the file generation order; and then judging whether to use the original data — if so, generating test data by cloning the instance, and if not, obtaining the value distribution of each storage file through dynamic sampling.
Then it is judged whether to perform a simulation test. If so, the performance test is performed based on the performance constraints, and the simulation similarity is calculated. If no simulation test is performed, a performance pressure test is carried out: multiple rounds of performance tests are performed based on the test data to obtain the performance test results, data offset detection is performed after each round of the pressure test ends, and data recovery is performed when required.
The present application will be described with reference to specific examples.
(1) Database major-version upgrade verification: when a database version upgrade is required, a playback test of the original load is performed in the high-version test environment, the performance test result is compared with the original result, and whether a serious performance problem or fault (bug) occurs after the version upgrade is judged according to the comparison result.
Specifically, when the version of the database is updated, a synthetic database can be constructed according to each historical transaction in a specified time period in the database under a low-version test environment, the synthetic database contains constructed virtual synthetic files, and performance simulation tests are performed based on each historical transaction and the synthetic database under a high-version test environment to obtain performance test results. The performance test result may include one or more performance evaluation indexes, and according to the time sequence similarity of each performance evaluation index, load simulation similarity is obtained, and further according to the load simulation similarity, whether serious performance problems occur or whether faults exist after version upgrading are judged.
(2) Simulation test before a service function in an application goes online or is upgraded: when a service function needs to go online or be upgraded, the service to be launched or upgraded is invoked to perform a playback test of the original load, the performance test result is compared with the original result, and whether the service function has a performance problem or fault is judged according to the comparison result.
(3) Intelligent parameter tuning for an application: applied to an intelligent parameter tuning service of a specific application, a corresponding performance test result can be obtained in each round of the multi-round pressure test, and the target parameters affecting the performance of the specific application can be determined from the candidate parameters of the application according to the per-round performance test results, so that more personalized parameter configuration is performed for the specific application and performance is improved.
(4) Index recommendation: applied to an index recommendation service, a corresponding performance test result can be obtained in each round of the pressure test, and the indexes affecting data query performance are determined according to the per-round performance test results, so that the recommendation process is accelerated and the quality of the recommendation results is improved.
Based on the same inventive concept, the embodiment of the application provides a database performance testing device. As shown in fig. 11, which is a schematic structural diagram of the database performance testing apparatus 1100, may include:
A parameterizing unit 1101, configured to parameterize each historical transaction in a specified time period in an original database based on at least one constant contained in the transaction, to obtain a corresponding transaction template;
A structure extracting unit 1102, configured to extract, based on a set target field, respective file structure information of each storage file associated with the target field from each obtained transaction template;
A distribution extraction unit 1103, configured to obtain a file generation order between the storage files, and obtain a numerical distribution situation of each attribute included in each storage file, based on the obtained file structure information;
and the test unit 1104 is used for constructing virtual synthesized files corresponding to the storage files respectively based on the file structure information and the numerical distribution situation corresponding to the storage files respectively and combining the file generation sequence, and performing performance test based on the constructed virtual synthesized files to obtain a performance test result.
As a possible implementation manner, the test unit 1104 is specifically configured to, when performing performance test based on each virtual composite file constructed to obtain a performance test result:
if the performance test is a performance simulation test, performing the performance test by adopting each constructed virtual synthesis file based on the performance constraint parameter set of each historical transaction in the original database to obtain a performance test result;
and if the performance test is a performance pressure test, performing multiple rounds of performance tests based on each constructed virtual synthesized file to obtain a performance test result.
As a possible implementation manner, the performance test is performed by using each constructed virtual composite file based on the performance constraint parameter set of each historical transaction in the original database, and when a performance test result is obtained, the test unit 1104 is specifically configured to:
Based on the corresponding historical execution results of each transaction template, each transaction template is respectively instantiated to obtain corresponding instantiated transaction, and based on the performance constraint parameter set of each historical transaction in the original database, each obtained instantiated transaction and each virtual synthesized file are combined to obtain a performance test result, or
And based on the performance constraint parameter set of each historical transaction in the original database, combining each historical transaction with each constructed virtual synthesized file to obtain a performance test result.
As a possible implementation manner, the test unit 1104 is specifically configured to, when performing multiple rounds of performance tests based on each constructed virtual composite file to obtain a performance test result:
Sorting the transaction templates based on the number of historical transactions corresponding to each transaction template, extracting target templates from the transaction templates according to the sorting result and a set selection proportion, and performing multiple rounds of performance tests with the instantiated transactions obtained by instantiating each target template, in combination with each virtual synthesized file, to obtain a performance test result, or
Based on the target sub-load configured in the configuration file, combining the corresponding relation between the target sub-load and the target template, obtaining each target template corresponding to the target sub-load from each transaction template, and performing multiple rounds of performance test according to each instantiation transaction obtained through each target template instantiation and combining each virtual synthesis file to obtain a performance test result.
As a possible implementation manner, the test unit 1104 is specifically configured to, when performing multiple rounds of performance tests based on each constructed virtual composite file to obtain a performance test result:
performing a first round of performance test based on each constructed virtual synthesized file to obtain a first round of reference performance test result;
Starting from the second round of performance test, performing the current round of performance test based on each constructed virtual synthesized file to obtain the current performance test result of the current round, and performing data recovery on each virtual synthesized file after the current round of performance test when determining that data offset occurs based on the reference performance test result and the current performance test result.
As a possible implementation, the test unit 1104 is further configured to:
Based on the execution starting time corresponding to each transaction template, sequencing each transaction template to obtain a transaction template sequence;
Dividing each transaction template in the transaction template sequence according to a set time interval to obtain a plurality of groups of transaction templates, wherein each group of transaction templates comprises at least one transaction template;
And clustering the multiple groups of transaction templates based on the category characteristics corresponding to the multiple groups of transaction templates, and obtaining at least one group of merged transaction templates.
As a possible implementation manner, when the plurality of transaction templates are clustered based on the class features corresponding to the plurality of transaction templates, and at least one merged transaction template is obtained, the test unit 1104 is specifically configured to:
At least one merging process is carried out on the plurality of groups of transaction templates, and the following operations are executed in each merging process:
Acquiring current transaction templates of each group, determining two groups of transaction templates which meet preset merging conditions in the current transaction templates of each group based on respective corresponding category characteristics of the current transaction templates, and merging the two groups of transaction templates to obtain a new transaction template of each group;
and obtaining category characteristics corresponding to the new transaction templates based on at least one transaction template contained in each of the two transaction templates.
As a possible implementation manner, the test unit 1104 is specifically configured to, when determining that two groups of transaction templates that meet a preset merging condition exist in the current groups of transaction templates based on the category features corresponding to the current groups of transaction templates, where the two groups of transaction templates correspond to each other:
if the category characteristics are template similarity between two groups of transaction templates, determining that two groups of transaction templates which meet a preset merging condition exist in the current transaction templates when the template similarity between the two groups of transaction templates is smaller than a preset similarity threshold;
if the category characteristic is an execution statement number evaluation value between two groups of transaction templates, determining that two groups of transaction templates which meet preset merging conditions exist in each current group of transaction templates when the difference value of the execution statement number evaluation values between the two groups of transaction templates is smaller than a preset evaluation value threshold, wherein the execution statement number evaluation value is used for representing the number of Structured Query Language (SQL) statements executed in unit time.
As a possible implementation manner, when the set target field is used to extract, from the obtained transaction templates, respective file structure information of each storage file associated with the target field, the structure extracting unit 1102 is specifically configured to:
Extracting file names contained in each transaction template from the obtained transaction templates based on the set target fields;
And respectively initiating a file structure query request to the original database based on the obtained file names to obtain the file structure information of the corresponding storage file.
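As an illustration of these two steps — extracting the storage-file (table) names from a transaction template and querying the database for each file's structure — here is a minimal SQLite-based sketch. The regex, the `PRAGMA table_info` call, and the demo schema are assumptions for the example; against a production database the equivalent query would typically go through `information_schema`.

```python
import re
import sqlite3

# Crude pattern for pulling table names out of a parameterized SQL template.
TABLE_RE = re.compile(r"\b(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)", re.IGNORECASE)

def table_names(template_sql):
    # Extract the storage-file (table) names referenced by a transaction template.
    return sorted(set(TABLE_RE.findall(template_sql)))

def file_structure(conn, name):
    # Query the database for the table's structure: (column name, type, is primary key).
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [(r[1], r[2], bool(r[5])) for r in conn.execute(f"PRAGMA table_info({name})")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
template = "SELECT amount FROM orders WHERE user_id = ?"
structures = {t: file_structure(conn, t) for t in table_names(template)}
```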
As a possible implementation manner, each file structure information includes each attribute included in a corresponding storage file, and the obtaining, based on the obtained file structure information, a file generation order between the storage files includes:
Determining file dependency relations among all the storage files based on all the attributes contained in the obtained file structure information;
and obtaining the file generation sequence among the storage files based on the file dependency relationship.
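The two steps above amount to building a dependency graph over the storage files and topologically sorting it, so that a file referenced by others is generated first. The following sketch assumes `deps` maps each storage file to the set of files it depends on; the names are illustrative.

```python
from collections import defaultdict, deque

def generation_order(deps):
    # deps: file -> set of files it depends on; a referenced (depended-on)
    # file must be generated before the files that reference it.
    indegree = {f: len(parents) for f, parents in deps.items()}
    children = defaultdict(list)
    for f, parents in deps.items():
        for p in parents:
            children[p].append(f)
    queue = deque(f for f, d in indegree.items() if d == 0)
    order = []
    while queue:
        f = queue.popleft()
        order.append(f)
        for c in children[f]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(order) != len(deps):
        raise ValueError("cyclic file dependency")
    return order

order = generation_order({"users": set(), "orders": {"users"}, "order_items": {"orders"}})
```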
As a possible implementation manner, when obtaining the numerical distribution of each attribute included in each storage file based on the obtained file structure information, the distribution extracting unit 1103 is specifically configured to:
extracting at least one joint column from each transaction template based on the set query field, wherein each joint column comprises at least one attribute in one storage file queried by the corresponding transaction template;
based on the extracted at least one joint column, combining the original database to obtain a numerical range corresponding to each attribute contained in the at least one joint column;
based on the at least one joint column, respectively carrying out sampling inquiry on the original database according to a set sampling proportion to obtain a corresponding sampling result, and based on the corresponding sampling result, obtaining corresponding sampling statistical information;
and obtaining the numerical distribution condition of each attribute contained in each storage file based on the sampling statistical information corresponding to each at least one joint column and each obtained numerical range.
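The value-range and sampling-query steps above can be illustrated with a small SQLite sketch; the `MIN`/`MAX` query, the `ORDER BY RANDOM() LIMIT` sampling, and the demo table are assumptions for the example, not the patent's exact queries.

```python
import sqlite3
from collections import Counter

def value_range(conn, table, col):
    # Value range of one attribute in a joint column, taken from the original database.
    return conn.execute(f"SELECT MIN({col}), MAX({col}) FROM {table}").fetchone()

def sampled_stats(conn, table, cols, ratio):
    # Sampling query at a set sampling ratio; returns per-attribute value counts
    # (the sampling statistical information).
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    n = max(1, int(total * ratio))
    rows = conn.execute(
        f"SELECT {', '.join(cols)} FROM {table} ORDER BY RANDOM() LIMIT ?", (n,)
    ).fetchall()
    return {c: Counter(row[i] for row in rows) for i, c in enumerate(cols)}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i % 3) for i in range(100)])
lo, hi = value_range(conn, "t", "a")
stats = sampled_stats(conn, "t", ["a", "b"], ratio=0.2)
```

Combining the exact range with the sampled counts gives the numerical distribution used later when generating synthetic values.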
As a possible implementation manner, when the virtual composite file corresponding to each storage file is constructed based on the file structure information and the numerical distribution situation corresponding to each storage file and in combination with the file generation sequence, the test unit 1104 is specifically configured to:
and sequentially constructing virtual synthesized files corresponding to the storage files according to the file generation sequence, wherein in each construction process, the following operations are respectively executed:
Determining each attribute contained in one virtual synthesized file based on file structure information of a storage file corresponding to the virtual synthesized file;
and generating a value set of each attribute contained in the virtual synthesized file based on the attribute type of each attribute contained in the virtual synthesized file and combining a data generation strategy corresponding to the attribute type, wherein each data generation strategy is generated based on the numerical value distribution condition of the corresponding attribute.
As a possible implementation manner, when the generating a value set of each attribute included in the one virtual composite file based on each attribute type of each attribute included in the one virtual composite file and in combination with a data generating policy corresponding to the attribute type, the testing unit 1104 is specifically configured to:
for each attribute contained in the virtual composite file, the following operations are respectively executed:
if one attribute is a primary key, adopting each non-repeated value as a value set of the one attribute;
If one attribute is a foreign key, determining a predecessor file of the storage file corresponding to the one virtual synthesized file from the storage files based on the file generation sequence, and generating a value set of the one attribute based on the numerical distribution condition of the one attribute in the predecessor file;
If one attribute does not belong to the primary key or the foreign key, generating a value set of the one attribute based on the numerical value distribution condition of the one attribute in the storage file corresponding to the one virtual synthetic file.
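The three per-attribute generation rules above can be sketched as follows; this is an illustrative Python sketch, with the distribution represented as a value-to-weight mapping and all function names assumed for the example.

```python
import random

def primary_key_values(n):
    # Primary key: the value set consists of non-repeating values.
    return list(range(1, n + 1))

def values_from_distribution(distribution, n, rng=random):
    # Foreign key / ordinary attribute: draw values following the observed
    # numerical distribution (for a foreign key, the distribution of the
    # attribute in the predecessor storage file).
    values = list(distribution)
    weights = [distribution[v] for v in values]
    return rng.choices(values, weights=weights, k=n)

pk = primary_key_values(5)
fk = values_from_distribution({101: 0.7, 102: 0.3}, 10)
```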
For convenience of description, the above parts are described as being divided into modules (or units) by function. Of course, when implementing the present application, the functions of each module (or unit) may be implemented in one or more pieces of software or hardware.
The specific manner in which the respective units perform operations in the apparatus of the above embodiment has been described in detail in the embodiment of the method, and will not be repeated here.
In the embodiment of the application, by parsing the historical transactions to extract the accessed storage files, their file structure information and numerical distribution conditions, and the file generation sequence among the storage files, data generation during data synthesis can be constrained, thereby accelerating data generation, improving data synthesis efficiency, and further improving the efficiency of database performance testing.
Further, different generation strategies are set for the simulation scenario and the stress test scenario, so as to meet different test requirements and improve test efficiency. In particular, in the stress test scenario, a data offset detection scheme is provided and a data recovery strategy is designed, so that multiple rounds of stress testing can be performed automatically, the waiting time for data recovery is shortened, and the disk read/write wear of the test equipment is reduced.
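The multi-round testing with data offset detection and data recovery described above can be sketched as follows; the metric, the tolerance, and the callback names are assumptions for the example, not the patent's mechanism.

```python
def offset_detected(reference, current, tolerance=0.1):
    # Treat a relative deviation of the current round's metric (e.g. TPS)
    # from the first round's reference result beyond the tolerance as data offset.
    return abs(current - reference) / reference > tolerance

def run_rounds(run_round, restore_data, rounds, tolerance=0.1):
    reference = run_round()          # first round: reference test result
    results = [reference]
    for _ in range(rounds - 1):
        current = run_round()
        results.append(current)
        if offset_detected(reference, current, tolerance):
            restore_data()           # recover the virtual synthesized files
    return results

# Toy run: throughput degrades each round as data drifts; recovery resets it.
state = {"tps": 100.0, "restores": 0}
def run_round():
    state["tps"] *= 0.8
    return state["tps"]
def restore_data():
    state["tps"] = 100.0
    state["restores"] += 1
results = run_rounds(run_round, restore_data, rounds=3)
```

In the toy run, the second round deviates by 20% from the reference, which triggers one data recovery before the third round.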
Furthermore, considering that in a cloud database environment the database specification parameters configured by different users differ, it is difficult to guarantee that a performance test runs in the same software and hardware configuration environment as the user's. In the embodiment of the application, a machine with the same specification as the user's can be automatically applied for the performance test, and work such as cache warm-up and precompilation can be performed automatically to ensure consistency with the user's machine environment, thereby realizing performance testing of On-Line Transaction Processing (OLTP) loads in a cloud environment.
Furthermore, different types of sub-load sequences can be separated from the overall load through the sub-load extraction technique, and test loads of different specifications can be flexibly generated using sub-load assembly and execution performance configuration.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."
Based on the same inventive concept, the embodiment of the application further provides an electronic device. In one embodiment, the electronic device may be a server or a terminal device. Referring to fig. 12, which is a schematic diagram of a possible electronic device provided in an embodiment of the present application, the electronic device 1200 includes a processor 1210 and a memory 1220.
The memory 1220 stores a computer program executable by the processor 1210, and the processor 1210 can execute the steps of the database performance test method by executing the instructions stored in the memory 1220.
The memory 1220 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1220 may also be a combination of the above.
Processor 1210 may include one or more central processing units (central processing unit, CPUs) or a digital processing unit or the like. Processor 1210 is configured to implement the database performance test method described above when executing the computer program stored in memory 1220.
In some embodiments, processor 1210 and memory 1220 may be implemented on the same chip, or in some embodiments they may be implemented separately on separate chips.
The specific connection medium between the processor 1210 and the memory 1220 is not limited in the embodiment of the present application. In the embodiment of the present application, the processor 1210 and the memory 1220 are connected by a bus, which is depicted in fig. 12 by a thick line; the connection manner between other components is only schematically illustrated and is not limited thereto. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of description, only one thick line is depicted in fig. 12, but this does not mean that there is only one bus or only one type of bus.
Based on the same inventive concept, an embodiment of the present application provides a computer readable storage medium comprising a computer program for causing an electronic device to perform the steps of the above-mentioned database performance test method when the computer program is run on the electronic device. In some possible embodiments, aspects of the database performance testing method provided by the present application may also be implemented in the form of a program product comprising a computer program for causing an electronic device to perform the steps of the database performance testing method described above, when the program product is run on the electronic device, e.g. the electronic device may perform the steps as shown in fig. 2.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of a readable storage medium include an electrical connection having one or more wires, a portable disk, a hard disk, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disc read-only memory (Compact Disk Read Only Memory, CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
The program product of embodiments of the present application may take the form of a CD-ROM and comprise a computer program and may run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a computer program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a computer program for use by or in connection with a command execution system, apparatus, or device.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (18)

1. A database performance testing method, comprising:
Based on at least one constant contained in each historical transaction in a designated time period in an original database, carrying out parameterization processing on each historical transaction respectively to obtain a corresponding transaction template;
based on a set target field, extracting respective file structure information of each storage file associated with the target field from each obtained transaction template;
Based on the obtained file structure information, obtaining a file generation sequence among the storage files and obtaining numerical distribution conditions of each attribute contained in each storage file;
and constructing virtual synthesized files corresponding to the storage files respectively based on the file structure information and the numerical distribution condition corresponding to the storage files respectively and combining the file generation sequence, and performing performance test based on the constructed virtual synthesized files to obtain a performance test result.
2. The method of claim 1, wherein performing a performance test based on each virtual composite file constructed to obtain a performance test result comprises:
if the performance test is a performance simulation test, performing the performance test by adopting each constructed virtual synthesis file based on the performance constraint parameter set of each historical transaction in the original database to obtain a performance test result;
and if the performance test is a performance stress test, performing multiple rounds of performance tests based on each constructed virtual synthetic file to obtain a performance test result.
3. The method of claim 2, wherein performing performance testing with each constructed virtual composite file based on the performance constraint parameter set of each historical transaction in the original database to obtain a performance test result comprises:
Based on the corresponding historical execution results of each transaction template, each transaction template is respectively instantiated to obtain corresponding instantiated transaction, and based on the performance constraint parameter set of each historical transaction in the original database, each obtained instantiated transaction and each virtual synthesized file are combined to obtain a performance test result, or
And based on the performance constraint parameter set of each historical transaction in the original database, combining each historical transaction with each constructed virtual synthesized file to obtain a performance test result.
4. The method of claim 2, wherein performing multiple rounds of performance testing based on each virtual composite file constructed to obtain performance test results comprises:
Sorting the transaction templates based on the historical transaction numbers corresponding to the transaction templates, extracting target templates from the transaction templates according to the sorting results and the sorting ratios, and performing multiple performance tests in combination with the virtual synthesized files according to the instantiated transactions obtained by the instantiation of the target templates to obtain performance test results, or
Based on the target sub-load configured in the configuration file, combining the corresponding relation between the target sub-load and the target template, obtaining each target template corresponding to the target sub-load from each transaction template, and performing multiple rounds of performance test according to each instantiation transaction obtained through each target template instantiation and combining each virtual synthesis file to obtain a performance test result.
5. The method of claim 2, wherein performing multiple rounds of performance testing based on each virtual composite file constructed to obtain performance test results comprises:
performing a first round of performance test based on each constructed virtual synthesized file to obtain a first round of reference performance test result;
Starting from the second round of performance test, performing the current round of performance test based on each constructed virtual synthesized file to obtain the current performance test result of the current round, and performing data recovery on each virtual synthesized file after the current round of performance test when determining that data offset occurs based on the reference performance test result and the current performance test result.
6. The method of claim 4, wherein the obtaining each target template corresponding to the target sub-load from each transaction template based on the target sub-load configured in the configuration file in combination with the correspondence between the target sub-load and each target template further comprises:
Based on the execution starting time corresponding to each transaction template, sequencing each transaction template to obtain a transaction template sequence;
Dividing each transaction template in the transaction template sequence according to a set time interval to obtain a plurality of groups of transaction templates, wherein each group of transaction templates comprises at least one transaction template;
And clustering the multiple groups of transaction templates based on the category characteristics corresponding to the multiple groups of transaction templates, and obtaining at least one group of merged transaction templates.
7. The method of claim 6, wherein clustering the plurality of sets of transaction templates based on respective corresponding class features of the plurality of sets of transaction templates to obtain the merged at least one set of transaction templates comprises:
At least one merging process is carried out on the plurality of groups of transaction templates, and the following operations are executed in each merging process:
Acquiring current transaction templates of each group, determining two groups of transaction templates which meet preset merging conditions in the current transaction templates of each group based on respective corresponding category characteristics of the current transaction templates, and merging the two groups of transaction templates to obtain a new transaction template of each group;
and obtaining category characteristics corresponding to the new transaction templates based on at least one transaction template contained in each of the two transaction templates.
8. The method of claim 7, wherein determining that there are two sets of transaction templates that meet a preset merge condition in the current sets of transaction templates based on respective corresponding class features of the current sets of transaction templates comprises:
if the category characteristics are template similarity between two groups of transaction templates, determining that two groups of transaction templates which meet a preset merging condition exist in the current transaction templates when the template similarity between the two groups of transaction templates is smaller than a preset similarity threshold;
if the category characteristic is an execution statement number evaluation value between two groups of transaction templates, determining that two groups of transaction templates which meet preset merging conditions exist in each current group of transaction templates when the difference value of the execution statement number evaluation values between the two groups of transaction templates is smaller than a preset evaluation value threshold, wherein the execution statement number evaluation value is used for representing the number of Structured Query Language (SQL) statements executed in unit time.
9. The method according to any one of claims 1-8, wherein extracting, based on the set target field, respective file structure information of each storage file associated with the target field from each obtained transaction template, includes:
Extracting file names contained in each transaction template from the obtained transaction templates based on the set target fields;
And respectively initiating a file structure query request to the original database based on the obtained file names to obtain the file structure information of the corresponding storage file.
10. The method according to any one of claims 1 to 8, wherein each file structure information includes attributes included in a corresponding storage file, and wherein the obtaining a file generation order between the storage files based on the obtained file structure information includes:
Determining file dependency relations among all the storage files based on all the attributes contained in the obtained file structure information;
and obtaining the file generation sequence among the storage files based on the file dependency relationship.
11. The method according to any one of claims 1 to 8, wherein obtaining, based on the obtained file structure information, a numerical distribution of each attribute contained in each of the storage files includes:
extracting at least one joint column from each transaction template based on the set query field, wherein each joint column comprises at least one attribute in one storage file queried by the corresponding transaction template;
based on the extracted at least one joint column, combining the original database to obtain a numerical range corresponding to each attribute contained in the at least one joint column;
based on the at least one joint column, respectively carrying out sampling inquiry on the original database according to a set sampling proportion to obtain a corresponding sampling result, and based on the corresponding sampling result, obtaining corresponding sampling statistical information;
and obtaining the numerical distribution condition of each attribute contained in each storage file based on the sampling statistical information corresponding to each at least one joint column and each obtained numerical range.
12. The method as claimed in any one of claims 1 to 8, wherein said constructing, in combination with the file generation order, the virtual composite file corresponding to each of the storage files based on the file structure information and the numerical distribution situation corresponding to each of the storage files, includes:
and sequentially constructing virtual synthesized files corresponding to the storage files according to the file generation sequence, wherein in each construction process, the following operations are respectively executed:
Determining each attribute contained in one virtual synthesized file based on file structure information of a storage file corresponding to the virtual synthesized file;
and generating a value set of each attribute contained in the virtual synthesized file based on the attribute type of each attribute contained in the virtual synthesized file and combining a data generation strategy corresponding to the attribute type, wherein each data generation strategy is generated based on the numerical value distribution condition of the corresponding attribute.
13. The method as set forth in claim 12, wherein the generating the value set of each attribute included in the one virtual composite file based on each attribute type of each attribute included in the one virtual composite file in combination with the data generation policy corresponding to the attribute type includes:
for each attribute contained in the virtual composite file, the following operations are respectively executed:
if one attribute is a primary key, adopting each non-repeated value as a value set of the one attribute;
If one attribute is a foreign key, determining a predecessor file of the storage file corresponding to the one virtual synthetic file from the storage files based on the file generation sequence, and generating a value set of the one attribute based on the numerical distribution condition of the one attribute in the predecessor file;
If one attribute does not belong to the primary key or the foreign key, generating a value set of the one attribute based on the numerical value distribution condition of the one attribute in the storage file corresponding to the one virtual synthetic file.
14. The method of any one of claims 1-8, further comprising:
According to the configuration specification of the target instance, creating a test instance with the same specification as the target instance;
Performing performance test based on each constructed virtual synthesized file to obtain performance test results, wherein the performance test results comprise:
And adopting the test example, and performing performance test based on each constructed virtual synthesized file to obtain a performance test result.
15. A database performance testing apparatus, comprising:
the parameterization unit is used for parameterizing each historical transaction based on at least one constant contained in each historical transaction in a specified time period in the original database to obtain a corresponding transaction template;
the structure extraction unit is used for extracting respective file structure information of each storage file associated with the target field from each obtained transaction template based on the set target field;
A distribution extraction unit, configured to obtain a file generation sequence between the storage files based on the obtained file structure information, and obtain a numerical distribution condition of each attribute included in each storage file;
And the testing unit is used for constructing virtual synthesized files corresponding to the storage files respectively based on the file structure information and the numerical distribution situation corresponding to the storage files respectively and combining the file generation sequence, and performing performance test based on the constructed virtual synthesized files to obtain a performance test result.
16. An electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 14.
17. A computer readable storage medium, characterized in that it comprises a computer program for causing an electronic device to execute the steps of the method according to any one of claims 1-14 when said computer program is run on the electronic device.
18. A computer program product, characterized in that it comprises a computer program stored in a computer readable storage medium, from which computer readable storage medium a processor of an electronic device reads and executes the computer program, causing the electronic device to carry out the steps of the method according to any one of claims 1-14.
CN202310956807.1A 2023-07-28 2023-07-28 Database performance testing method and related device Pending CN119440972A (en)

Priority application: CN202310956807.1A, filed 2023-07-28.
Publication: CN119440972A, published 2025-02-14.