Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a convention should be interpreted in the sense in which one of skill in the art would generally understand it (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
An embodiment of the present disclosure provides a database performance prediction method. The method collects historical data of a plurality of performance indexes of a database; calculates, based on the historical data, a marginal distribution function of each performance index and a joint distribution function among the performance indexes; performs a plurality of predictions based on the marginal distribution functions and the joint distribution function, each prediction yielding a predicted value of the overall performance index of the database; determines from the plurality of predicted values whether the database is faulty; and issues a fault early warning when the database is faulty.
Fig. 1 schematically illustrates an application scenario diagram of a database performance prediction apparatus according to an embodiment of the present disclosure.
As shown in fig. 1, the network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as database applications like MySQL, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for database applications browsed by users using the terminal devices 101, 102, 103. The background management server can analyze and process received data, such as user requests, and feed the processing result back to the terminal devices.
It should be noted that the database performance prediction method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the database performance prediction apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The database performance prediction method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the database performance prediction apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the method and apparatus for predicting database performance of the present disclosure may be used in the financial field, and may also be used in any field other than the financial field, and the application field of the method and apparatus for predicting database performance of the present disclosure is not limited.
The database performance prediction method of the disclosed embodiment will be described in detail with reference to fig. 2 to 5 based on the scenario described in fig. 1.
Fig. 2 schematically illustrates a flowchart of a database performance prediction method according to an embodiment of the present disclosure.
As shown in FIG. 2, the database performance prediction method of this embodiment includes operations S210-S250.
In operation S210, historical data of a plurality of performance indicators of a database is collected.
In operation S220, a marginal distribution function of each performance index and a joint distribution function among the performance indexes are calculated based on the historical data of each performance index.
In operation S230, a plurality of predictions are performed based on the marginal distribution function of each performance index and the joint distribution function among the performance indexes, and each prediction obtains a predicted value of the overall performance index of the database.
In operation S240, it is determined whether the database is faulty according to the plurality of predicted values, and when the database is faulty, a fault warning is issued.
According to an embodiment of the present disclosure, operation S210 includes determining a plurality of performance indexes that affect the overall performance of the database, and sampling historical data of each performance index at predetermined time intervals. The performance indexes affecting the overall performance of the database may be common performance indexes of a database, such as CPU utilization, memory utilization, number of database connections, network I/O, and the like.
According to the embodiment of the present disclosure, the operation S210 further includes analyzing each of the collected performance indexes in real time, and when some of the performance index data is abnormal, sending out alarm information.
According to an embodiment of the present disclosure, operation S220 includes: training a POT (peaks over threshold) model for each performance index based on the historical data of that performance index, and using the trained POT model as its marginal distribution function; and training a Copula model among the performance indexes based on the marginal distribution functions of the performance indexes, and using the trained Copula model as the joint distribution function.
According to embodiments of the present disclosure, the occurrence of extreme events can severely affect database operation and cause losses. To guard against such situations, the generalized Pareto distribution of extreme value theory is employed to model the extreme cases. For low-frequency, high-loss events, the POT model of extreme value theory is generally selected for the description.
A specific process for establishing a POT model corresponding to an individual performance index according to an embodiment of the present disclosure is shown in fig. 3. The POT model establishing method in the embodiment of the disclosure comprises operation S310-operation S330.
In operation S310, a POT model of individual performance indicators is constructed.
In accordance with an embodiment of the present disclosure, let an individual performance index be X_i, where i is any positive integer, and let the historical data of the performance index be denoted X_i = (x_1, x_2, x_3, …, x_n), a sequence of independent and identically distributed random variables with distribution function F(x). A sufficiently large value is selected from (x_1, x_2, x_3, …, x_n) as the threshold u. Let Y_i = X_i - u; then Y_i is the amount by which an observation exceeds the threshold, i.e., the excess, and F_u(y) is defined as the conditional distribution function of the excess:
$$F_u(y) = P(X - u \le y \mid X > u), \quad y \ge 0$$; (1)
From the conditional probability formula, with x = u + y, we can get:
$$F_u(y) = \frac{F(u + y) - F(u)}{1 - F(u)} = \frac{F(x) - F(u)}{1 - F(u)}$$; (2)
The resulting distribution function F(x) is:
$$F(x) = F_u(y)\,\bigl(1 - F(u)\bigr) + F(u)$$; (3)
When u is sufficiently large, F_u(y) can be approximated by a generalized Pareto distribution, namely:
$$F_u(y) \approx G(y; \xi, \sigma) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\sigma}\right)^{-1/\xi}, & \xi \ne 0 \\ 1 - e^{-y/\sigma}, & \xi = 0 \end{cases}$$; (4)
where ξ is the shape parameter and σ is the scale parameter. When ξ ≥ 0, the generalized Pareto distribution exhibits a thick-tail characteristic, and when ξ < 0, it exhibits a short-tail characteristic.
From formula (4), it can be seen that, for a known random variable sequence X_i, calculating F_u(y) requires determining the threshold u, the shape parameter ξ, and the scale parameter σ.
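By way of illustration only, the excess, the generalized Pareto distribution referred to as formula (4), and the tail reconstruction of formula (3) may be sketched in Python as follows. The function names `excesses`, `gpd_cdf` and `tail_cdf`, and the argument `f_u` (standing for the value F(u), which could for instance be taken as the empirical fraction of samples not exceeding the threshold), are assumptions introduced for this sketch and do not form part of the disclosure.

```python
import math


def excesses(samples, u):
    """Return the excesses y = x - u of all samples x that exceed the threshold u."""
    return [x - u for x in samples if x > u]


def gpd_cdf(y, xi, sigma):
    """Generalized Pareto distribution function for an excess y >= 0."""
    if y < 0:
        return 0.0
    if abs(xi) < 1e-12:
        # Limit xi -> 0: the distribution reduces to an exponential one.
        return 1.0 - math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        # Only possible for xi < 0: y lies beyond the upper end point.
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def tail_cdf(x, u, xi, sigma, f_u):
    """F(x) for x > u via formula (3): F(x) = F_u(y) * (1 - F(u)) + F(u)."""
    return gpd_cdf(x - u, xi, sigma) * (1.0 - f_u) + f_u
```

For example, `tail_cdf(x, u, xi, sigma, f_u)` evaluated at x = u returns `f_u`, in agreement with formula (3) when the excess y is zero.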
In operation S320, a threshold value of the POT model is determined.
According to the embodiment of the present disclosure, the selection of the threshold u is a crucial step: if the threshold u is selected too large, too few excesses Y_i remain for estimation, whereas if the threshold u is selected too small, the convergence of F_u(y) to the generalized Pareto distribution cannot be ensured. The threshold u may be estimated by the Hill plot method or the mean excess plot method.
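As a hedged illustration of the mean excess plot method, the following Python sketch computes the sample mean excess for a list of candidate thresholds. For a generalized Pareto tail with shape parameter below 1, the mean excess is a linear function of the threshold, so one common practice is to choose the lowest candidate above which the resulting curve looks approximately linear. The function names and the list of candidate thresholds are assumptions of this sketch; the judgment of linearity is left to the practitioner.

```python
def mean_excess(samples, u):
    """Average of (x - u) over all samples x that exceed the threshold u."""
    exc = [x - u for x in samples if x > u]
    if not exc:
        raise ValueError("no sample exceeds the threshold")
    return sum(exc) / len(exc)


def mean_excess_curve(samples, candidate_thresholds):
    """Return (threshold, mean excess) pairs, i.e. the points of a mean excess plot."""
    return [(u, mean_excess(samples, u)) for u in candidate_thresholds]
```

The returned pairs can be plotted or inspected directly; a threshold so high that no sample exceeds it raises an error rather than silently returning an unreliable value.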
In operation S330, shape parameters and scale parameters of the POT model are determined.
In accordance with embodiments of the present disclosure, once the threshold u is determined, the shape parameter ξ and the scale parameter σ of the POT model may be estimated in a number of ways, such as the maximum likelihood estimation method, the moment estimation method, the probability-weighted moment estimation method, and so on. The maximum likelihood estimation method is described here as an example.
Differentiating formula (4) with respect to y gives the probability density function:
$$g(y; \xi, \sigma) = \begin{cases} \dfrac{1}{\sigma}\left(1 + \dfrac{\xi y}{\sigma}\right)^{-1/\xi - 1}, & \xi \ne 0 \\ \dfrac{1}{\sigma}\, e^{-y/\sigma}, & \xi = 0 \end{cases}$$; (5)
Its corresponding log-likelihood function, for the N_u excesses y_1, …, y_{N_u}, is:
$$L(\xi, \sigma \mid y) = \begin{cases} -N_u \ln \sigma - \left(1 + \dfrac{1}{\xi}\right) \displaystyle\sum_{j=1}^{N_u} \ln\left(1 + \dfrac{\xi y_j}{\sigma}\right), & \xi \ne 0 \\ -N_u \ln \sigma - \dfrac{1}{\sigma} \displaystyle\sum_{j=1}^{N_u} y_j, & \xi = 0 \end{cases}$$; (6)
The log-likelihood function is maximized to obtain estimated values of the shape parameter ξ and the scale parameter σ.
The determined threshold u and the estimated shape parameter ξ and scale parameter σ are then substituted into formula (4) to obtain F_u(y) of the performance index X_i, which serves as the marginal distribution function.
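The maximum likelihood step may be sketched, under stated assumptions, as follows. The sketch evaluates the log-likelihood of the generalized Pareto distribution for a list of excesses and maximizes it by a plain grid search; the grid search merely stands in for whatever numerical optimizer an implementation would actually use, and the function names and grids are hypothetical.

```python
import math


def gpd_log_likelihood(excess_values, xi, sigma):
    """Log-likelihood of the generalized Pareto distribution; -inf outside its support."""
    if sigma <= 0.0:
        return -math.inf
    n = len(excess_values)
    if abs(xi) < 1e-9:
        # Exponential limit of the distribution (xi = 0).
        return -n * math.log(sigma) - sum(excess_values) / sigma
    total = 0.0
    for y in excess_values:
        base = 1.0 + xi * y / sigma
        if base <= 0.0:
            return -math.inf
        total += math.log(base)
    return -n * math.log(sigma) - (1.0 + 1.0 / xi) * total


def fit_gpd_grid(excess_values, xi_grid, sigma_grid):
    """Return the (xi, sigma) grid point with the largest log-likelihood."""
    best_ll, best_xi, best_sigma = -math.inf, None, None
    for xi in xi_grid:
        for sigma in sigma_grid:
            ll = gpd_log_likelihood(excess_values, xi, sigma)
            if ll > best_ll:
                best_ll, best_xi, best_sigma = ll, xi, sigma
    return best_xi, best_sigma
```

The resolution of the estimate is limited by the spacing of the grids, so a finer grid or a gradient-free optimizer started from the best grid point would normally follow.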
There is a correlation among the performance indexes of the database, and this correlation is not a simple additive relationship. A Copula model is therefore used to describe the dependence of the performance indexes on one another.
A specific building process of a Copula model between performance indicators according to an embodiment of the present disclosure is shown in fig. 4. The method for establishing the Copula model in the embodiment of the disclosure comprises operation S410-operation S420.
In operation S410, a Copula model between the respective performance indexes is constructed.
The present disclosure constructs the Copula model on the basis of Sklar's theorem in order to establish the joint distribution function. The relevant theorem and formulas are described as follows:
The basic idea of the Copula model is to express the joint distribution function F(X_1, X_2, …, X_m) of the variables X_1, X_2, …, X_m in terms of their marginal distribution functions F_1(X_1), F_2(X_2), …, F_m(X_m), namely:
$$F(X_1, X_2, \dots, X_m) = C\bigl[F_1(X_1), F_2(X_2), \dots, F_m(X_m)\bigr]$$; (7)
The function C is referred to herein as the Copula model of the marginal distribution functions F_1(X_1), F_2(X_2), …, F_m(X_m).
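Formula (7) may be illustrated for m = 2 with a concrete Copula function. The disclosure does not name a Copula family; the Clayton family used below, its parameter `theta`, and the function names are assumptions chosen only because the Clayton Copula has a simple closed form.

```python
def clayton_copula(u, v, theta):
    """Clayton Copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), for theta > 0."""
    if theta <= 0.0:
        raise ValueError("theta must be positive")
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def joint_cdf(x1, x2, marginal_1, marginal_2, theta):
    """Formula (7) for two indexes: F(x1, x2) = C[F1(x1), F2(x2)]."""
    return clayton_copula(marginal_1(x1), marginal_2(x2), theta)
```

Here `marginal_1` and `marginal_2` are any callables returning values in [0, 1], for example the marginal distribution functions obtained from the POT model. When one argument of the Copula equals 1, the result reduces to the other argument, as required of any Copula function.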
In operation S420, parameters of the Copula model are determined.
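According to the embodiment of the disclosure, after the Copula model is constructed, the parameters of the Copula model need to be estimated. There are various ways to estimate the parameters of the Copula model, such as the one-step maximum likelihood estimation method, the pseudo maximum likelihood estimation method, the two-stage maximum likelihood estimation method, the nonparametric estimation method, etc. The two-stage maximum likelihood estimation method is taken here as an example for illustration.
The two-stage maximum likelihood estimation method estimates the unknown parameters θ_i of the marginal distribution functions and the unknown parameter θ_c of the Copula model in two separate steps, namely:
The first step: the parameters of each marginal distribution function are estimated separately,
$$\hat{\theta}_i = \arg\max_{\theta_i} \sum_{t=1}^{n} \ln f_i(x_{it}; \theta_i), \quad i = 1, \dots, m$$; (8)
The second step: with the marginal parameters fixed at their estimates, the parameter of the Copula model is estimated,
$$\hat{\theta}_c = \arg\max_{\theta_c} \sum_{t=1}^{n} \ln c\bigl(F_1(x_{1t}; \hat{\theta}_1), \dots, F_m(x_{mt}; \hat{\theta}_m); \theta_c\bigr)$$; (9)
where f_i is the density function of the i-th marginal distribution, c is the density function of the Copula function C, and x_{it} is the t-th historical observation of the i-th performance index.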
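The two estimation steps may be sketched as follows for two performance indexes. Purely for brevity, the sketch assumes exponential marginal distributions (whose maximum likelihood estimate has a closed form) in place of the POT model, assumes a Clayton Copula, and maximizes the Copula log-likelihood over a small hypothetical grid of candidate parameters; none of these choices is taken from the disclosure, and all observations are assumed to be strictly positive.

```python
import math


def fit_exponential_rate(xs):
    """Step 1: closed-form maximum likelihood estimate of an exponential rate."""
    return len(xs) / sum(xs)


def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x)


def clayton_log_density(u, v, theta):
    """Logarithm of the Clayton Copula density for 0 < u, v < 1 and theta > 0."""
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta)
            - (theta + 1.0) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def two_stage_fit(xs1, xs2, theta_grid):
    """Estimate the marginal parameters first, then the Copula parameter."""
    # Step 1: each marginal distribution is fitted on its own history.
    rate1, rate2 = fit_exponential_rate(xs1), fit_exponential_rate(xs2)
    us = [exponential_cdf(x, rate1) for x in xs1]
    vs = [exponential_cdf(x, rate2) for x in xs2]

    # Step 2: the marginal parameters stay fixed; only theta is varied.
    def copula_log_likelihood(theta):
        return sum(clayton_log_density(u, v, theta) for u, v in zip(us, vs))

    theta_c = max(theta_grid, key=copula_log_likelihood)
    return (rate1, rate2), theta_c
```

Splitting the estimation in this way keeps each optimization low-dimensional, at the cost of some statistical efficiency compared with estimating all parameters jointly.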
According to the embodiment of the disclosure, the constructed Copula model can be verified based on the Kolmogorov-Smirnov (K-S) test and the Q-Q plot method. When the verification result is good, the constructed Copula model is taken as the joint distribution function among the performance indexes of the present disclosure.
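As a minimal sketch of the K-S check, the following Python function computes the one-sample K-S statistic, i.e. the largest distance between the empirical distribution function of the samples and a fitted distribution function. How the disclosure applies the test to the Copula model is not detailed in the text, so the one-sample form and the function name are assumptions of this sketch.

```python
def ks_statistic(samples, cdf):
    """Largest absolute gap between the empirical CDF of samples and the given cdf."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one sample is required")
    d = 0.0
    for k, x in enumerate(ordered, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (k - 1) / n to k / n at the k-th ordered sample.
        d = max(d, k / n - f, f - (k - 1) / n)
    return d
```

A small statistic indicates a good fit. For large n, a commonly used approximate critical value at the 5% significance level is 1.36 divided by the square root of n, although this value is only approximate when the parameters of the fitted distribution were estimated from the same samples.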
According to an embodiment of the present disclosure, as shown in fig. 5, operation S230 includes operations S231 to S235.
In operation S231, a random number sequence obeying the joint distribution function is generated, where the number of random numbers in the random number sequence is the same as the number of performance indexes. For example, if the performance indexes are X_1, X_2, X_3, …, X_m, a pseudo-random number sequence a_1, a_2, …, a_m is generated, where a_1, a_2, …, a_m jointly obey the joint distribution function established by the present disclosure and each of them individually obeys the uniform distribution on (0, 1).
In operation S232, a predicted value of each performance index is obtained from each random number in the random number sequence and the marginal distribution function of the corresponding performance index. Assuming that the marginal distribution functions of the performance indexes are F_1(X_1), F_2(X_2), …, F_m(X_m), setting a_1 = F_1(X_1), a_2 = F_2(X_2), …, a_m = F_m(X_m) and solving gives the predicted value of each performance index, $\hat{x}_i = F_i^{-1}(a_i)$, where i is an integer from 1 to m.
In operation S233, weights of the performance indexes are set, wherein the sum of the weights of the performance indexes is 1. Assuming that the weight of each performance index is w_i, then:
$$\sum_{i=1}^{m} w_i = 1$$; (10)
In operation S234, the products of the predicted value of each performance index and the corresponding weight are accumulated to obtain the predicted value of the overall performance index. The formula of the predicted value z of the overall performance index is:
$$z = \sum_{i=1}^{m} w_i\, \hat{x}_i$$; (11)
In operation S235, the above operations are repeated a plurality of times to obtain predicted values of a plurality of overall performance indexes. Specifically, operations S231 to S234 are repeated a plurality of times, with a new random number sequence generated at each repetition, thereby obtaining a plurality of possible predicted values. The number of repetitions may be 1000 or more.
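Operations S231 to S235 may be sketched as a small Monte Carlo loop. The sketch handles two performance indexes only, couples the two uniform random numbers through a Clayton Copula sampled by the conditional inversion method, and uses exponential quantile functions as stand-ins for the inverse marginal distribution functions; the Copula family, the quantile functions, the parameter `theta` (assumed moderate in size) and all function names are assumptions and are not taken from the disclosure.

```python
import math
import random


def open_uniform(rng):
    """Draw from the open interval (0, 1) so that later powers and logs stay finite."""
    x = rng.random()
    while x == 0.0:
        x = rng.random()
    return x


def sample_clayton_pair(theta, rng):
    """S231: draw (a1, a2), each uniform on (0, 1), coupled by a Clayton Copula."""
    a1 = open_uniform(rng)
    t = open_uniform(rng)
    a2 = (a1 ** -theta * (t ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return a1, min(a2, 1.0 - 1e-12)


def exponential_quantile(rate):
    """Hypothetical inverse marginal distribution function: -ln(1 - a) / rate."""
    return lambda a: -math.log1p(-a) / rate


def simulate_overall(inverse_marginals, weights, theta, runs=1000, seed=0):
    """Return `runs` predicted values of the overall performance index."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    rng = random.Random(seed)
    predictions = []
    for _ in range(runs):                                               # S235
        a = sample_clayton_pair(theta, rng)                             # S231
        x_hat = [inv(ai) for inv, ai in zip(inverse_marginals, a)]      # S232
        predictions.append(sum(w * x for w, x in zip(weights, x_hat)))  # S233, S234
    return predictions
```

A call such as `simulate_overall([exponential_quantile(1.0), exponential_quantile(2.0)], [0.6, 0.4], theta=2.0)` returns a list of 1000 predicted values; fixing `seed` makes the sequence reproducible.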
According to an embodiment of the present disclosure, operation S240 includes setting a confidence level, calculating a fault value at the confidence level based on the predicted values of the plurality of overall performance indexes, and issuing a fault early warning when the fault value exceeds a predetermined threshold. The confidence level refers to the probability that the predicted values fall within a certain region. Assuming a confidence level of 95%, the fault value VAR of the overall performance index is derived from the formula P(z < VAR) = 5%, where z denotes the predicted value of the overall performance index.
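The fault value may be read off the simulated predicted values as an empirical quantile, as in the following sketch. The use of a simple order statistic (rather than an interpolated quantile) and the function names are assumptions of the sketch.

```python
import math


def fault_value(predictions, confidence=0.95):
    """Empirical value v such that roughly a share (1 - confidence) of predictions lies below v."""
    if not predictions:
        raise ValueError("at least one predicted value is required")
    ordered = sorted(predictions)
    # Number of predictions that should lie strictly below the fault value.
    k = math.floor((1.0 - confidence) * len(ordered) + 1e-9)
    k = min(max(k, 0), len(ordered) - 1)
    return ordered[k]


def should_warn(predictions, threshold, confidence=0.95):
    """Issue a warning when the fault value exceeds the predetermined threshold."""
    return fault_value(predictions, confidence) > threshold
```

With 1000 predicted values and a confidence level of 95%, the function returns the 51st smallest value, so that 50 of the 1000 values, i.e. 5%, lie below it when there are no ties.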
The method for predicting the performance of the database further comprises the step of predicting the running trend curve of the overall performance index based on the predicted values of the overall performance indexes.
According to the embodiment of the disclosure, the change trend of the overall performance of the database is predicted with the following Prophet time series model, and the function formula of the trend curve is as follows:
y(t)=g(t)+f(t)+h(t)+εt; (12)
Where t is time; g(t) represents the trend function, which models the aperiodic variation of the time series; f(t) represents the periodic variation (e.g., daily or weekly); h(t) represents the effect of irregularly occurring events; and ε_t is the error term. The three terms g(t), f(t), and h(t) are fitted separately and then summed to obtain the running trend of the overall performance of the database.
According to the method, the marginal distribution function of each commonly used performance index of the database is obtained through the POT model, the correlation among the performance indexes is obtained through the Copula function, and the overall performance of the database is finally predicted. The trend over a future period, such as the next three days or the next week, can thus be predicted, which helps operation and maintenance personnel respond and deploy early and adjust resource allocation, thereby effectively saving operation and maintenance cost and shortening emergency response time.
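The additive structure of formula (12) may be illustrated with the following simplified Python sketch, which evaluates the three fitted terms and sums them. It does not use or reproduce any time series library: the linear trend, the single sine term, the constant event effect, and all parameter names are simplifying assumptions made only to show how the components combine.

```python
import math


def trend(t, slope, offset):
    """g(t): aperiodic variation, here a straight line."""
    return slope * t + offset


def seasonality(t, period, amplitude):
    """f(t): periodic variation, here a single sine wave with the given period."""
    return amplitude * math.sin(2.0 * math.pi * t / period)


def event_effect(t, event_times, magnitude):
    """h(t): constant shift applied only at the listed irregular event times."""
    return magnitude if t in event_times else 0.0


def additive_forecast(t, slope, offset, period, amplitude, event_times, magnitude):
    """y(t) without the error term: g(t) + f(t) + h(t)."""
    return (trend(t, slope, offset)
            + seasonality(t, period, amplitude)
            + event_effect(t, event_times, magnitude))
```

For hourly data, `period=24` would model a daily cycle; evaluating `additive_forecast` over future values of t gives the points of the predicted trend curve.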
Based on the database performance prediction method, the disclosure also provides a database performance prediction device. The device will be described in detail below in connection with fig. 6.
As shown in fig. 6, the database performance prediction apparatus 600 of this embodiment includes a data collection module 610, a distribution function calculation module 620, an overall performance prediction module 630, and a failure early warning module 640.
The data collection module 610 is configured to collect historical data of a plurality of performance indexes of a database. In an embodiment, the data collection module 610 may be configured to perform operation S210 described above, the details of which are not repeated here.
The distribution function calculation module 620 is configured to calculate a marginal distribution function of each performance index and a joint distribution function among the performance indexes based on the historical data of each performance index. In an embodiment, the distribution function calculation module 620 may be configured to perform operation S220 described above, the details of which are not repeated here.
The overall performance prediction module 630 is configured to perform a plurality of predictions based on the marginal distribution function of each performance index and the joint distribution function among the performance indexes, each prediction obtaining a predicted value of the overall performance index of the database. In an embodiment, the overall performance prediction module 630 may be configured to perform operation S230 described above, the details of which are not repeated here.
The fault early warning module 640 is configured to issue a fault early warning when the predicted values meet a preset condition. In an embodiment, the fault early warning module 640 may be configured to perform operation S240 described above, the details of which are not repeated here.
Any of the data collection module 610, the distribution function calculation module 620, the overall performance prediction module 630, the fault pre-warning module 640 may be combined in one module to be implemented, or any of the modules may be split into multiple modules, according to embodiments of the present disclosure. Or at least some of the functionality of one or more of the modules may be combined with, and implemented in, at least some of the functionality of other modules. According to embodiments of the present disclosure, at least one of the data collection module 610, the distribution function calculation module 620, the overall performance prediction module 630, the fault pre-warning module 640 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable way of integrating or packaging the circuitry, or in any one of or a suitable combination of three of software, hardware, and firmware. Or at least one of the data collection module 610, the distribution function calculation module 620, the overall performance prediction module 630, the fault pre-warning module 640 may be at least partially implemented as a computer program module that, when executed, performs the corresponding functions.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a database performance prediction method according to an embodiment of the disclosure.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. The processor 701 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 702 and/or the RAM 703. Note that the program may be stored in one or more memories other than the ROM 702 and the RAM 703. The processor 701 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in one or more memories.
According to an embodiment of the present disclosure, the electronic device 700 may further include an input/output (I/O) interface 705, which is also connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, etc.; an output portion 707 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 708 including a hard disk, etc.; and a communication portion 709 including a network interface card such as a LAN card, a modem, etc. The communication portion 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 710 as necessary, so that a computer program read therefrom is installed into the storage portion 708 as necessary.
The present disclosure also provides a computer-readable storage medium that may be included in the apparatus/device/system described in the above embodiments, or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 702 and/or RAM 703 and/or one or more memories other than ROM 702 and RAM 703 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to implement the database performance prediction method provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 701. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed over a network medium in the form of signals, downloaded and installed via the communication section 709, and/or installed from the removable medium 711. The computer program may comprise program code that is transmitted using any appropriate network medium, including but not limited to wireless, wireline, etc., or any suitable combination of the preceding.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 701. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or incorporated in a variety of ways, even if such combinations or incorporations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or incorporated without departing from the spirit and teachings of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.