
CN109344201A - A system and method for evaluating database performance load based on machine learning - Google Patents

Publication number
CN109344201A
Authority
CN
China
Prior art keywords
data
model
training
feature
missing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811207264.9A
Other languages
Chinese (zh)
Inventor
张明明
钱琳
俞俊
朱广新
邵星星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information And Communication Branch Of Jiangsu Electric Power Co Ltd
NARI Group Corp
NARI Technology Co Ltd
Original Assignee
Information And Communication Branch Of Jiangsu Electric Power Co Ltd
NARI Group Corp
NARI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information And Communication Branch Of Jiangsu Electric Power Co Ltd, NARI Group Corp, NARI Technology Co Ltd
Priority to CN201811207264.9A
Publication of CN109344201A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a database performance load evaluation system and evaluation method based on machine learning. A machine learning algorithm is trained on data to generate a performance and load learning model: after data processing, the massive data is used as a training set and trained with machine learning techniques, finally generating performance and load learning models. These models are used to predict newly generated feature data and to evaluate the performance and load of the database. On the one hand, the conclusions of the evaluation system are more reasonable than an expert model's analysis of the features, are less likely to miss important performance and load indicators, and locate database performance and load problems more accurately. On the other hand, the system lowers the knowledge and skill requirements for database operation and maintenance personnel, which can greatly reduce labor costs and improve work efficiency.

Description

A database performance load evaluation system and method based on machine learning
Technical field
The present invention relates to a database performance load evaluation system and method based on machine learning. It belongs to the field of artificial intelligence and machine learning and concerns database operation and maintenance (O&M).
Background
Database applications require low response times and high performance, so performance and load tests are carried out before a database application is deployed. As the application keeps running, however, the number of users and the volume of data grow, and the performance pressure and load on the database increase. Without timely intervention and handling, failures of varying severity, such as system breakdowns, may occur. Real-time monitoring of database performance and load has therefore become very important.
Many performance and load monitoring tools for databases exist today, but they generally only display a few key operating indicators and sometimes fail to reflect the real operating condition of the database. When a database shows high response times and low performance, analysis by an expert DBA is usually still needed; the monitoring tools contribute little to locating and analysing the problem.
Performance expert models and load expert models, which are commonly used in the art, are a considerable improvement over ordinary performance and load monitoring tools when it comes to analysing database problems and locating faults, but they still have the following disadvantages:
1) An expert model is the accumulation of a senior DBA's knowledge and experience over many years. It places high demands on the work experience and proficiency of database O&M personnel, which in effect raises labor costs.
2) Besides rule definitions, an expert model also involves complex scripts and code. It requires stable development and O&M teams and is therefore sensitive to staff turnover.
3) An expert model is built from a senior DBA's accumulated experience. Whether the indicators and rules it defines really fit the performance and load of the database is uncertain, because each DBA understands the indicators and rules somewhat differently. If a low-probability anomaly occurs that the indicators of the expert model do not cover, a senior DBA has to find the factors affecting performance and load in a large amount of historical data, which is often very time-consuming.
In summary, a new technical solution is needed to solve the above problems.
Summary of the invention
Object of the invention: the present invention discloses a machine-learning-based database performance load evaluation system and evaluation method. A machine learning algorithm is trained on data to generate a performance and load learning model, and this model is used to predict and evaluate new data. On the one hand, the conclusions of the evaluation system are more reasonable than an expert model's analysis of the features, are less likely to miss important performance and load indicators, and locate database performance and load problems more accurately. On the other hand, the system lowers the knowledge and skill requirements for database O&M personnel, which can greatly reduce labor costs and improve work efficiency.
Technical solution: to achieve the above object, the machine-learning-based database performance load evaluation system of the present invention may adopt the following technical solution:
A database performance load evaluation system based on machine learning, comprising:
A data acquisition module, for capturing feature data from database AWR reports and database logs;
A data preprocessing module, for deleting single-value features, missing features and highly correlated features from the feature data, and for filling missing data and normalizing the data;
A model training module, for training on the data to generate a model;
A model evaluation module, for evaluating the model on the validation set according to the evaluation metrics of the different machine learning models;
A model tuning module, for automatically tuning the model by adjusting its hyperparameters;
A model prediction module, for making predictions with the model, divided into offline prediction and online prediction; offline prediction uses the test set separated from the training set; online prediction uses the real-time data collected by the data collector, which must be normalized by the scaler before prediction.
Further, in the data preprocessing module, single-value feature deletion means: when all values in a feature column are identical, the column is deleted directly;
Missing feature deletion means: when the proportion of missing values in a feature column reaches a specific threshold, the column is deleted;
Highly correlated feature deletion means: the correlation between feature variables is computed with the Pearson correlation coefficient and a correlation matrix is built; when the correlation between two features is found to exceed a threshold, one of them is deleted;
Missing value filling includes: filling with the data of the row preceding the missing value; filling with the mean of the feature column containing the missing value; and, with performance and load divided into grades A, B, C, D and E, filling missing values with the mean of each grade.
Data normalization means: the original feature data is normalized and scaled using min-max (deviation) normalization and standard deviation normalization; the scaler produced during normalization is saved locally, and when newly collected data is to be predicted, the same scaler must be used to normalize it to the same degree.
Further, in the model training module, the data set is divided into a training set, a validation set and a test set, which account for 70%, 15% and 15% of the data set respectively;
The system trains the model with an ensemble algorithm: the gradient boosting tree algorithm (GBM) obtains a boosted tree model based on gradient descent. The regression model uses the trainer LGBMRegressor and the classification model uses the trainer LGBMClassifier.
Further, early stop and CheckPoint techniques are used while evaluating the model;
Early stop: training is stopped ahead of time. When the validation error on the validation set is found to no longer change during training, training stops automatically after k rounds, where k is a preset value;
CheckPoint: this technique automatically saves the best model obtained during training.
Further, the hyperparameters in the model tuning module include:
Number of iteration rounds: epochs
Learning rate: learning_rate
Tree depth: max_depth
Maximum number of leaves per learner: max_leaves
Number of weak learners: n_estimators
Loss function: objective.
Beneficial effects: the machine-learning-based database performance load evaluation system of the present invention extracts multiple performance and load feature indicators from the AWR reports and database logs of multiple database instances. After data processing (data cleaning and conversion), this massive data is used as a training set and trained with machine learning techniques to finally generate performance and load learning models, which are used to predict newly generated feature data and evaluate the performance and load of the database. The root-mean-square error of the performance regression model and the load regression model of the evaluation system is around 4 to 6, and the accuracy of the performance classification model and the load classification model reaches around 99.3, which meets the requirements for production use. The system is an innovative application of artificial intelligence to databases; it reduces an enterprise's labor costs and achieves good economic benefits.
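The two figures quoted above, root-mean-square error for the regression models and accuracy for the classification models, can be computed as in the following minimal sketch. The function names and the sample values are illustrative assumptions and do not come from the patent.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(labels_true, labels_pred):
    """Fraction of samples whose predicted grade equals the true grade."""
    hits = sum(t == p for t, p in zip(labels_true, labels_pred))
    return hits / len(labels_true)


# Illustrative values only (not measurements from the patent).
print(rmse([1, 2, 3, 4], [1, 2, 3, 8]))
print(accuracy(list("AABC"), list("AABD")))
```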
Corresponding to the above machine-learning-based database performance load evaluation system, the present invention also provides a technical solution for a machine-learning-based database performance load evaluation method:
A database performance load evaluation method based on machine learning, comprising the following steps:
(1) Data acquisition: feature data is captured from database AWR reports and database logs;
(2) Data preprocessing: single-value features, missing features and highly correlated features are deleted from the feature data, missing data is filled, and the data is normalized;
(3) Training on the data to generate a model;
(4) Model evaluation: the model is evaluated on the validation set according to the evaluation metrics of the different machine learning models;
(5) Model tuning: the model is tuned automatically by adjusting its hyperparameters;
(6) Model prediction: divided into offline prediction and online prediction; offline prediction uses the test set separated from the training set; online prediction uses the real-time data collected by the data collector, which must be normalized by the scaler before prediction.
In step (2), single-value feature deletion means: when all values in a feature column are identical, the column is deleted directly;
Missing feature deletion means: when the proportion of missing values in a feature column reaches a specific threshold, the column is deleted;
Highly correlated feature deletion means: the correlation between feature variables is computed with the Pearson correlation coefficient and a correlation matrix is built; when the correlation between two features is found to exceed a threshold, one of them is deleted;
Missing value filling includes: filling with the data of the row preceding the missing value; filling with the mean of the feature column containing the missing value; and, with performance and load divided into grades A, B, C, D and E, filling missing values with the mean of each grade.
Data normalization means: the original feature data is normalized and scaled using min-max (deviation) normalization and standard deviation normalization; the scaler produced during normalization is saved locally, and when newly collected data is to be predicted, the same scaler must be used to normalize it to the same degree.
In step (3), the data set is divided into a training set, a validation set and a test set, which account for 70%, 15% and 15% of the data set respectively;
The system trains the model with an ensemble algorithm: the gradient boosting tree algorithm (GBM) obtains a boosted tree model based on gradient descent. The regression model uses the trainer LGBMRegressor and the classification model uses the trainer LGBMClassifier.
In step (4), early stop and CheckPoint techniques are used;
Early stop: training is stopped ahead of time. When the validation error on the validation set is found to no longer change during training, training stops automatically after k rounds, where k is a preset value;
CheckPoint: this technique automatically saves the best model obtained during training.
The hyperparameters in step (5) include:
Number of iteration rounds: epochs
Learning rate: learning_rate
Tree depth: max_depth
Maximum number of leaves per learner: max_leaves
Number of weak learners: n_estimators
Loss function: objective.
Beneficial effects: corresponding to the beneficial effects of the above machine-learning-based database performance load evaluation system, the conclusions reached by this load evaluation method are more reasonable than an expert model's analysis of the features, are less likely to miss important performance and load indicators, and locate database performance and load problems more accurately. The method also lowers the knowledge and skill requirements for database O&M personnel, which can greatly reduce labor costs and improve work efficiency.
Brief description of the drawings
Fig. 1 is a flow chart of the machine-learning-based database performance load evaluation method used in this evaluation system.
Detailed description of the embodiments
With reference to Fig. 1,
The present invention provides a database performance load evaluation system based on machine learning, comprising:
A data acquisition module, for capturing feature data from database AWR reports and database logs;
A data preprocessing module, for deleting single-value features, missing features and highly correlated features from the feature data, and for filling missing data and normalizing the data;
A model training module, for training on the data to generate a model;
A model evaluation module, for evaluating the model on the validation set according to the evaluation metrics of the different machine learning models;
A model tuning module, for automatically tuning the model by adjusting its hyperparameters;
A model prediction module, for making predictions with the model, divided into offline prediction and online prediction; offline prediction uses the test set separated from the training set; online prediction uses the real-time data collected by the data collector, which must be normalized by the scaler before prediction.
In the data preprocessing module, single-value feature deletion means: when all values in a feature column are identical, the column is deleted directly;
Missing feature deletion means: when the proportion of missing values in a feature column reaches a specific threshold, the column is deleted;
Highly correlated feature deletion means: the correlation between feature variables is computed with the Pearson correlation coefficient and a correlation matrix is built; when the correlation between two features is found to exceed a threshold, one of them is deleted;
Missing value filling includes: filling with the data of the row preceding the missing value; filling with the mean of the feature column containing the missing value; and, with performance and load divided into grades A, B, C, D and E, filling missing values with the mean of each grade.
Data normalization means: the original feature data is normalized and scaled using min-max (deviation) normalization and standard deviation normalization; the scaler produced during normalization is saved locally, and when newly collected data is to be predicted, the same scaler must be used to normalize it to the same degree.
In the model training module, the data set is divided into a training set, a validation set and a test set, which account for 70%, 15% and 15% of the data set respectively;
The system trains the model with an ensemble algorithm: the gradient boosting tree algorithm (GBM) obtains a boosted tree model based on gradient descent. The regression model uses the trainer LGBMRegressor and the classification model uses the trainer LGBMClassifier.
Early stop and CheckPoint techniques are used while evaluating the model;
Early stop: training is stopped ahead of time. When the validation error on the validation set is found to no longer change during training, training stops automatically after k rounds, where k is a preset value;
CheckPoint: this technique automatically saves the best model obtained during training.
The hyperparameters in the model tuning module include:
Number of iteration rounds: epochs
Learning rate: learning_rate
Tree depth: max_depth
Maximum number of leaves per learner: max_leaves
Number of weak learners: n_estimators
Loss function: objective
With reference to Fig. 1, the machine-learning-based database performance load evaluation method used in this evaluation system includes:
1. Obtaining the data set.
A PostgreSQL database is used to access the performance and load indicators in the AWR reports and logs of the Oracle database, and data is captured on average every 3 minutes. The feature data in the awr data table has 62 dimensions and the feature data in the log table has 200 dimensions. A Python script stores the data from the PostgreSQL tables in a local CSV file in order of capture time.
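The last part of this step, writing the captured samples to a local CSV file in capture-time order, can be sketched as follows. This is only an illustration under stated assumptions: the patent does not give the table layout, so the column names `captured_at`, `cpu` and `io` are hypothetical, and the rows are passed in as plain dictionaries instead of being read from PostgreSQL.

```python
import csv
import io
from datetime import datetime


def write_samples_in_capture_order(samples, out):
    """Write sample rows to CSV, oldest capture first.

    `samples` is a list of dicts; the key "captured_at" (an ISO timestamp)
    is a hypothetical column name, not one taken from the patent.
    """
    ordered = sorted(samples, key=lambda row: datetime.fromisoformat(row["captured_at"]))
    # Build the header as the union of all keys so rows lacking a metric still fit.
    fieldnames = []
    for row in ordered:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(ordered)
    return fieldnames


# Three samples, 3 minutes apart, deliberately out of order.
samples = [
    {"captured_at": "2018-10-17T10:06:00", "cpu": "0.7"},
    {"captured_at": "2018-10-17T10:00:00", "cpu": "0.5", "io": "12"},
    {"captured_at": "2018-10-17T10:03:00", "cpu": "0.6"},
]
buffer = io.StringIO()  # stands in for the local CSV file
write_samples_in_capture_order(samples, buffer)
print(buffer.getvalue())
```

Metrics missing from a sample are written as empty cells, which the preprocessing step can then treat as missing values.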
2. Data preprocessing.
Single-value feature deletion: when all values in a feature column are identical, the column does nothing to help fit the model and only adds computation, so it can be deleted directly.
Missing feature deletion: when the proportion of missing values in a feature column reaches a specific threshold, the column is deleted. In this system the missing threshold is set to 60%.
Highly correlated feature deletion: the correlation between feature variables is computed with the Pearson correlation coefficient and a correlation matrix is built; when the correlation between two features is found to exceed a threshold, one of them is deleted. In this system the correlation threshold is set to 0.9.
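The three deletion rules above can be sketched in a few lines. The sketch below is an illustration, not the patent's implementation: it uses only the standard library, represents missing values as `None`, compares the absolute value of the Pearson coefficient with the threshold (the source does not say whether negative correlations count), and keeps the first feature of each correlated pair. The feature names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def prune_features(columns, missing_threshold=0.6, corr_threshold=0.9):
    """Return the names of the feature columns that survive all three rules.

    `columns` maps a feature name to a non-empty list of values (None = missing).
    """
    kept = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_ratio = 1 - len(present) / len(values)
        if missing_ratio >= missing_threshold:
            continue  # missing feature deletion
        if len(set(present)) <= 1:
            continue  # single-value feature deletion
        kept[name] = values

    names = list(kept)
    dropped = set()
    for i, first in enumerate(names):
        if first in dropped:
            continue
        for second in names[i + 1:]:
            if second in dropped:
                continue
            # Assumption: the absolute correlation is compared with the threshold.
            if abs(pearson(kept[first], kept[second])) > corr_threshold:
                dropped.add(second)  # keep the first feature of the pair
    return [name for name in names if name not in dropped]


features = {
    "cpu": [1, 2, 3, 4, 5],
    "cpu_copy": [2, 4, 6, 8, 10],             # perfectly correlated with "cpu"
    "const": [7, 7, 7, 7, 7],                 # single value
    "sparse": [None, None, None, 1.0, None],  # 80% missing
    "io": [5, 1, 4, 2, 3],
}
print(prune_features(features))
```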
Missing value filling: missing data is unavoidable in data acquisition engineering; we can only try to fit the missing values as well as possible. This system uses three methods:
Filling with the data of the row preceding the missing value.
Filling with the mean of the feature column containing the missing value.
With performance and load divided into grades A, B, C, D and E, filling missing values with the mean of each grade.
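A minimal sketch of the three filling methods, assuming a single feature column held as a Python list with `None` for missing entries. The function names are illustrative; for the grade-based method it is assumed that each row already carries its grade label.

```python
def fill_previous(values):
    """Replace each missing value with the most recent present value above it."""
    filled, last = [], None
    for value in values:
        if value is None:
            value = last  # stays None if nothing precedes it
        else:
            last = value
        filled.append(value)
    return filled


def fill_column_mean(values):
    """Replace missing values with the mean of the present values in the column."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def fill_grade_mean(values, grades):
    """Replace missing values with the mean of the rows sharing the same grade."""
    sums, counts = {}, {}
    for value, grade in zip(values, grades):
        if value is not None:
            sums[grade] = sums.get(grade, 0.0) + value
            counts[grade] = counts.get(grade, 0) + 1
    means = {grade: sums[grade] / counts[grade] for grade in sums}
    # A grade with no present value at all leaves the gap as None.
    return [means.get(g) if v is None else v for v, g in zip(values, grades)]


print(fill_previous([1.0, None, None, 4.0]))
print(fill_column_mean([1.0, None, 3.0]))
print(fill_grade_mean([10.0, None, 30.0, None, 50.0], ["A", "A", "B", "B", "B"]))
```

Because the samples are stored in capture order, the first method amounts to carrying the previous measurement forward in time.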
Data normalization: the original feature data is normalized and scaled. This system uses two methods: min-max normalization and standard deviation normalization (zero-mean normalization). The scaler produced during normalization is saved locally, and when newly collected data is to be predicted, the same scaler must be used to normalize it to the same degree.
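The point of saving the scaler is that new data must be scaled with the parameters learned from the training data, not with parameters recomputed on the new data. The sketch below illustrates this for one feature column: min-max normalization maps x to (x - min) / (max - min), and zero-mean normalization maps x to (x - mean) / std. Storing the parameters as a JSON file, the file name and the function names are assumptions for illustration; the patent only says the scaler is saved locally.

```python
import json
import os
import statistics
import tempfile


def fit_scaler(values, method="minmax"):
    """Learn the scaling parameters from training data."""
    if method == "minmax":
        return {"method": "minmax", "min": min(values), "max": max(values)}
    if method == "zscore":
        return {"method": "zscore", "mean": statistics.fmean(values),
                "std": statistics.pstdev(values)}
    raise ValueError(f"unknown method: {method}")


def apply_scaler(scaler, values):
    """Scale values with previously learned parameters."""
    if scaler["method"] == "minmax":
        span = scaler["max"] - scaler["min"]
        return [(v - scaler["min"]) / span if span else 0.0 for v in values]
    std = scaler["std"]
    return [(v - scaler["mean"]) / std if std else 0.0 for v in values]


def save_scaler(scaler, path):
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(scaler, handle)


def load_scaler(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Fit on training data, persist, then reuse for a newly collected value.
path = os.path.join(tempfile.mkdtemp(), "scaler.json")  # hypothetical location
save_scaler(fit_scaler([10.0, 20.0, 30.0], "minmax"), path)
print(apply_scaler(load_scaler(path), [25.0]))
```

A new value outside the training range simply maps outside [0, 1] under min-max scaling; the sketch does not clip it.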
For the classification model, the performance grade and the load grade need to be divided by interval according to the performance score and the load score, as follows:
3. Training on the data to generate a model
The data set is divided into a training set, a validation set and a test set, which account for 70%, 15% and 15% of the data set respectively.
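A 70% / 15% / 15% split can be sketched as follows. The source does not say whether the rows are shuffled or split chronologically; the sketch shuffles with a fixed seed, which is an assumption, and the function name is illustrative.

```python
import random


def split_dataset(rows, train=0.70, valid=0.15, seed=0):
    """Split rows into training, validation and test sets (70/15/15 by default)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # assumption: random rather than time-ordered split
    n_train = round(len(rows) * train)
    n_valid = round(len(rows) * valid)
    return (rows[:n_train],
            rows[n_train:n_train + n_valid],
            rows[n_train + n_valid:])


train_set, valid_set, test_set = split_dataset(range(100))
print(len(train_set), len(valid_set), len(test_set))
```

For data captured every few minutes, a chronological split (oldest rows for training, newest for testing) may be a reasonable alternative, since neighbouring samples can be very similar.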
The system trains the model with an ensemble algorithm: the gradient boosting tree algorithm (GBM) obtains a boosted tree model based on gradient descent. The regression model uses the trainer LGBMRegressor and the classification model uses the trainer LGBMClassifier.
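To show what "boosting based on gradient descent" means, the following sketch builds a tiny gradient-boosted regressor by hand. It is not LightGBM and not the patent's code: it uses one-split trees (stumps) on a single feature and squared-error loss, for which the negative gradient is simply the residual. Each round fits a stump to the current residuals and adds it to the ensemble, scaled by the learning rate. All names are illustrative.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error.

    Requires at least two distinct feature values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gbm(xs, ys, n_estimators=50, learning_rate=0.1):
    """Gradient boosting for squared error with stumps as weak learners."""
    base = sum(ys) / len(ys)  # initial constant prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # Negative gradient of 0.5 * (y - p)^2 with respect to p is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict_gbm(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = fit_gbm(xs, ys)
print(predict_gbm(model, 2), predict_gbm(model, 7))
```

The parameters `n_estimators` and `learning_rate` play the same role here as the hyperparameters of the same names listed for the tuning step; a production system would use the LightGBM trainers named above instead of this hand-written loop.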
4. Evaluating the model
Early stop and CheckPoint techniques are used while evaluating the model.
Early stop: training is stopped ahead of time. When the validation error on the validation set is found to no longer change during training, training stops automatically after k rounds, where k is configurable. This avoids useless training rounds and improves training efficiency.
CheckPoint: this technique automatically saves the best model obtained during training, so that the training result is preserved in time even if an exception causes training to stop.
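The interplay of the two techniques can be sketched as a generic training loop. This is an interpretation, not the patent's code: "no longer changes" is read as "has not improved for k consecutive rounds", the checkpoint is kept in memory as a deep copy (a real system would write it to disk), and `train_one_round` and `validation_error` are hypothetical callbacks supplied by the caller.

```python
import copy


def train_with_early_stopping(train_one_round, validation_error, model,
                              max_rounds=1000, k=10):
    """Train until the validation error has not improved for k rounds.

    Returns the best checkpoint, the round index at which it was taken,
    and its validation error.
    """
    best_error = float("inf")
    best_round = -1
    best_model = copy.deepcopy(model)
    for round_index in range(max_rounds):
        train_one_round(model)
        error = validation_error(model)
        if error < best_error:
            best_error, best_round = error, round_index
            best_model = copy.deepcopy(model)  # CheckPoint: keep the best model so far
        elif round_index - best_round >= k:
            break  # early stop: k rounds without improvement
    return best_model, best_round, best_error


# Toy model whose validation error is lowest after 5 rounds of training.
state = {"round": 0}


def one_round(m):
    m["round"] += 1


def toy_validation_error(m):
    return (m["round"] - 5) ** 2


best, best_round, best_error = train_with_early_stopping(
    one_round, toy_validation_error, state, max_rounds=100, k=3)
print(best, best_round, best_error)
```

In the toy run the loop keeps training for k more rounds after the best round and then stops, while the returned checkpoint is the model from the best round, not the last one.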
5. Model tuning
The model tuning function uses hyopt to adjust the hyperparameters of the model. The main parameters include:
Number of iteration rounds: epochs
Learning rate: learning_rate
Tree depth: max_depth
Maximum number of leaves per learner: max_leaves
Number of weak learners: n_estimators
Loss function: objective
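Automatic tuning amounts to repeatedly proposing a set of hyperparameters, scoring it on the validation set, and keeping the best one. The sketch below does not use the tool named in the text; it substitutes a plain random search from the standard library to show the loop. The search ranges are hypothetical, only four of the listed parameters are sampled, and `score` is a caller-supplied function that would train a model and return its validation error.

```python
import random

# Hypothetical search ranges; the patent does not state any.
SEARCH_SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, -0.5),
    "max_depth": lambda rng: rng.randint(3, 12),
    "max_leaves": lambda rng: rng.randint(8, 256),
    "n_estimators": lambda rng: rng.randint(50, 1000),
}


def random_search(score, n_trials=50, seed=0):
    """Sample hyperparameters at random and keep the set with the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: draw(rng) for name, draw in SEARCH_SPACE.items()}
        current = score(params)  # e.g. validation RMSE of a model trained with params
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Stand-in score: pretend the validation error is lowest at max_depth == 6.
best_params, best_score = random_search(lambda p: abs(p["max_depth"] - 6), n_trials=30)
print(best_params, best_score)
```

Dedicated tuning libraries replace the random proposals with a model of which regions of the search space look promising, but the surrounding loop has the same shape.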
6. Model prediction
Offline prediction: the tuned model makes predictions on the data split off from the training set (the test set).
Real-time prediction: data is captured from the database in real time and predicted. The prediction result is displayed, and the new data is inserted into the historical data so that it can serve as training data when the model is updated later.
In addition, there are many specific ways of implementing the present invention, and the above is only a preferred embodiment. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the scope of protection of the present invention. Any component not specified in this embodiment can be implemented with existing technology.

Claims (10)

1.一种基于机器学习的数据库性能负载评估系统,其特征包括:1. A system for evaluating database performance load based on machine learning, comprising: 数据获取模块,用以从数据库awr报告和数据库日志中抓取特征数据;Data acquisition module to capture feature data from database awr reports and database logs; 数据预处理模块,用以将特征数据中的单一值特征删除,缺失特征删除,高相关性特征删除,同时用以将缺失数据填充、使数据规范化;The data preprocessing module is used to delete single-value features, missing features, and high-correlation features in the feature data, and at the same time, it is used to fill in missing data and normalize data; 训练数据模型模块,用以训练数据生成模型;The training data model module is used to train the data generation model; 模型评估模块,用以根据不同的机器学习模型的评价指标,使用验证集评估模型;The model evaluation module is used to evaluate the model using the validation set according to the evaluation indicators of different machine learning models; 模型调优模块,用以对模型自动调优而调整模型的超参数;The model tuning module is used to automatically tune the model and adjust the hyperparameters of the model; 模型预测模块,用以预测模型,分为离线预测和在线预测;离线预测指的是利用训练集分离出来的测试集;在线预测是指数据采集器采集到的实时数据,需要经过缩放器进行数据规范化后在进行预测。The model prediction module is used to predict the model, which is divided into offline prediction and online prediction; offline prediction refers to the test set separated from the training set; online prediction refers to the real-time data collected by the data collector, which needs to be processed by the scaler. Prediction after normalization. 2.根据权利要求1所述的数据库性能负载评估系统,其特征在于,数据预处理模块中,单一值特征删除为:当某一列特征值全部相同时,直接删除;2. 
The database performance load evaluation system according to claim 1, wherein, in the data preprocessing module, the single-value feature deletion is: when all the feature values of a certain column are the same, directly delete; 缺失特征删除为:当特征列缺失比例达到特定阈值时则删除该特征列;The deletion of missing features is: when the missing ratio of the feature column reaches a certain threshold, the feature column is deleted; 高相关特征删除为:通过皮尔逊相关系数计算特征变量之间的相关性,构建相关关系矩阵,当发现特征相关性高于某个阈值时将其中一个特征删除;Deletion of highly correlated features is: Calculate the correlation between feature variables through the Pearson correlation coefficient, build a correlation matrix, and delete one of the features when the feature correlation is found to be higher than a certain threshold; 缺失值填充包括:用缺失值的前一行数据去填充;用缺失特征列的平均值去填充;性能和负载按照等级划分为A,B,C,D,E,根据各个等级的平均值填充缺失值。Filling of missing values includes: filling with the previous row of missing values; filling with the average value of missing feature columns; performance and load are divided into A, B, C, D, E according to grades, and filling missing according to the average value of each grade value. 数据规范化为:将原始特征数据进行归一化缩放,采用离差标准化和标准差标准化;数据规范化过程中的缩放器保存于本地,当对新采集的数据进行预测时需要利用该缩放器进行同等程度的数据规范化。Data normalization is: normalize and scale the original feature data, using dispersion normalization and standard deviation normalization; the scaler in the data normalization process is saved locally, and the scaler needs to be used to perform the same function when predicting the newly collected data. A degree of data normalization. 3.根据权利要求2所述的数据库性能负载评估系统,其特征在于:训练数据模型模块中,将数据集分为训练集,验证集和测试集。其中训练集,验证集和测试集占数据集的比例分别为70%,15%,15%;3. The database performance load evaluation system according to claim 2, wherein in the training data model module, the data set is divided into a training set, a verification set and a test set. 
The training set, validation set and test set account for 70%, 15% and 15% of the dataset respectively; 本系统采用集成算法训练模型,梯度提升树算法GBM基于梯度下降算法得到提升数模型。回归模型使用训练器LGBMRegressor,分类模型使用训练器LGBMClassifier。This system uses the ensemble algorithm to train the model, and the gradient boosting tree algorithm GBM obtains the boosted number model based on the gradient descent algorithm. The regression model uses the trainer LGBMRegressor, and the classification model uses the trainer LGBMClassifier. 4.根据权利要求3所述的数据库性能负载评估系统,其特征在于:在评估模型过程中使用early stop技术和CheckPoint技术;4. database performance load evaluation system according to claim 3, is characterized in that: use early stop technology and CheckPoint technology in evaluating model process; early stop:提前停止训练,看发现训练过程中验证集的验证误差不会再发生变化时,训练会在k轮之后自动停止,k为预设值;early stop: stop training in advance, and see that when the validation error of the validation set will not change during the training process, the training will automatically stop after k rounds, and k is the default value; CheckPoint技术:该技术会自动保存训练过程中最优的训练模型。CheckPoint technology: This technology automatically saves the optimal training model during the training process. 5.根据权利要求4所述的数据库性能负载评估系统,其特征在于,模型调优模块中的超参数包括:5. The database performance load evaluation system according to claim 4, wherein the hyperparameters in the model tuning module comprise: 迭代轮数epochsIteration rounds epochs 学习率learning_ratelearning rate learning_rate 设置树深度max_depthSet tree depth max_depth 学习器叶子最大数目max_leavesThe maximum number of learner leaves max_leaves 弱学习器的数量数目n_estimatorsNumber of weak learners n_estimators 损失函数objective。Loss function objective. 6.一种基于机器学习的数据库性能负载评估方法,其特征在于,包括以下步骤:6. 
A method for evaluating database performance load based on machine learning, comprising the following steps:
(1) Data acquisition: extract feature data from the database AWR reports and the database logs;
(2) Data preprocessing: delete single-value features, features with missing values, and highly correlated features from the feature data, fill in missing data, and normalize the data;
(3) Model training: train on the data to generate a model;
(4) Model evaluation: evaluate the model on the validation set, using the evaluation metrics appropriate to each machine learning model;
(5) Model tuning: automatically tune the model by adjusting its hyperparameters;
(6) Model prediction: divided into offline prediction and online prediction; offline prediction uses the test set separated from the training set; online prediction uses real-time data collected by the data collector, which must be normalized by the scaler before prediction.
7. The database performance load evaluation system according to claim 6, wherein in step (2):
single-value feature deletion: when all values in a feature column are identical, the column is deleted directly;
missing feature deletion: when the proportion of missing values in a feature column reaches a specified threshold, the column is deleted;
highly correlated feature deletion: the correlation between feature variables is computed with the Pearson correlation coefficient and a correlation matrix is built; when the correlation between two features exceeds a given threshold, one of the two is deleted;
missing value filling includes: filling with the value from the row preceding the missing value; filling with the mean of the feature column; or, with performance and load divided into grades A, B, C, D and E, filling with the mean of the corresponding grade;
data normalization: the raw feature data are rescaled using min-max normalization and z-score standardization; the scaler used during normalization is saved locally, and the same scaler must be applied to newly collected data before prediction so that they are normalized in the same way.
8. The database performance load evaluation system according to claim 2, wherein in step (3) the dataset is divided into a training set, a validation set and a test set, which account for 70%, 15% and 15% of the dataset respectively;
the system trains the model with an ensemble algorithm: the gradient boosting machine (GBM) algorithm obtains a boosted tree model based on gradient descent. The regression model uses the trainer LGBMRegressor, and the classification model uses the trainer LGBMClassifier.
9. The database performance load evaluation system according to claim 3, wherein in step (4) early stopping and checkpointing are used;
early stop: training is stopped ahead of schedule; when the validation error on the validation set no longer changes during training, training stops automatically after k further rounds, where k is a preset value;
CheckPoint: the best model obtained during training is saved automatically.
10. The database performance load evaluation system according to claim 4, wherein the hyperparameters in step (5) include:
number of iteration rounds, epochs;
learning rate, learning_rate;
tree depth, max_depth;
maximum number of leaves per learner, max_leaves;
number of weak learners, n_estimators;
loss function, objective.
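The preprocessing in step (2) and the scaler reuse required for online prediction in step (6) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the patent's implementation. The function names, the column names, the thresholds (0.5 and 0.95) and the JSON file used as the "local" scaler store are all assumptions made for the example, since the claims give no concrete values. Of the filling and scaling options listed in claim 7, the sketch picks column-mean filling and min-max normalization.

```python
import json
import math
import os
import tempfile
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def preprocess(columns, missing_threshold=0.5, corr_threshold=0.95):
    """Clean a dict of name -> non-empty list of floats (None marks a missing value).

    Both thresholds are hypothetical defaults chosen for this example.
    """
    filled = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Missing-feature deletion: the missing ratio reaches the threshold.
        if 1 - len(present) / len(values) >= missing_threshold:
            continue
        # Single-value deletion: every observed value is identical.
        if len(set(present)) <= 1:
            continue
        # Missing-value filling with the column mean.
        col_mean = mean(present)
        filled[name] = [col_mean if v is None else v for v in values]

    # Highly correlated deletion: of each correlated pair, keep the earlier column.
    # Using the absolute value of the correlation is an assumption of this sketch.
    selected = []
    for name in filled:
        if all(abs(pearson(filled[name], filled[s])) <= corr_threshold for s in selected):
            selected.append(name)
    return {name: filled[name] for name in selected}


def fit_scaler(columns):
    """Record each column's min and max so the same scaling can be replayed later."""
    return {name: {"min": min(v), "max": max(v)} for name, v in columns.items()}


def apply_scaler(scaler, columns):
    """Min-max normalize the columns known to the scaler; extra columns are ignored."""
    scaled = {}
    for name, p in scaler.items():
        span = (p["max"] - p["min"]) or 1.0  # guard against a zero range
        scaled[name] = [(v - p["min"]) / span for v in columns[name]]
    return scaled


def save_scaler(scaler, path):
    """Persist the scaler parameters locally as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(scaler, f)


def load_scaler(path):
    """Load scaler parameters previously written by save_scaler."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Made-up feature columns standing in for AWR-derived metrics.
    raw = {
        "cpu_time": [10.0, 20.0, 30.0, 40.0],
        "db_time": [21.0, 39.0, 61.0, 80.0],        # nearly collinear with cpu_time
        "constant": [1.0, 1.0, 1.0, 1.0],           # single-value column
        "mostly_missing": [None, None, None, 5.0],  # 75% missing
        "logical_reads": [5.0, None, 8.0, 1.0],
    }
    clean = preprocess(raw)
    path = os.path.join(tempfile.mkdtemp(), "scaler.json")
    save_scaler(fit_scaler(clean), path)

    # Online prediction path: reload the stored scaler and apply it to new data.
    new_sample = {"cpu_time": [25.0], "db_time": [50.0], "logical_reads": [8.0]}
    print(apply_scaler(load_scaler(path), new_sample))
```

Storing only the fitted minimum and maximum per column keeps the online path cheap: newly collected rows are rescaled with the training-time parameters rather than refitted, which is the "same degree of normalization" the claim asks for.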
CN201811207264.9A 2018-10-17 2018-10-17 A system and method for evaluating database performance load based on machine learning Pending CN109344201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811207264.9A CN109344201A (en) 2018-10-17 2018-10-17 A system and method for evaluating database performance load based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811207264.9A CN109344201A (en) 2018-10-17 2018-10-17 A system and method for evaluating database performance load based on machine learning

Publications (1)

Publication Number Publication Date
CN109344201A (en) 2019-02-15

Family

ID=65310448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811207264.9A Pending CN109344201A (en) 2018-10-17 2018-10-17 A system and method for evaluating database performance load based on machine learning

Country Status (1)

Country Link
CN (1) CN109344201A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019151A (en) * 2019-04-11 2019-07-16 深圳市腾讯计算机系统有限公司 Database performance method of adjustment, device, equipment, system and storage medium
CN110134665A (en) * 2019-04-17 2019-08-16 北京百度网讯科技有限公司 Database self-learning optimization method and device based on traffic mirroring
CN110263939A (en) * 2019-06-24 2019-09-20 腾讯科技(深圳)有限公司 A kind of appraisal procedure, device, equipment and medium indicating learning model
CN110443304A (en) * 2019-08-06 2019-11-12 民生科技有限责任公司 A kind of business risk appraisal procedure based on machine learning model
CN110717535A (en) * 2019-09-30 2020-01-21 北京九章云极科技有限公司 Automatic modeling method and system based on data analysis processing system
CN110750512A (en) * 2019-09-09 2020-02-04 北京新数科技有限公司 Database performance evaluation management method and device
CN111079361A (en) * 2019-12-07 2020-04-28 复旦大学 Load modeling method of FPGA circuit
CN112153636A (en) * 2020-10-29 2020-12-29 浙江鸿程计算机系统有限公司 Method for predicting number portability and roll-out of telecommunication industry user based on machine learning
CN112628132A (en) * 2020-12-24 2021-04-09 上海大学 Water pump key index prediction method based on machine learning
CN113157814A (en) * 2021-01-29 2021-07-23 东北大学 Query-driven intelligent workload analysis method under relational database
CN113297169A (en) * 2021-02-26 2021-08-24 阿里云计算有限公司 Database instance processing method, system, device and storage medium
US11194785B2 (en) 2019-08-14 2021-12-07 International Business Machines Corporation Universal self-learning database recovery
CN114924943A (en) * 2022-05-27 2022-08-19 中国平安财产保险股份有限公司 Data middling station evaluation method based on artificial intelligence and related equipment
CN116113961A (en) * 2020-08-30 2023-05-12 惠普发展公司,有限责任合伙企业 Battery Life Prediction Using Machine Learning Models
CN116166513A (en) * 2023-01-30 2023-05-26 浪潮卓数大数据产业发展有限公司 An evaluation method, device and storage medium for database performance testing

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060106851A1 (en) * 2004-11-03 2006-05-18 Dba Infopower, Inc. Real-time database performance and availability monitoring method and system
US20080005317A1 (en) * 2006-06-30 2008-01-03 International Business Machines Corporation Method and apparatus for cross-tier management in multi-tier computing system architecture
CN104361337A (en) * 2014-09-10 2015-02-18 苏州工业职业技术学院 Sparse kernel principal component analysis method based on constrained computation and storage space
CN108388503A (en) * 2018-02-13 2018-08-10 中体彩科技发展有限公司 Data-base performance monitoring method, system, equipment and computer readable storage medium
CN108375808A (en) * 2018-03-12 2018-08-07 南京恩瑞特实业有限公司 Dense fog forecasting procedures of the NRIET based on machine learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
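Udacity (优达学城): "With this tool, feature selection for machine learning is no longer a headache!", 《HTTPS://M.VLAMBDA.COM/MIP/WZ_X8I6VU5KT7.HTML》 *
那伊抹微笑: "LightGBM Chinese documentation", 《HTTPS://LIGHTGBM.APACHECN.ORG/#/DOCS/6》 *
Han Zhongming (韩忠明): 《Data Analysis and R》, 31 August 2014, Beijing University of Posts and Telecommunications Press *
Ma Qian (马倩): "Prediction of repeat-purchase customers on e-commerce platforms based on machine learning", 《China Master's Theses Full-text Database, Economics and Management Sciences》 *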

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12287768B2 (en) 2019-04-11 2025-04-29 Tencent Technology (Shenzhen) Company Limited Database performance tuning method, apparatus, and system, device, and storage medium
WO2020207268A1 (en) * 2019-04-11 2020-10-15 腾讯科技(深圳)有限公司 Database performance adjustment method and apparatus, device, system, and storage medium
CN110019151B (en) * 2019-04-11 2024-03-15 深圳市腾讯计算机系统有限公司 Database performance adjustment method, device, equipment, system and storage medium
CN110019151A (en) * 2019-04-11 2019-07-16 深圳市腾讯计算机系统有限公司 Database performance method of adjustment, device, equipment, system and storage medium
CN110134665A (en) * 2019-04-17 2019-08-16 北京百度网讯科技有限公司 Database self-learning optimization method and device based on traffic mirroring
CN110263939A (en) * 2019-06-24 2019-09-20 腾讯科技(深圳)有限公司 A kind of appraisal procedure, device, equipment and medium indicating learning model
CN110443304A (en) * 2019-08-06 2019-11-12 民生科技有限责任公司 A kind of business risk appraisal procedure based on machine learning model
US11194785B2 (en) 2019-08-14 2021-12-07 International Business Machines Corporation Universal self-learning database recovery
CN110750512A (en) * 2019-09-09 2020-02-04 北京新数科技有限公司 Database performance evaluation management method and device
CN110717535A (en) * 2019-09-30 2020-01-21 北京九章云极科技有限公司 Automatic modeling method and system based on data analysis processing system
CN111079361B (en) * 2019-12-07 2023-05-02 复旦大学 Load modeling method of FPGA circuit
CN111079361A (en) * 2019-12-07 2020-04-28 复旦大学 Load modeling method of FPGA circuit
CN116113961A (en) * 2020-08-30 2023-05-12 惠普发展公司,有限责任合伙企业 Battery Life Prediction Using Machine Learning Models
CN112153636A (en) * 2020-10-29 2020-12-29 浙江鸿程计算机系统有限公司 Method for predicting number portability and roll-out of telecommunication industry user based on machine learning
CN112628132B (en) * 2020-12-24 2022-04-26 上海大学 Water pump key index prediction method based on machine learning
CN112628132A (en) * 2020-12-24 2021-04-09 上海大学 Water pump key index prediction method based on machine learning
CN113157814A (en) * 2021-01-29 2021-07-23 东北大学 Query-driven intelligent workload analysis method under relational database
CN113157814B (en) * 2021-01-29 2023-07-18 东北大学 Query-driven intelligent workload analysis method in relational database
CN113297169A (en) * 2021-02-26 2021-08-24 阿里云计算有限公司 Database instance processing method, system, device and storage medium
CN114924943A (en) * 2022-05-27 2022-08-19 中国平安财产保险股份有限公司 Data middling station evaluation method based on artificial intelligence and related equipment
CN116166513A (en) * 2023-01-30 2023-05-26 浪潮卓数大数据产业发展有限公司 An evaluation method, device and storage medium for database performance testing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190215)