CN113672467B - Operation and maintenance early warning method and device, electronic equipment and storage medium

Info

Publication number: CN113672467B
Application number: CN202110973497.5A
Authority: CN
Inventors: 韩思祺; 侯晓东
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2021-08-24
Filing date: 2021-08-24
Publication date: 2024-08-06
Anticipated expiration: 2041-08-24
Also published as: CN113672467A

Abstract

The disclosure provides an operation and maintenance early warning method and device, electronic equipment and a storage medium. It relates to the technical field of the Internet and can be applied to operation and maintenance fault early-warning scenarios. The operation and maintenance early warning method comprises the following steps: acquiring sample monitoring data; training a neural network model based on the sample monitoring data under a plurality of training time windows of different sizes to obtain a monitoring prediction model corresponding to each training time window; selecting a target monitoring prediction model according to the current prediction time window; inputting the predicted time data under the current prediction time window into the target monitoring prediction model and calculating predicted data; and outputting alarm information according to the predicted data and preset alarm rules. With this technical scheme, the trend of the monitored index data over different time horizons can be predicted well, faults are predicted and alarmed in advance according to the trend, and service interruption caused by insufficient preparation time for handling faults is avoided.

Description

Operation and maintenance early warning method and device, electronic equipment and storage medium

Technical Field

The disclosure relates to the technical field of internet operation and maintenance, in particular to an operation and maintenance early warning method, an operation and maintenance early warning device, electronic equipment and a computer readable storage medium.

Background

With the continuous deepening and improvement of IT (Internet Technology) construction, the operation and maintenance of computer software and hardware systems is becoming more complex and technically more difficult. For example, in enterprise-level operation and maintenance, indexes such as the occupancy rate, utilization rate and input/output (IO) of objects such as resource pools, CPUs, memory, disks and software process heap memory need to be monitored, and an alarm is raised when an abnormality occurs so that operation and maintenance personnel can handle the fault according to the alarm.

Currently, operation and maintenance tools commonly used in enterprises, such as Zabbix, Prometheus and SolarWinds, generally alarm based on a threshold; that is, an alarm is triggered after a monitored index exceeds a set value. For example, some abnormal processes on a server may cause a sudden increase in memory occupancy. If the alarm is triggered only when the threshold (for example, 80% or 90%) is exceeded, the abnormality has already occurred by the time the operation and maintenance personnel receive the alarm, so they do not have enough time to handle the fault, which affects the production service.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

An object of the embodiments of the present disclosure is to provide an operation and maintenance early warning method and apparatus, an electronic device, and a computer-readable storage medium, so as to overcome, at least to some extent, the problem in the prior art that threshold-based alarming often leaves faults unable to be handled in time, thereby causing service interruption.

Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.

According to a first aspect of an embodiment of the present disclosure, there is provided an operation and maintenance early warning method, including:

acquiring sample monitoring data, wherein the sample monitoring data comprises sample time data and sample monitoring index data;

training the neural network model based on the sample time data and the sample monitoring index data under a plurality of training time windows with different sizes to obtain monitoring prediction models corresponding to the training time windows;

Selecting a target monitoring prediction model from the monitoring prediction models corresponding to the training time windows according to the current prediction time window;

inputting the predicted time data under the current predicted time window into the target monitoring prediction model, and calculating to obtain predicted data;

and outputting alarm information according to the prediction data and a preset alarm rule.

In this embodiment, the monitored index data is predicted using the strong data-fitting capability of the neural network model, which improves the data prediction capability. Through prediction, faults can be anticipated from the trend of the monitored index data and alarmed in advance, so that sufficient time is available for handling them and smooth production service is ensured. Meanwhile, using a plurality of time windows of different sizes allows the trends of the monitored index data over different time horizons to be predicted in parallel, meeting the prediction requirements for index data over different time horizons.

In some example embodiments of the disclosure, based on the foregoing approach, the training the neural network model based on the sample time data and the sample monitor metric data includes:

inputting the sample monitoring data into the neural network model to obtain an output result;

calculating a loss function value according to the output result and a loss function containing a regularization term;

and adjusting the neural network model through a back propagation algorithm according to the loss function value.

In this embodiment, a regularization term is added to the loss function of the neural network model, which effectively prevents the neural network model from overfitting and improves the accuracy of data prediction. It also reduces the influence of short-term, sharply fluctuating data on medium- and long-term training time windows, improving the prediction accuracy for those windows.

In some example embodiments of the present disclosure, based on the foregoing scheme, the loss function containing the regularization term is:

$$loss_{new}(w,b) = loss(w,b) + \lambda_i \sum_{l=1}^{L}\sum_{h}\sum_{j}\left\| w_{hj}^{(l)} \right\|_{p}$$

where w and b are the connection weights and bias terms of the last layer in the neural network model, loss_new(w, b) is the loss function value containing the regularization term, loss(w, b) is the original loss function, $\lambda_i \sum_{l=1}^{L}\sum_{h}\sum_{j}\| w_{hj}^{(l)} \|_{p}$ is the regularization term, and λ_i is the regularization term adjustment parameter corresponding to the i-th training time window, where λ_i is greater than 0.

In some example embodiments of the present disclosure, based on the foregoing solution, the calculation method of the regularization term adjustment parameter corresponding to the ith training time window includes:

calculating the ratio r_i of the i-th training time window to the minimum time window Δt_1 among the training time windows according to a first formula, wherein the first formula is

$$r_i = \frac{\Delta t_i}{\Delta t_1};$$

based on r_i, calculating the adjustment parameter λ_i of the regularization term corresponding to the i-th training time window according to a second formula, wherein the second formula is [formula not reproduced in the source text] and k is greater than or equal to 0.

In this embodiment, the adjustment parameter of the regularization term increases as the training time window grows, so that in training the neural network model the medium and long training time windows suppress noise and short-term sharp fluctuations more strongly, further improving the prediction accuracy of the neural network model.

In some example embodiments of the present disclosure, based on the foregoing solution, the outputting the alert information according to the prediction data and the preset alert rule includes:

outputting alarm information when data larger than a first threshold exists in the predicted data;

and outputting alarm information according to a sub-alarm rule when the predicted data is smaller than the first threshold and data larger than a second threshold exists, wherein the second threshold is smaller than the first threshold.

In this embodiment, a "double-threshold" alarm rule of a first threshold and a second threshold is adopted, and when the predicted data is between the first threshold and the second threshold, whether to output alarm information is further determined by using a sub-alarm rule, so that repeated alarm and noise alarm caused by repeated fluctuation of monitored index data around the threshold are avoided.

In some example embodiments of the present disclosure, based on the foregoing solution, the outputting the alert information according to the sub-alert rule includes:

acquiring a target prediction time window, wherein the target prediction time window is q times the size of the current prediction time window, and q is an integer greater than 1;

reselecting a target monitoring prediction model from monitoring prediction models corresponding to the training time windows based on the target prediction time windows;

inputting the predicted time data under the target predicted time window into a reselected target monitoring prediction model, and calculating to obtain new predicted data;

and outputting alarm information when data larger than a first threshold exists in the new prediction data.

In this embodiment, when the predicted data is between the first threshold and the second threshold, the current prediction time window is enlarged, the predicted data is recalculated, and then whether the new predicted data has data exceeding the first threshold is determined, and whether to output the alarm information is determined in the manner of the nested determination, so that repeated alarm and noise alarm caused by repeated fluctuation of the monitored index data around the threshold are avoided.

In some example embodiments of the present disclosure, based on the foregoing solution, the operation and maintenance early warning method further includes:

re-acquiring sample monitoring data when the new predicted data is smaller than the first threshold and data larger than the second threshold exists.

According to a second aspect of the embodiments of the present disclosure, there is provided an operation and maintenance pre-warning device, including:

the acquisition unit is used for acquiring sample monitoring data, wherein the sample monitoring data comprises sample time data and sample monitoring index data;

The model training unit is used for training the neural network model based on the sample time data and the sample monitoring index data under a plurality of training time windows with different sizes to obtain monitoring prediction models corresponding to the training time windows;

the model selection unit is used for selecting a target monitoring prediction model from the monitoring prediction models corresponding to the training time windows according to the current prediction time window;

the data prediction unit is used for inputting the predicted time data under the current predicted time window into the target monitoring prediction model and calculating to obtain predicted data;

and the alarm unit is used for outputting alarm information according to the prediction data and a preset alarm rule.

According to the operation and maintenance early warning device provided by the embodiments of the disclosure, the monitored index data is predicted using the strong data-fitting capability of the neural network model, which improves the data prediction capability. Through prediction, faults can be anticipated from the trend of the monitored index data and alarmed in advance, so that sufficient time is available for handling them and smooth production service is ensured. Meanwhile, using a plurality of time windows of different sizes allows the trends of the monitored index data over different time horizons to be predicted in parallel, meeting the prediction requirements for index data over different time horizons.

According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor; and a memory having stored thereon computer readable instructions that when executed by the processor implement any of the operation and maintenance pre-warning methods described above.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an operation and maintenance pre-warning method according to any one of the above.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:

FIG. 1 schematically illustrates a schematic diagram of an operation and maintenance pre-warning method according to some embodiments of the present disclosure;

FIG. 2 schematically illustrates a schematic diagram of a method of training neural network models, according to some embodiments of the present disclosure;

FIG. 3 schematically illustrates a schematic diagram of an operation and maintenance pre-warning method according to some embodiments of the present disclosure;

FIG. 4 schematically illustrates a schematic diagram of an operation and maintenance pre-warning device according to some embodiments of the present disclosure;

FIG. 5 schematically illustrates a schematic diagram of an operation and maintenance pre-warning device according to some embodiments of the present disclosure;

FIG. 6 schematically illustrates a schematic diagram of an operation and maintenance pre-warning device according to some embodiments of the present disclosure;

FIG. 7 schematically illustrates a structural schematic diagram of a computer system of an electronic device, in accordance with some embodiments of the present disclosure;

FIG. 8 schematically illustrates a schematic diagram of a computer-readable storage medium according to some embodiments of the present disclosure.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.

Moreover, the drawings are only schematic illustrations and are not necessarily drawn to scale. The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

IT (Internet Technology) operation and maintenance generally adopts threshold-based alarming, which has no ability to predict monitoring indexes and tends to generate a large number of alarms when a monitoring index fluctuates repeatedly around the monitoring threshold. In addition, some monitoring indexes degrade quite slowly. For example, when memory occupancy takes more than a month to rise from 90% to 95%, setting the monitoring threshold to 90% causes a large number of repeated alarms to be generated continuously. Such repeated, low-value alarms seriously dull the sensitivity of operation and maintenance personnel, so that key alarm information is overlooked.

The monitoring index is generally influenced by a plurality of factors such as host performance, an operating system, application development characteristics and the like, any traditional function model such as linear regression and nonlinear regression is difficult to cover all monitoring prediction scenes, and the situation that the prediction result is inconsistent with the real distribution of the monitoring index easily occurs.

Some prediction methods predict only within a single time window. For example, a method may predict well on an hour-level data set and forecast the trend of a monitoring index over the next several hours, yet struggle to make effective predictions for monitoring indexes that degrade severely within a short time or degrade slowly over several days, or its predictions may deviate considerably from the actual values.

In view of this, in the embodiments of the present disclosure, a neural network model is trained on sample time data and sample monitoring index data under a plurality of training time windows of different sizes to obtain a monitoring prediction model corresponding to each training time window. A target monitoring prediction model is then selected from these monitoring prediction models according to the current prediction time window, the monitoring index is predicted from the predicted time data under the current prediction time window using the target monitoring prediction model to obtain predicted data, and alarm information is output according to the predicted data and a preset alarm rule.

In this exemplary embodiment, an operation and maintenance early warning method is first provided. The method can be applied to terminal devices such as mobile phones, computers and other electronic devices. FIG. 1 schematically illustrates a flow of an operation and maintenance early warning method according to some embodiments of the present disclosure. Referring to FIG. 1, the operation and maintenance early warning method may include the following steps:

Step S101: acquiring sample monitoring data.

The sample monitoring data includes sample time data and sample monitoring index data. The terminal equipment can collect monitoring data from a monitoring database, a monitoring component log, a monitoring component program interface and the like, extract time data and corresponding monitoring index data in the monitoring data, obtain sample time data and sample monitoring index data, and take the sample time data and the sample monitoring index data as sample monitoring data.

Step S102: training the neural network model based on the sample monitoring data under a plurality of training time windows of different sizes to obtain a monitoring prediction model corresponding to each training time window.

A neural network model is constructed under each of a plurality of training time windows of different sizes and trained on the acquired sample time data and sample monitoring index data until the training termination condition is met, yielding a monitoring prediction model corresponding to each training time window.

Specifically, the number of training time windows and the size of each training time window are first determined. For example, the training time windows are Δt_1, Δt_2, ..., Δt_m, where m is an integer greater than or equal to 1 (that is, there is at least one training time window), and the sizes of the training time windows increase with the window index.
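As an illustration of how differently sized training time windows might partition the sample monitoring data, the following minimal sketch keeps, for each window size, the samples that fall inside the most recent Δt_i seconds. The window sizes (one hour, one day, one week), the synthetic samples and the helper name `samples_per_window` are assumptions made for illustration; the disclosure does not prescribe them.

```python
def samples_per_window(samples, window_sizes_s, t_end):
    """Group (timestamp, value) samples by training time window.

    For each window size, keep the samples whose timestamp lies in
    [t_end - size, t_end]. Sizes are sorted so that the window index
    grows with the window size, as in the description.
    """
    result = {}
    for size in sorted(window_sizes_s):
        start = t_end - size
        result[size] = [(t, d) for t, d in samples if start <= t <= t_end]
    return result


# Hypothetical data: one sample every 10 minutes over 8 days.
t_end = 8 * 24 * 3600
samples = [(t, 0.5) for t in range(0, t_end + 1, 600)]

# Assumed window sizes in seconds: one hour, one day, one week.
windows = [3600, 24 * 3600, 7 * 24 * 3600]
by_window = samples_per_window(samples, windows, t_end)
counts = {size: len(rows) for size, rows in by_window.items()}
```

Each entry of `by_window` would then feed the neural network model built for that window.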

Next, a neural network model is built under each training time window.

The neural network model is then trained on the acquired sample time data and sample monitoring index data to obtain the monitoring prediction model corresponding to each training time window. Referring to FIG. 2, a flow chart of a method of training each neural network model is schematically shown, which may include the following steps:

Step S1021: inputting the sample monitoring data into the neural network model to obtain an output result.

The sample monitoring data comprise sample time data and sample monitoring index data, wherein the sample time data is used as an input item of the neural network model, and the sample monitoring index data is used as an output item of the neural network model.

Step S1022: calculating a loss function value according to the output result and the loss function containing the regularization term.

The loss function containing the regularization term is:

$$loss_{new}(w,b) = loss(w,b) + \lambda_i \sum_{l=1}^{L}\sum_{h}\sum_{j}\left\| w_{hj}^{(l)} \right\|_{p}$$

where w and b are the connection weights and bias terms of the last layer in the neural network model, respectively; loss_new(w, b) is the loss function value containing the regularization term; loss(w, b) is the original loss function, for example the mean squared error (MSE) loss function; $\lambda_i \sum_{l=1}^{L}\sum_{h}\sum_{j}\| w_{hj}^{(l)} \|_{p}$ is the regularization term; and λ_i is the regularization term adjustment parameter corresponding to the i-th training time window, with λ_i greater than 0. The regularization term computes the sum of the norms of the node connection weights between the layers of the neural network model: p is a positive integer denoting the order of the norm, L is the total number of layers of the neural network model, l is the layer index, h and j are the indexes of two connected nodes, and $w_{hj}^{(l)}$ is the node connection weight.

λ_i can be calculated by the following steps S10221 to S10222:

Step S10221: calculating the ratio r_i of the i-th training time window to the minimum time window Δt_1 among the training time windows according to a first formula, wherein the first formula is

$$r_i = \frac{\Delta t_i}{\Delta t_1}$$

Step S10222: based on r_i calculated in step S10221, calculating the adjustment parameter λ_i of the regularization term corresponding to the i-th training time window according to a second formula, wherein the second formula is [formula not reproduced in the source text] and k is greater than or equal to 0.

A λ_i calculated in this way increases as the time window grows, so that in training the neural network model the medium and long time windows suppress noise and short-term sharp fluctuations more strongly. This improves the prediction accuracy of the neural network model and prevents overfitting during training from degrading the accuracy of data prediction.
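The ratio in step S10221 can be illustrated numerically. In the minimal sketch below, `window_ratios` implements the ratio of each window to the smallest window. Because the second formula is not reproduced in this text, `placeholder_lambda` is a purely hypothetical stand-in that only mirrors the stated properties (λ_i greater than 0, k greater than or equal to 0, λ_i growing with the window size); it should not be read as the patented formula. The window sizes and the values of `k` and `base` are likewise assumptions.

```python
import math


def window_ratios(window_sizes):
    """Ratio r_i of each training window to the smallest window."""
    smallest = min(window_sizes)
    return [size / smallest for size in window_sizes]


def placeholder_lambda(r, k=0.5, base=1e-3):
    """Hypothetical stand-in for the second formula (NOT from the source).

    Chosen only because it is positive for r >= 1 and non-decreasing
    in r whenever k >= 0.
    """
    return base * (1.0 + k * math.log(r))


# Assumed window sizes in seconds: one hour, one day, one week.
windows = [3600, 24 * 3600, 7 * 24 * 3600]
ratios = window_ratios(windows)
lambdas = [placeholder_lambda(r) for r in ratios]
```

With these assumed windows the ratios are 1, 24 and 168, so the largest window receives the strongest regularization under any increasing mapping.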

Step S1023: adjusting the neural network model through a back propagation algorithm according to the loss function value.
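Steps S1021 to S1023 can be sketched as a small training loop. The example below is a minimal illustration under several assumptions that are not taken from the disclosure: a single hidden layer with tanh units, a mean squared error base loss, a squared-weight penalty (one possible reading of the regularization term with p = 2), plain gradient descent, a fixed regularization parameter `lam`, and synthetic normalized samples.

```python
import math
import random


def forward(params, t):
    """Step S1021: map a normalized time t to a predicted index value."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * t + b) for w, b in zip(w1, b1)]
    return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2


def loss_new(params, samples, lam):
    """Step S1022: MSE plus lam times the sum of squared connection weights."""
    w1, _, w2, _ = params
    mse = sum((forward(params, t)[1] - d) ** 2 for t, d in samples) / len(samples)
    return mse + lam * (sum(w * w for w in w1) + sum(w * w for w in w2))


def train_step(params, samples, lam, lr):
    """Step S1023: one back propagation update of all weights and biases."""
    w1, b1, w2, b2 = params
    n = len(samples)
    # Gradients start from the derivative of the penalty term.
    g_w1 = [2 * lam * w for w in w1]
    g_b1 = [0.0] * len(b1)
    g_w2 = [2 * lam * w for w in w2]
    g_b2 = 0.0
    for t, d in samples:
        hidden, y = forward(params, t)
        dy = 2 * (y - d) / n  # derivative of the MSE w.r.t. the output
        g_b2 += dy
        for j, h in enumerate(hidden):
            g_w2[j] += dy * h
            dz = dy * w2[j] * (1 - h * h)  # back through tanh
            g_w1[j] += dz * t
            g_b1[j] += dz
    return (
        [w - lr * g for w, g in zip(w1, g_w1)],
        [b - lr * g for b, g in zip(b1, g_b1)],
        [w - lr * g for w, g in zip(w2, g_w2)],
        b2 - lr * g_b2,
    )


random.seed(0)
hidden_units = 4
params = (
    [random.uniform(-1, 1) for _ in range(hidden_units)],
    [0.0] * hidden_units,
    [random.uniform(-1, 1) for _ in range(hidden_units)],
    0.0,
)
# Synthetic samples <t, d>: usage rising from 0.5 to 0.9 over the window.
samples = [(k / 19, 0.5 + 0.4 * (k / 19)) for k in range(20)]

initial = loss_new(params, samples, lam=1e-3)
for _ in range(500):
    params = train_step(params, samples, lam=1e-3, lr=0.05)
final = loss_new(params, samples, lam=1e-3)
```

In practice a deep learning framework would replace the hand-written gradients; the sketch only makes the order of the three steps explicit.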

The neural network model is trained by the method corresponding to FIG. 2 until the training termination condition is met, yielding the monitoring prediction model corresponding to each training time window. The training termination condition is:

$$loss'_{new}(w,b)^{(e+s)} \ge loss'_{new}(w,b)^{(e)} \quad \text{or} \quad train\_time^{(i)} > \eta^{(i)}$$

where s and e are positive integers, e and e+s denote training iteration counts, and loss'_new(w, b) is the loss function containing the regularization term, calculated on test set data selected from the sample monitoring data;

train_time^(i) is the training duration of the neural network model corresponding to the i-th training time window, and η^(i) is the preset maximum training duration under the i-th training time window. For example, with three training time windows, η^(1) = η^(2) = η^(3) = 60 s can be set.
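The termination condition amounts to an early-stopping check combined with a time budget. A minimal sketch follows; the function name `should_stop` and the list-based loss history are illustrative assumptions rather than part of the disclosure.

```python
def should_stop(test_losses, s, elapsed_s, max_train_s):
    """Return True when either termination condition holds.

    test_losses[e] is the regularized loss on the test set after
    iteration e.
    Condition 1: the latest loss (iteration e + s) is not lower than
    the loss s iterations earlier (iteration e).
    Condition 2: the training duration exceeds the preset maximum.
    """
    if elapsed_s > max_train_s:
        return True
    if len(test_losses) > s:
        return test_losses[-1] >= test_losses[-1 - s]
    return False


# Still improving over the last 2 iterations, within the 60 s budget.
still_improving = should_stop([0.5, 0.4, 0.3], s=2, elapsed_s=10, max_train_s=60)
# Test loss rose again compared with 2 iterations earlier.
stalled = should_stop([0.5, 0.4, 0.45, 0.5], s=2, elapsed_s=10, max_train_s=60)
# Time budget exceeded regardless of the loss history.
timed_out = should_stop([0.5], s=2, elapsed_s=61, max_train_s=60)
```

Evaluating the loss on held-out test data, as the description specifies, is what lets the first condition detect the onset of overfitting.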

Step S103: selecting a target monitoring prediction model from the monitoring prediction models corresponding to the training time windows according to the current prediction time window.

After the monitoring prediction model corresponding to each training time window is obtained, the monitored index is predicted. Specifically, the size of the current prediction time window is first determined by a formula [formula not reproduced in the source text]; the training time window whose size is closest to that of the current prediction time window is then selected from the training time windows, and the monitoring prediction model corresponding to that training time window is taken as the target monitoring prediction model.

In the formula, n is a positive integer, i is the index of the training time window, Δt_i' is the size of the current prediction time window, and Δt_i is the size of the training time window corresponding to the current prediction time window.
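The closest-window selection can be sketched as follows. The dictionary keyed by training window size and the string placeholders standing in for trained models are assumptions made for illustration.

```python
def select_target_model(models_by_window, prediction_window):
    """Pick the model whose training window size is closest to the
    size of the current prediction time window."""
    closest = min(models_by_window, key=lambda size: abs(size - prediction_window))
    return closest, models_by_window[closest]


# Hypothetical trained models, keyed by training window size in seconds.
models = {3600: "hour-model", 24 * 3600: "day-model", 7 * 24 * 3600: "week-model"}

two_hours = select_target_model(models, 2 * 3600)
three_and_a_half_days = select_target_model(models, 300000)
```

A two-hour prediction window maps to the hour model, while a window of about three and a half days is closer to the one-day training window than to the one-week window.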

Step S104: inputting the predicted time data under the current prediction time window into the target monitoring prediction model, and calculating predicted data.

After the current prediction time window and the corresponding target monitoring prediction model are determined, predicted time data for the current prediction time window is generated at the sampling frequency of the sample monitoring data in the corresponding training time window. The predicted time data is then input into the target monitoring prediction model to calculate the corresponding predicted monitoring index data, and the predicted time data and the predicted monitoring index data are combined to obtain the predicted data.
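A minimal sketch of this step is given below. The linear `toy_model` is a hypothetical stand-in for a trained monitoring prediction model, and raw seconds are used as input for readability, whereas the described method feeds normalized time data to the model.

```python
def predicted_time_data(window_start, window_size, sampling_interval):
    """Timestamps covering the prediction window at the sampling
    interval used in the matching training window."""
    steps = int(window_size // sampling_interval)
    return [window_start + k * sampling_interval for k in range(steps + 1)]


def predict(model, timestamps):
    """Combine each predicted timestamp with the model output for it."""
    return [(t, model(t)) for t in timestamps]


def toy_model(t):
    """Hypothetical model: memory usage growing linearly with time."""
    return 0.70 + 0.00005 * t


# One-hour prediction window sampled every 10 minutes.
times = predicted_time_data(window_start=0, window_size=3600, sampling_interval=600)
prediction = predict(toy_model, times)
```

The resulting list of (time, value) pairs is the predicted data that the alarm rule in step S105 inspects.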

Step S105: outputting alarm information according to the predicted data and preset alarm rules.

After the prediction data is obtained, judging whether to trigger an alarm or not based on the prediction data and a preset alarm rule, and outputting alarm information to prompt fault handling when the alarm is determined to be triggered.

Specifically, when the predicted data has data larger than a first threshold value, outputting alarm information; and outputting alarm information according to the sub-alarm rules when the predicted data is smaller than the first threshold and data larger than a second threshold exists, wherein the second threshold is smaller than the first threshold.

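Outputting the alarm information according to the sub-alarm rule comprises:

acquiring a target prediction time window, wherein the target prediction time window is q times the current prediction time window and q is an integer greater than 1;

reselecting a target monitoring prediction model from the monitoring prediction models corresponding to the training time windows based on the target prediction time window;

inputting the predicted time data under the target prediction time window into the reselected target monitoring prediction model, and calculating new predicted data;

outputting alarm information when data larger than the first threshold exists in the new predicted data;

and re-acquiring the sample monitoring data when the new predicted data is smaller than the first threshold and data larger than the second threshold exists.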

When the predicted data lies between the second (low) threshold and the first (high) threshold, the current prediction time window is enlarged to extend the prediction range, and it is then judged again whether any predicted monitoring index data exceeds the first threshold. This reduces repeated alarms and noise alarms caused by index fluctuation and improves the early warning capability.
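The double-threshold rule with its nested sub-alarm rule can be sketched as a small decision function. The callable `predict`, the return labels and the example thresholds are assumptions for illustration. Returning "none" when the enlarged window stays below the second threshold is also an assumption, since the description does not state that case, and the reselection of the model for the enlarged window is folded into `predict` for brevity.

```python
def evaluate_alarm(predict, window, first_threshold, second_threshold, q=2):
    """Apply the double-threshold rule with the nested sub-alarm rule.

    predict(window) returns the predicted index values for a prediction
    window of the given size. Returns "alarm", "reacquire" or "none".
    """
    values = predict(window)
    if any(v > first_threshold for v in values):
        return "alarm"
    if not any(v > second_threshold for v in values):
        return "none"
    # Between the thresholds: enlarge the window q times and predict again.
    new_values = predict(q * window)
    if any(v > first_threshold for v in new_values):
        return "alarm"
    if any(v > second_threshold for v in new_values):
        return "reacquire"  # re-acquire sample monitoring data
    return "none"  # assumed outcome, not stated in the description


def rising(window_hours):
    """Hypothetical predictor: usage grows by 1 percentage point per hour."""
    return [0.80 + 0.01 * h for h in range(window_hours + 1)]


def flat(window_hours):
    """Hypothetical predictor: usage stays between the two thresholds."""
    return [0.87 for _ in range(window_hours + 1)]


early = evaluate_alarm(rising, 6, first_threshold=0.90, second_threshold=0.85)
quiet = evaluate_alarm(rising, 3, first_threshold=0.90, second_threshold=0.85)
hover = evaluate_alarm(flat, 6, first_threshold=0.90, second_threshold=0.85)
```

With the rising predictor, a six-hour window peaks at 0.86, which is between the thresholds; doubling the window to twelve hours reveals values above 0.90, so an alarm is raised ahead of time.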

According to the operation and maintenance early warning method of this embodiment, the trend of the monitored index is predicted with a neural network model, and abnormalities can be predicted and alarmed according to that trend, so that they can be handled in time and smooth production service is ensured. First, the neural network model in this embodiment has a strong ability to fit the functional distribution of the monitored indexes and can better predict the trends of a variety of monitoring indexes. Second, the loss function of the neural network model contains a regularization term related to the size of the time window, which effectively restrains the overfitting tendency of the neural network model and improves the data prediction accuracy. Third, a plurality of time windows of different sizes are used to train on and predict the trend of the monitored index over different time periods in parallel, meeting the prediction requirements for index data over different time periods.

In the embodiments of the disclosure, the monitored index may include the occupancy rate, usage rate, IO and the like of objects such as the CPU (Central Processing Unit), memory and disk of a resource pool, a cloud computing platform and the like. Building on the operation and maintenance early warning method of the embodiment corresponding to FIG. 1, the method is further illustrated below by taking the monitoring of the memory usage rate of a resource pool host as an example. It can be understood that the operation and maintenance early warning method can be applied to any monitored index.

Fig. 3 is a flowchart of another operation and maintenance early warning method provided in this embodiment, where the operation and maintenance early warning method can be applied to terminal devices, such as mobile phones, computers, and other electronic devices. The operation and maintenance early warning method can comprise the following steps:

Step S301: acquiring memory monitoring data.

The electronic device collects time-series memory monitoring data from a Prometheus monitoring database, wherein the memory monitoring data comprises time data and monitoring index data.
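As a hedged illustration of extracting time data and monitoring index data, the sketch below parses a response shaped like the result of a Prometheus range query, in which each series carries a list of [timestamp, "value"] pairs. The payload is a hand-written sample, and the instance label and values are hypothetical; no request is sent to a real server.

```python
def parse_range_response(payload):
    """Extract (timestamp, value) pairs from a range-query response.

    A "matrix" result holds one entry per series, each with a "values"
    list of [unix_timestamp, "value-as-string"] pairs.
    """
    if payload.get("status") != "success":
        raise ValueError("query failed")
    series = payload["data"]["result"]
    return [
        (float(ts), float(value))
        for entry in series
        for ts, value in entry["values"]
    ]


# Hypothetical payload for a memory usage ratio sampled once per minute.
payload = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"instance": "host-1:9100"},
                "values": [[1629763200, "0.81"], [1629763260, "0.82"]],
            }
        ],
    },
}
memory_samples = parse_range_response(payload)
```

Each returned pair corresponds to one item of time data and its monitoring index data before normalization.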

Step S302: normalizing the memory monitoring data to obtain normalized memory monitoring data.

After the memory monitoring data is obtained, it is normalized. Specifically, the time data and the monitoring index data of each record in the memory monitoring data are normalized separately.

The normalization of the time data includes: uniformly converting time data of different formats in the memory monitoring data into timestamps in floating-point format (in seconds), and normalizing them with the time normalization formula:

$$t = \frac{t' - t_{start}}{\Delta t}$$

where t is the timestamp after normalization, t' is the timestamp before normalization, Δt = t_end - t_start, t_start is the start timestamp of the current training time window, and t_end is the end timestamp of the current training time window. By convention in the present disclosure, Δt on its own denotes a time window, that is, a period over which monitoring data is acquired, whereas in calculation and definition formulas it denotes the scalar size of that time window.

The normalizing process of the monitoring index data in the memory monitoring data comprises the following steps: the percentage fraction is directly used.

In the embodiments of the present disclosure, the normalized memory monitoring data is set to conform to the following form:

<t, d>_k;

wherein t is the timestamp of the time data after normalization, d is the monitoring index data after normalization, and k is the index of the memory monitoring data record, which is less than or equal to the total number of monitoring data records in the current training time window.

It should be noted that, in the embodiments of the present disclosure, the monitored indexes may include the occupancy rate, usage rate, IO, and the like of objects such as the CPU, memory, and disk of a resource pool, a cloud computing platform, and the like. Accordingly, normalization of the monitoring index data of a monitored index includes:

For percentage-type indexes such as occupancy rate and usage rate, the percentage can be used directly as a fraction; indexes such as disk IO and process heap memory are normalized according to the maximum design standard in the manufacturer's technical specification and the preset maximum resource value.

It will be appreciated that, for any index, the monitoring data can be normalized to the form <t, d>_k described above.
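The index normalization described above can be sketched as follows. The function name, the clamping of the result to [0, 1], and the disk IO figures are assumptions for illustration and are not taken from the disclosure.

```python
def normalize_index(value, max_value=None):
    """Normalize one monitoring index sample to the range [0, 1].

    Percentage-type indexes arrive as percents (e.g. 89 for 89%) and are
    divided by 100; other indexes are divided by a configured maximum.
    """
    if max_value is None:          # percentage-type index
        d = value / 100.0
    else:                          # e.g. disk IO, process heap memory
        if max_value <= 0:
            raise ValueError("max_value must be positive")
        d = value / max_value
    return min(max(d, 0.0), 1.0)   # clamping is an assumption, not from the text

memory_usage = normalize_index(89)                 # 89% memory usage
disk_io = normalize_index(150.0, max_value=600.0)  # hypothetical: 150 MB/s of a 600 MB/s limit
```

With these inputs, `memory_usage` is 0.89 and `disk_io` is 0.25, so both indexes end up on the same scale before training.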

Take the time data "2016-05-05 20:28:54" in the memory monitoring data as an example; the corresponding timestamp is "1462451334.0". The monitored time data is then normalized using the time normalization formula t = (t′ − t_start) / Δt. For example, taking Δt = 3600 s, t′ = 1462451334.0, and t_end = 1462451334.0, it follows that t_start = 1462447734.0, and the normalized timestamp of the time data "2016-05-05 20:28:54" is t = 1.

For the memory usage rate, the percentage expressed as a fraction is used as the normalized monitoring index data. For example, if the current memory usage is 89%, the normalized value is 0.89.

Combining t = 1 and the value 0.89 in the <t, d>_k data format yields one normalized memory monitoring data record. Applying the same normalization to every memory monitoring data record in the current training time window yields the normalized memory monitoring data for that window.
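The worked example above can be reproduced with a short Python sketch. It assumes the normalization has the form t = (t′ − t_start) / Δt, which yields t = 1 for the values given, and it assumes the example time is local time at UTC+8, the offset under which "2016-05-05 20:28:54" maps to 1462451334.0. The function names are illustrative.

```python
from datetime import datetime, timezone, timedelta

# Assumption: the example time is expressed in UTC+8 local time.
UTC_PLUS_8 = timezone(timedelta(hours=8))

def to_timestamp(text, tz=UTC_PLUS_8):
    """Convert 'YYYY-MM-DD HH:MM:SS' to a float Unix timestamp in seconds."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz).timestamp()

def normalize_time(t_raw, t_start, delta_t):
    """t = (t' - t_start) / delta_t, so the training window maps onto [0, 1]."""
    return (t_raw - t_start) / delta_t

delta_t = 3600.0
t_end = to_timestamp("2016-05-05 20:28:54")
t_start = t_end - delta_t
record = (normalize_time(t_end, t_start, delta_t), 89 / 100)   # the pair <t, d>
```

Here `record` is the pair (1.0, 0.89), matching the normalized record described in the text.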

Step S303: the number of training time windows and the size of each training time window are determined.

In this embodiment, taking the simultaneous prediction of the memory usage rate over three time horizons (short term, medium term, and long term) as an example, the training time windows for the three horizons can be defined as: Δt₁ = 3600 s, Δt₂ = 86400 s, Δt₃ = 2592000 s.
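A minimal sketch of how the three training time windows might be used to select samples is given below. The window labels, the helper name, and the 30-minute sample series are hypothetical.

```python
TRAINING_WINDOWS = {"short": 3600, "medium": 86400, "long": 2592000}  # seconds

def samples_in_window(samples, t_end, delta_t):
    """Keep (timestamp, value) samples with t_end - delta_t <= timestamp <= t_end."""
    t_start = t_end - delta_t
    return [(ts, v) for ts, v in samples if t_start <= ts <= t_end]

# Hypothetical series: one sample every 30 minutes, 96 samples in total.
t_end = 1462451334
samples = [(t_end - 1800 * i, 0.5) for i in range(96)]
per_window = {name: samples_in_window(samples, t_end, dt)
              for name, dt in TRAINING_WINDOWS.items()}
```

The same series thus feeds three training sets of different lengths, one per window.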

Step S304: a neural network model is built under each training time window. The neural network model includes the following features:

(1) Comprising an input layer, at least 1 hidden layer and an output layer, e.g. 3 hidden layers;

(2) The whole neural network model adopts a full connection mode;

(3) Each hidden layer contains at least 1 node, such as 10 nodes;

(4) Using a nonlinear activation function, such as ReLU (Rectified Linear Unit);

(5) The loss function contains a regularization term, and the value of the regularization term is related to the size of the training time window.
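The five features above can be sketched in plain Python as a small fully connected network. This is an illustration under stated assumptions: the L2 form of the regularization term, the linear output layer, the weight initialization, and the values of `lam` are not given in the text, which only states that the regularization term depends on the size of the training time window.

```python
import random

def build_mlp(layer_sizes, seed=0):
    """Create fully connected layers as (weights, biases) pairs."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def relu(x):
    return x if x > 0.0 else 0.0

def forward(layers, t):
    """Map a normalized time t to a predicted index value."""
    activations = [t]
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        last = i == len(layers) - 1
        activations = z if last else [relu(v) for v in z]  # linear output layer (assumption)
    return activations[0]

def regularized_loss(layers, data, lam):
    """Mean squared error plus lam * sum of squared weights (assumed L2 form)."""
    mse = sum((forward(layers, t) - d) ** 2 for t, d in data) / len(data)
    penalty = sum(w * w for weights, _ in layers for row in weights for w in row)
    return mse + lam * penalty

model = build_mlp([1, 10, 10, 10, 1])   # input, 3 hidden layers of 10 nodes, output
data = [(0.0, 0.80), (0.5, 0.85), (1.0, 0.89)]
loss_small_lam = regularized_loss(model, data, lam=0.001)
loss_large_lam = regularized_loss(model, data, lam=0.01)
```

A larger `lam` penalizes the weights more strongly, which is how a window-dependent adjustment parameter could restrain over-fitting for the longer windows.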

Step S305: iteratively training the neural network model under each training time window on the normalized memory monitoring data until the training termination condition is met, to obtain a monitoring prediction model corresponding to each training time window.
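The text does not state the training termination condition. The sketch below assumes two common rules, an epoch limit and a loss change below a tolerance, and uses a toy `step` function in place of a real training update; all names are hypothetical.

```python
def train_until_done(step, max_epochs=1000, tol=1e-6):
    """Call step() once per epoch until the loss stops improving.

    step() performs one training update and returns the current loss.
    Both stopping rules (epoch limit, loss change below tol) are assumptions.
    """
    previous = float("inf")
    for epoch in range(1, max_epochs + 1):
        loss = step()
        if abs(previous - loss) < tol:
            return epoch, loss
        previous = loss
    return max_epochs, previous

def train_all(windows, make_step):
    """Train one model per training time window; returns {window: (epochs, loss)}."""
    return {dt: train_until_done(make_step(dt)) for dt in windows}

# Toy stand-in for a real update: the loss halves on every call.
def make_step(delta_t):
    state = {"loss": 1.0}
    def step():
        state["loss"] *= 0.5
        return state["loss"]
    return step

results = train_all([3600, 86400, 2592000], make_step)
```

Each training time window gets its own independent loop, so the three monitoring prediction models can be trained separately or in parallel.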

Step S306: the size of the prediction time window is determined.

As above, the training time windows for the short-term, medium-term, and long-term horizons are Δt₁ = 3600 s, Δt₂ = 86400 s, and Δt₃ = 2592000 s, respectively. Each prediction time window Δt′_i is defined by a relation to the corresponding training time window Δt_i; taking i as 1, 2, and 3 in order, the sizes of the prediction time windows Δt′₁, Δt′₂, and Δt′₃ for the short-term, medium-term, and long-term horizons can be determined.

Step S307: the predicted time data is collected under a predicted time window.

Taking the short-term, medium-term, and long-term horizons above as an example, prediction time data is collected (or generated) under the prediction time windows Δt′₁, Δt′₂, and Δt′₃ respectively, at the same sampling frequency as the recording frequency of the memory monitoring data in the corresponding Δt₁, Δt₂, and Δt₃.

Step S308: normalizing the prediction time data to obtain normalized prediction time data.

The prediction time data under each prediction time window is normalized in the same manner as in step S302, and the value of t_start may be the same as that of Δt_i.
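Steps S307 and S308 can be sketched as follows. The sketch assumes that the prediction time window starts where the training time window ends, that it has the same size as the training window (chosen only for illustration), and that normalization reuses the training window's t_start with the form t = (t′ − t_start) / Δt. The function names are hypothetical.

```python
def prediction_times(t_end, pred_window, step):
    """Raw timestamps t_end + step, ..., t_end + pred_window (seconds)."""
    count = int(pred_window // step)
    return [t_end + step * (i + 1) for i in range(count)]

def normalize_times(times, t_start, delta_t):
    """Apply t = (t' - t_start) / delta_t to every raw timestamp."""
    return [(t - t_start) / delta_t for t in times]

delta_t = 3600.0                   # training window size
t_end = 1462451334.0               # end of the training window
t_start = t_end - delta_t          # reuse the training window's start
raw = prediction_times(t_end, pred_window=3600.0, step=60.0)
normalized = normalize_times(raw, t_start, delta_t)
```

Under these assumptions the normalized prediction times fall just above 1 and run up to 2, i.e. beyond the [0, 1] range seen during training.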

Step S309: selecting a target monitoring prediction model from the monitoring prediction models corresponding to the training time windows according to the current prediction time window.

Taking the short-term, medium-term, and long-term horizons above as an example, the monitoring prediction model corresponding to Δt₁ may be selected as the target monitoring prediction model for Δt′₁, the model corresponding to Δt₂ as the target for Δt′₂, and the model corresponding to Δt₃ as the target for Δt′₃.

Step S310: inputting the normalized prediction time data into the corresponding target monitoring prediction model and calculating the prediction data of the current prediction time window.

Specifically, the normalized prediction time data in Δt′₁ is input to the monitoring prediction model corresponding to Δt₁, that in Δt′₂ to the model corresponding to Δt₂, and that in Δt′₃ to the model corresponding to Δt₃. Each model is evaluated separately to obtain the predicted monitoring index data under Δt′₁, Δt′₂, and Δt′₃. The prediction time data and the predicted monitoring index data under each of Δt′₁, Δt′₂, and Δt′₃ are then combined to obtain the prediction data under Δt′₁, Δt′₂, and Δt′₃.
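Model selection and prediction (steps S309 and S310) can be sketched as a lookup followed by one evaluation per time point. The stand-in models below are plain functions with made-up linear trends, and the window labels are illustrative; a trained network would take their place.

```python
def predict_window(model, times, normalized_times):
    """Pair each raw prediction time with the model's predicted index value."""
    return [(t, model(nt)) for t, nt in zip(times, normalized_times)]

# Stand-in models keyed by training window size (hypothetical trends).
models = {
    3600: lambda t: 0.80 + 0.05 * t,
    86400: lambda t: 0.70 + 0.10 * t,
    2592000: lambda t: 0.60 + 0.20 * t,
}
# Prediction window label -> associated training window size.
association = {"short": 3600, "medium": 86400, "long": 2592000}

times = [100.0, 200.0]       # raw prediction times
normalized = [1.5, 2.0]      # the same times after normalization
predictions = {name: predict_window(models[dt], times, normalized)
               for name, dt in association.items()}
```

Each entry of `predictions` holds the combined prediction data for one window: the prediction time together with the predicted monitoring index value.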

Step S311: judging whether the prediction data of the current prediction time window is smaller than the first threshold and contains data greater than the second threshold; if so, executing steps S312-S316. If the prediction data of the current prediction time window contains data greater than the first threshold, executing step S316. If the prediction data of the current prediction time window is smaller than the second threshold, returning to step S301. The first threshold is greater than the second threshold; that is, this step judges whether the prediction data of the current prediction time window lies between the low threshold and the high threshold.
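The three-way decision of step S311 can be sketched as a small function. The return labels and the threshold values in the usage lines are hypothetical, and the text does not say how values exactly equal to a threshold are treated; this sketch treats them as not exceeding it.

```python
def classify(predicted, high, low):
    """Return 'alarm', 'recheck', or 'normal' for a list of predicted values."""
    if not low < high:
        raise ValueError("the first (high) threshold must exceed the second (low)")
    if any(v > high for v in predicted):
        return "alarm"      # go to step S316
    if any(v > low for v in predicted):
        return "recheck"    # go to steps S312-S316
    return "normal"         # return to step S301

outcome_high = classify([0.50, 0.95], high=0.9, low=0.8)
outcome_mid = classify([0.50, 0.85], high=0.9, low=0.8)
outcome_low = classify([0.50, 0.60], high=0.9, low=0.8)
```

Checking the high threshold first ensures that a single value above it triggers the alarm even if other values fall in the middle band.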

Step S312: a target prediction time window is obtained, which is q times the current prediction time window, q being an integer greater than 1, for example q=3 in the present embodiment.

Step S313: reselecting the target monitoring prediction model from the monitoring prediction models corresponding to the training time windows based on the target prediction time window.

Step S314: inputting the prediction time data under the target prediction time window into the reselected target monitoring prediction model and calculating new prediction data.

Step S315: judging whether the new prediction data contains data greater than the first threshold; if so, executing step S316. If the new prediction data is smaller than the first threshold and contains data greater than the second threshold, or the new prediction data is smaller than the second threshold, returning to step S301 to reacquire the memory monitoring data.

Step S316: outputting alarm information.

The alarm information can be output to operation and maintenance personnel by short message, telephone, or other means, or to an automatic fault handling system, instant messaging software, and the like through an alarm interface, so that there is sufficient time to handle the impending fault. This improves the fault discovery capability of operation and maintenance personnel and the automatic handling system, and avoids service interruptions such as a full disk or memory overflow caused by insufficient preparation time.
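Steps S312 to S316 can be sketched as follows. The `predict` callable stands in for reselecting the model and computing new prediction data; it, the channel list, the alarm message, and the sample trend are all hypothetical.

```python
def recheck(predict, current_window, high, q=3):
    """Steps S312-S315: enlarge the window q times and test the high threshold.

    predict(window) stands in for S313-S314 (reselect the model for the
    target window, then compute new prediction data).
    """
    if not (isinstance(q, int) and q > 1):
        raise ValueError("q must be an integer greater than 1")
    target_window = q * current_window            # S312
    new_values = predict(target_window)           # S313-S314
    alarm = any(v > high for v in new_values)     # S315
    return target_window, alarm

def send_alarm(message, channels):
    """Step S316: hand the message to every configured output channel."""
    for channel in channels:
        channel(message)

# Hypothetical predictor: usage rises by 0.03 per hour from 0.84.
def predict(window_seconds):
    return [0.84 + 0.03 * h for h in range(1, window_seconds // 3600 + 1)]

sent = []
window, alarm = recheck(predict, 3600, high=0.9)
if alarm:
    send_alarm(f"memory usage predicted above 0.9 within {window} s", [sent.append])
```

In this example the one-hour prediction stays below the high threshold, but the three-hour prediction exceeds it, so one alarm message is sent.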

According to this operation and maintenance early warning method, the multi-time-window neural network models can predict the trend of the memory usage rate over different time horizons in parallel, covering short-term, medium-term, and long-term prediction tasks. This effectively improves operation and maintenance prediction capability, improves the fault discovery capability of operation and maintenance personnel and the automatic handling system, and avoids service interruptions such as a full disk or memory overflow caused by insufficient preparation time. Meanwhile, the loss function of the neural network model contains a regularization term related to the size of the time window, which can effectively restrain the over-fitting tendency of the model and improve its prediction accuracy, in particular the fault prediction capability for the medium-term and long-term time windows. In addition, a double-threshold alarm strategy is provided: when the prediction data lies between the high threshold and the low threshold, the current prediction time window is enlarged and it is judged again whether memory usage data exceeding the high threshold exists, which reduces repeated alarms and noise alarms caused by index fluctuation and improves the operation and maintenance early warning capability.

It should be noted that although the steps of the methods of the present disclosure are illustrated in a particular order in the figures, this does not require or imply that the steps must be performed in that particular order or that all of the illustrated steps must be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.

In addition, in the present exemplary embodiment, an operation and maintenance early warning device is also provided. Referring to fig. 4, the operation and maintenance pre-warning device 40 includes: an acquisition unit 401, a model training unit 402, a model selection unit 403, a data prediction unit 404, and an alarm unit 405. Wherein:

The acquisition unit 401 is configured to acquire sample monitoring data, where the sample monitoring data includes sample time data and sample monitoring index data;

The model training unit 402 is configured to train the neural network model based on the sample time data and the sample monitoring index data under a plurality of training time windows with different sizes, so as to obtain a monitoring prediction model corresponding to each training time window;

The model selecting unit 403 is configured to select a target monitoring prediction model from monitoring prediction models corresponding to each training time window according to the current prediction time window;

The data prediction unit 404 is configured to input prediction time data under the current prediction time window to the target monitoring prediction model, and calculate to obtain prediction data;

The alarm unit 405 is configured to output alarm information according to the prediction data and a preset alarm rule.

In an exemplary embodiment of the present disclosure, based on the foregoing scheme, as shown in fig. 5, the model training unit 402 may include an input subunit 4021, a calculation subunit 4022, and an adjustment subunit 4023. Wherein:

The input subunit 4021 is configured to input sample time data and sample monitoring index data to a neural network model, to obtain an output result;

The calculation subunit 4022 is configured to calculate a loss function value according to the output result and a loss function including a regularization term;

The adjustment subunit 4023 is configured to adjust the neural network model according to the loss function value through a back propagation algorithm.
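The three subunits correspond to a forward pass, a loss calculation, and a back-propagation update. A minimal sketch with one hidden layer and hand-written gradients is given below. The L2 form of the regularization term, the network size, the learning rate, and the synthetic data are assumptions for illustration.

```python
import random

def train(data, lam, hidden=4, lr=0.05, epochs=200, seed=0):
    """Fit y(t) = w2 . relu(w1*t + b1) + b2 by full-batch gradient descent.

    Loss = mean squared error + lam * (sum(w1^2) + sum(w2^2)); the L2 form
    of the regularization term is an assumption.
    Returns (loss before training, loss after training).
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    n = len(data)

    def loss():
        total = 0.0
        for t, d in data:
            h = [max(0.0, w * t + b) for w, b in zip(w1, b1)]
            y = sum(v * a for v, a in zip(w2, h)) + b2
            total += (y - d) ** 2
        return total / n + lam * (sum(w * w for w in w1) + sum(w * w for w in w2))

    initial = loss()
    for _ in range(epochs):
        g_w1 = [2 * lam * w for w in w1]      # gradient of the regularization term
        g_w2 = [2 * lam * w for w in w2]
        g_b1 = [0.0] * hidden
        g_b2 = 0.0
        for t, d in data:
            z = [w * t + b for w, b in zip(w1, b1)]          # forward pass
            h = [max(0.0, v) for v in z]
            y = sum(v * a for v, a in zip(w2, h)) + b2
            dy = 2 * (y - d) / n                              # d(MSE)/dy
            g_b2 += dy
            for j in range(hidden):                           # backward pass
                g_w2[j] += dy * h[j]
                dz = dy * w2[j] if z[j] > 0 else 0.0          # ReLU derivative
                g_w1[j] += dz * t
                g_b1[j] += dz
        w1 = [w - lr * g for w, g in zip(w1, g_w1)]           # gradient-descent update
        b1 = [b - lr * g for b, g in zip(b1, g_b1)]
        w2 = [w - lr * g for w, g in zip(w2, g_w2)]
        b2 -= lr * g_b2
    return initial, loss()

# Synthetic normalized data: usage rising from 0.5 to 0.9 over the window.
data = [(i / 9, 0.5 + 0.4 * i / 9) for i in range(10)]
before, after = train(data, lam=0.001)
```

Comparing `before` and `after` shows whether the updates reduced the regularized loss on the training data.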

In one exemplary embodiment of the present disclosure, based on the foregoing scheme, the loss function containing regularization term is:

In an exemplary embodiment of the present disclosure, based on the foregoing solution, the method for calculating, by the calculating subunit 4022, the regularization term adjustment parameter corresponding to the ith training time window includes:

Calculating the ratio r_i of the i-th training time window Δt_i to the minimum time window Δt₁ among the training time windows according to a first formula, r_i = Δt_i / Δt₁;

Based on the calculated r_i, calculating the adjustment parameter λ_i of the regularization term corresponding to the i-th training time window according to a second formula, in which k is greater than or equal to 0.
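For the three example training time windows, the ratio described above (each window divided by the smallest one) can be computed as in the sketch below. The helper name is hypothetical, and only the ratio is computed; the mapping from r_i to λ_i is not shown because the second formula is not given in this text.

```python
def window_ratios(windows):
    """r_i = window_i / smallest window, for each training time window."""
    smallest = min(windows)
    return [w / smallest for w in windows]

ratios = window_ratios([3600, 86400, 2592000])
```

This gives ratios of 1, 24, and 720 for the short-term, medium-term, and long-term windows respectively.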

In an exemplary embodiment of the present disclosure, based on the foregoing scheme, as shown in fig. 6, the alarm unit 405 may include a first alarm subunit 4051 and a second alarm subunit 4052. Wherein:

The first alarm subunit 4051 is configured to output alarm information when data greater than a first threshold exists in the predicted data;

The second alarm subunit 4052 is configured to output alarm information according to the sub-alarm rule when the predicted data is less than the first threshold and there is data greater than a second threshold, where the second threshold is less than the first threshold.

In an exemplary embodiment of the present disclosure, based on the foregoing solution, the second alert subunit 4052 is specifically configured to obtain a target prediction time window, where the target prediction time window is q times of the current prediction time window, q is an integer greater than 1, reselect a target monitoring prediction model from monitoring prediction models corresponding to each training time window based on the target prediction time window, input prediction time data under the target prediction time window to the reselected target monitoring prediction model, calculate to obtain new prediction data, and output alert information when data greater than a first threshold exists in the new prediction data.

In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the second alarm subunit 4052 is further specifically configured to re-acquire the sample monitoring data when the new predicted data is less than the first threshold and there is data greater than the second threshold.

The operation and maintenance early warning device provided by the embodiment of the disclosure realizes the prediction of the trend of the monitored index data, and can predict the abnormality and warn according to the trend, so that the abnormality can be treated in time, and the smooth proceeding of the production business is ensured. The operation and maintenance early warning device predicts the monitored index data by using the better data fitting capacity of the neural network model, and improves the data prediction capacity; the method can use a plurality of time windows with different sizes to predict the trend of the index data to be monitored in different time periods in parallel, and meets the prediction requirement of the index data under different time periods.

The specific details of each module of the operation and maintenance early warning device are described in detail in the corresponding operation and maintenance early warning method, so that the details are not repeated here.

It should be noted that although several modules or units of the operation and maintenance pre-warning device are mentioned in the above detailed description, this division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

In addition, in the exemplary embodiment of the present disclosure, an electronic device capable of implementing the foregoing operation and maintenance early warning method is also provided.

Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."

An electronic device 700 according to such an embodiment of the present disclosure is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.

As shown in fig. 7, the electronic device 700 is embodied in the form of a general purpose computing device. Components of electronic device 700 may include, but are not limited to: the at least one processing unit 710, the at least one storage unit 720, a bus 730 connecting the different system components (including the storage unit 720 and the processing unit 710), and a display unit 740.

Wherein the storage unit stores program code that is executable by the processing unit 710 such that the processing unit 710 performs steps according to various exemplary embodiments of the present disclosure described in the above-described "exemplary methods" section of the present specification. For example, the processing unit may perform the steps as shown in fig. 1: step S101, sample monitoring data is obtained, wherein the sample monitoring data comprises sample time data and sample monitoring index data; step S102, training a neural network model based on sample monitoring data under a plurality of training time windows with different sizes to obtain monitoring prediction models corresponding to the training time windows; step S103, selecting a target monitoring prediction model from monitoring prediction models corresponding to all training time windows according to the current prediction time window; step S104, inputting the predicted time data under the current predicted time window into a target monitoring prediction model, and calculating to obtain predicted data; step S105, outputting alarm information according to the predicted data and the preset alarm rules.

The storage unit 720 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 721 and/or a cache memory 722, and may further include a read-only memory (ROM) 723.

The storage unit 720 may also include a program/utility 724 having a set (at least one) of program modules 725, such program modules 725 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

Bus 730 may be a bus representing one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 700 may also communicate with one or more external devices 770 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 700, and/or any device (e.g., router, modem, etc.) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 750. Also, electronic device 700 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 760. As shown, network adapter 760 communicates with other modules of electronic device 700 over bus 730. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 700, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the present disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.

Referring to fig. 8, a program product 800 for implementing the above-described operation and maintenance pre-warning method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).

Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An operation and maintenance early warning method is characterized by comprising the following steps:

outputting alarm information according to the prediction data and a preset alarm rule;

wherein selecting a target monitoring prediction model from the monitoring prediction models corresponding to the training time windows according to the current prediction time window comprises:

determining the sizes of a plurality of prediction time windows, wherein each prediction time window is associated with one training time window of the plurality of training time windows; and

selecting, from the monitoring prediction models corresponding to the training time windows, the monitoring prediction model corresponding to the training time window associated with the current prediction time window as the target monitoring prediction model, wherein the current prediction time window is any one of the plurality of prediction time windows.

2. The operation and maintenance pre-warning method according to claim 1, wherein the training the neural network model based on the sample time data and the sample monitor index data comprises:

inputting the sample time data and the sample monitoring index data into the neural network model to obtain an output result;

3. The operation and maintenance pre-warning method according to claim 2, wherein the loss function including regularization term is:

4. The operation and maintenance early warning method according to claim 3, wherein the calculation method of regularization term adjustment parameters corresponding to the ith training time window includes:

5. The operation and maintenance pre-warning method according to any one of claims 1 to 4, wherein the outputting warning information according to the prediction data and a preset warning rule includes:

6. The operation and maintenance pre-warning method according to claim 5, wherein the outputting the warning information according to the sub-warning rule comprises:

7. The operation and maintenance pre-warning method according to claim 6, further comprising:

8. An operation and maintenance early warning device, which is characterized by comprising:

the alarm unit is used for outputting alarm information according to the prediction data and preset alarm rules;

The model selection unit is specifically configured to:

9. An electronic device, comprising:

A processor; and

A memory having stored thereon computer readable instructions which when executed by the processor implement the operation and maintenance pre-warning method according to any one of claims 1 to 7.

10. A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the operation and maintenance pre-warning method according to any one of claims 1 to 7.