CN119886567A - Intelligent power engineering management and control system and method based on big data

Info

Publication number: CN119886567A
Application number: CN202510067978.8A
Authority: CN
Inventors: 王鹏宇; 邓嵘; 王奕镇; 韦淦予; 向小峰
Original assignee: Guangxi Pingban Hydropower Development Co ltd
Current assignee: Guangxi Pingban Hydropower Development Co ltd
Priority date: 2025-01-16
Filing date: 2025-01-16
Publication date: 2025-04-25

Abstract

The invention discloses an intelligent management and control system and method for electric power engineering based on big data, and relates to the technical field of engineering risk management and control. The method comprises: collecting multi-source heterogeneous data in a management and control area and dividing it into suspicious data and non-suspicious data; when the multi-source heterogeneous data is non-suspicious data, performing fit degree analysis and coverage degree analysis on each multi-source heterogeneous data and evaluating the accuracy of the multi-source heterogeneous data; integrating the accurate data into a deep learning model according to the evaluation result; and predicting the overall management and control risk score of the electric power engineering through the deep learning model based on the inaccurate data in the multi-source heterogeneous data. Through a multi-level data verification and risk assessment mechanism, the method ensures the coverage and consistency of the data, addresses at its source the adverse effect of data distortion on power engineering management and control, and improves the reliability and practicability of the power engineering management and control system.

Description

Intelligent power engineering management and control system and method based on big data

Technical Field

The invention relates to the technical field of engineering risk management and control, in particular to an intelligent management and control system and method for electric power engineering based on big data.

Background

With the rapid development of information technology, big data and artificial intelligence are applied ever more widely to engineering management and control. In power engineering management and control in particular, real-time multi-source heterogeneous data can greatly improve the efficiency and accuracy of engineering risk management and control. In practice, however, data quality becomes a key problem. Multi-source heterogeneous data often comes from different data sources, including weather feature data, population density feature data and the like, and may be inconsistent, incomplete or distorted. If the multi-source heterogeneous data used to train a big data analysis model is distorted, that is, inconsistent with actual conditions, engineering managers may be misled when making or executing decisions. Moreover, as an emergency develops, if distortion of the multi-source heterogeneous data cannot be predicted in time, decision makers may misjudge the situation when executing decisions, so that risk events cannot be predicted accurately and promptly and engineering management and control risk increases.

Disclosure of Invention

To solve the above technical problems, the present technical scheme addresses the situation in which distortion of the multi-source heterogeneous data cannot be predicted in time, so that decision makers may misjudge the situation when executing decisions, risk events cannot be predicted accurately and promptly, and engineering management and control risk increases.

In order to achieve the above purpose, the invention adopts the following technical scheme:

an intelligent management and control method for electric power engineering based on big data comprises the following steps:

Collecting multi-source heterogeneous data in a management and control area, comparing each multi-source heterogeneous data with historical multi-source heterogeneous data, and dividing the multi-source heterogeneous data into suspicious data and non-suspicious data according to a comparison result;

when the multi-source heterogeneous data are non-suspicious data, carrying out fit degree analysis on each multi-source heterogeneous data, and evaluating the integral fit degree of the multi-source heterogeneous data and the deep learning model;

when the multi-source heterogeneous data are non-suspicious data, performing coverage degree analysis on each multi-source heterogeneous data, and evaluating the overall coverage of the multi-source heterogeneous data;

comprehensively analyzing the overall fit degree of the multi-source heterogeneous data and the deep learning model and the overall coverage of the multi-source heterogeneous data, and evaluating the accuracy of the multi-source heterogeneous data;

dividing the multi-source heterogeneous data into accurate data and inaccurate data according to the evaluation result, and integrating the accurate data into a deep learning model;

based on inaccurate data in the multi-source heterogeneous data, predicting an overall management and control risk score of the power engineering through a deep learning model.

Preferably, the comparing each multi-source heterogeneous data with the historical multi-source heterogeneous data specifically includes:

Comparing the data feature vectors among different independent sources with the feature vectors of the corresponding historical data as a group of data;

Representing the new data collected at each point in time as a feature vector x_n, where n is the number of features and x_n represents the feature vector of the multi-source heterogeneous data currently collected;

The corresponding historical data is expressed as a feature vector y_n, where y_n represents the feature vector of the multi-source heterogeneous data collected historically;

calculating the difference between the feature vectors by using a Euclidean distance calculation formula;

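wherein, the Euclidean distance formula is:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Wherein d(x, y) is the difference degree value between the historical data and the current data, and x_i and y_i are the i-th components of the current and historical feature vectors.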

Preferably, the classifying the multi-source heterogeneous data into suspicious data and non-suspicious data according to the comparison result specifically includes:

Judging whether the difference degree value between the data characteristic vectors between the independent sources and the characteristic vectors of the corresponding historical data is larger than or equal to a difference degree threshold value, if so, marking as suspicious data, and if not, marking as non-suspicious data.
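The comparison and classification steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the sample vectors and the threshold value are hypothetical, and the feature vectors are assumed to be already standardized and of equal length.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Difference degree d(x, y) between a current and a historical feature vector."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def classify(x: Sequence[float], y: Sequence[float], threshold: float) -> str:
    """Mark data as suspicious when the difference degree reaches the threshold."""
    if euclidean_distance(x, y) >= threshold:
        return "suspicious"
    return "non-suspicious"


# Hypothetical two-feature example: (0 - 4)^2 + (3 - 0)^2 = 25, so d = 5.0.
current = [0.0, 3.0]
historical = [4.0, 0.0]
print(euclidean_distance(current, historical))        # 5.0
print(classify(current, historical, threshold=5.0))   # suspicious
```

Because the comparison is "larger than or equal to", a difference degree exactly at the threshold is treated as suspicious.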

Preferably, when the multi-source heterogeneous data is non-suspicious data, performing a fitness analysis on each multi-source heterogeneous data, and evaluating the overall fitness of the multi-source heterogeneous data and the deep learning model specifically includes:

Selecting characteristics capable of reflecting the relation between the data and the deep learning model from the multi-source heterogeneous data as nodes of the Bayesian network model;

For each node Ys, calculating the conditional probability thereof by a conditional probability calculation formula;

the conditional probability calculation formula is as follows:

$$P(Y_s \mid X_s) = \frac{P(Y_p, X_p)}{P(X_p)}$$

Where P(Yp, Xp) is the joint probability distribution of node Yp and all its parent nodes Xp, P(Xp) is the marginal probability distribution of the parent nodes Xp, and P(Ys|Xs) is the conditional probability of each node Ys;

calculating probability distribution for a target variable Xs under the overall target of the deep learning model;

Wherein, the probability distribution calculation formula is:

Wherein P(Xs|Ys) is the probability distribution of the target variable Xs under the overall target of the deep learning model;

calculating the overall fitness deviation value of the multi-source heterogeneous data and the deep learning model from the calculated conditional probability P(Ys|Xs) and the probability distribution P(Xs|Ys) of the target variable;

The calculation formula of the fitness deviation value is as follows:

Where q is the number of data samples and D_z is the fitness deviation value between the multi-source heterogeneous data and the deep learning model.
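The conditional probability step for a single Bayesian network node can be sketched as below. This only illustrates the general rule of dividing a joint probability by the parent's marginal probability, estimated here from observed counts. The function name, the pair layout `(parent_state, node_state)` and the sample values are hypothetical, and the sketch does not attempt the fitness deviation value D_z.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def conditional_probability(
    samples: Iterable[Tuple[Hashable, Hashable]],
    node_state: Hashable,
    parent_state: Hashable,
) -> float:
    """Estimate P(node | parent) = P(node, parent) / P(parent) from observed pairs."""
    pairs = list(samples)
    if not pairs:
        raise ValueError("no samples supplied")
    joint_counts = Counter(pairs)                       # counts of (parent, node)
    parent_counts = Counter(parent for parent, _ in pairs)
    if parent_counts[parent_state] == 0:
        raise ValueError("parent state was never observed")
    total = len(pairs)
    p_joint = joint_counts[(parent_state, node_state)] / total
    p_parent = parent_counts[parent_state] / total
    return p_joint / p_parent


# Hypothetical observations: (weather state, load state).
observations = [("rain", "high"), ("rain", "high"), ("rain", "low"), ("dry", "low")]
print(conditional_probability(observations, node_state="high", parent_state="rain"))
```

In this example two of the three "rain" observations have a "high" load, so the estimate is two thirds.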

Preferably, when the multi-source heterogeneous data is non-suspicious data, performing coverage analysis on each multi-source heterogeneous data, and evaluating the coverage of the whole multi-source heterogeneous data specifically includes:

dividing the collected time series data into a series of fixed length time windows, each window representing one hour of data;

calculating the coverage rate of the data sources in each time window, wherein the coverage rate is defined as the actual coverage area or time proportion of the data sources in the time window;

The coverage rate of the data source is calculated according to the following formula:

$$R(t) = \frac{A(t)}{A_{total}}$$

Wherein R(t) represents the coverage rate within the time window t, A(t) is the area or time actually covered within the time window t, and A_total is the total area or time;

Calculating a coverage fluctuation index based on the coverage rate in each time window, wherein the coverage fluctuation index reflects the change condition of the coverage rate between different time windows;

Wherein, the calculation formula of the coverage fluctuation index is as follows:

Where V is the coverage fluctuation index, N is the number of time windows, R_n is the coverage rate of the n-th time window, and R_y is the average coverage rate over all time windows.

Preferably, the comprehensively analyzing the overall fit degree of the multi-source heterogeneous data and the deep learning model and the overall coverage of the multi-source heterogeneous data, and evaluating the accuracy of the multi-source heterogeneous data specifically includes:

normalizing the overall fitness deviation value D_z of the multi-source heterogeneous data and the deep learning model and the coverage fluctuation index V, and calculating the accuracy coefficient of the multi-source heterogeneous data from the normalized overall fitness deviation value D_z and the normalized coverage fluctuation index V;

the accuracy coefficient calculation formula of the multi-source heterogeneous data is as follows:

Wherein SA is the accuracy coefficient of the multi-source heterogeneous data, a₁ and a₂ are preset scale coefficients, and a₁ and a₂ are both larger than 0.

Preferably, the dividing the multi-source heterogeneous data into accurate data and inaccurate data according to the evaluation result specifically includes:

Judging whether the accuracy coefficient SA of the multi-source heterogeneous data is larger than or equal to the accuracy coefficient threshold of the preset multi-source heterogeneous data, if so, recording the multi-source heterogeneous data as accurate data, and if not, recording the multi-source heterogeneous data as inaccurate data.
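The normalization and thresholding steps can be sketched as follows. The combination rule is an assumption: the sketch uses min-max normalization and a weighted sum a₁·D_z + a₂·V of the normalized values, chosen only because the text describes a₁ and a₂ as positive scale coefficients. The normalization bounds, function names and numeric values are all hypothetical.

```python
from typing import Tuple


def min_max(value: float, lower: float, upper: float) -> float:
    """Scale value into [0, 1] given assumed bounds, clamping out-of-range inputs."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return min(max((value - lower) / (upper - lower), 0.0), 1.0)


def accuracy_coefficient(
    d_z: float,
    v: float,
    a1: float,
    a2: float,
    d_bounds: Tuple[float, float],
    v_bounds: Tuple[float, float],
) -> float:
    """Assumed form: SA = a1 * norm(D_z) + a2 * norm(V), with a1, a2 > 0."""
    if a1 <= 0 or a2 <= 0:
        raise ValueError("scale coefficients must be positive")
    return a1 * min_max(d_z, *d_bounds) + a2 * min_max(v, *v_bounds)


def label(sa: float, threshold: float) -> str:
    """Record data as accurate when SA reaches the preset threshold."""
    return "accurate" if sa >= threshold else "inaccurate"


sa = accuracy_coefficient(0.5, 0.25, a1=0.5, a2=0.5,
                          d_bounds=(0.0, 1.0), v_bounds=(0.0, 1.0))
print(sa)                         # 0.375
print(label(sa, threshold=0.3))   # accurate
```

Only the final comparison against the threshold is taken directly from the text; a different combination rule would slot into `accuracy_coefficient` without changing `label`.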

Preferably, predicting the overall management risk score of the power engineering through the deep learning model based on inaccurate data in the multi-source heterogeneous data specifically includes:

For each inaccurate data item L_j, calculating its deviation from the model prediction result and quantifying this deviation using a loss function S, where j is the index of the inaccurate data item and is a positive integer;

Wherein the loss function expression is:

e_j = S(L_j, Ly_j);

Where L_j represents the actual observed value, Ly_j represents the value predicted by the deep learning model, and e_j represents the prediction deviation;

summarizing the prediction deviations of all inaccurate data, and calculating an overall prediction risk score;

the total risk calculation formula is as follows:

where w_j is the weight factor of the j-th inaccurate data item, e_j is the prediction deviation of the j-th inaccurate data item, and R is the overall prediction risk score.

Further, an intelligent management and control system for electric power engineering based on big data is provided, which is used for implementing the intelligent management and control method for electric power engineering based on big data, and specifically includes:

The multi-source heterogeneous data acquisition module is used for acquiring multi-source heterogeneous data in the management and control area;

The multi-source heterogeneous data comparison module is used for comparing each multi-source heterogeneous data with historical multi-source heterogeneous data and dividing the multi-source heterogeneous data into suspicious data and non-suspicious data according to a comparison result;

The fitness analysis module is used for carrying out fitness analysis on each multi-source heterogeneous data based on non-suspicious data and evaluating the overall fitness of the multi-source heterogeneous data and the deep learning model;

The coverage analysis module is used for analyzing the coverage degree of each multi-source heterogeneous data based on the non-suspicious data and evaluating the overall coverage of the multi-source heterogeneous data;

the accuracy dividing module divides the multi-source heterogeneous data into accurate data and inaccurate data according to the evaluation result, and integrates the accurate data into a deep learning model;

And the risk prediction module predicts the overall management risk score of the power engineering through a deep learning model based on inaccurate data in the multi-source heterogeneous data.

Compared with the prior art, the invention has the beneficial effects that:

(1) The method strictly compares real-time multi-source heterogeneous data with historical data: it calculates the difference degree between new data and historical data using the Euclidean distance, compares it with a preset difference degree threshold, and divides the data into suspicious data and non-suspicious data. This effectively screens out data that may be distorted, keeps such data from directly entering model training, and reduces the model prediction deviation caused by data distortion. The fitness analysis module then evaluates the overall fit degree of the non-suspicious data with the deep learning model, further verifying the authenticity and reliability of the data through a Bayesian network model, and the coverage analysis module evaluates the coverage rate of the data in different time windows by calculating a coverage fluctuation index, so as to ensure the continuity and integrity of the data in time and space. Together, these steps improve data quality and thereby safeguard the prediction accuracy and stability of the deep learning model, so that engineering management and control operators can rely on more reliable prediction results when making decisions and decision errors caused by data distortion are reduced. With this technical scheme, distorted data can be found and removed in time, the coverage and consistency of the data are ensured through comprehensive analysis, and the adverse effect of data distortion on power engineering management and control is addressed at its source;

(2) The method calculates the accuracy coefficient of the multi-source heterogeneous data and compares it with a preset accuracy coefficient threshold, which further ensures that only high-accuracy data is integrated into the deep learning model. This mechanism improves the prediction performance of the model and enhances the robustness of the system. For data judged inaccurate, the deviation between the data and the model prediction result is calculated and summarized into an overall prediction risk score, which evaluates how strongly the inaccurate data affects the model prediction. When the overall prediction risk score exceeds a preset threshold, decision makers are prompted to pay attention to the potential risk and are advised to take corresponding remedial measures, so that the system can find and deal with data distortion promptly and decision deviation caused by data quality problems is avoided. Comprehensive evaluation of the fit degree and coverage of the data safeguards the accuracy and reliability of the data throughout the engineering management and control system and improves the speed and effect of emergency response. In addition, dynamic monitoring of the accuracy and stability of the data allows model parameters to be adjusted in time and the data processing flow to be optimized, so that the overall performance of the system is continuously improved.

Drawings

FIG. 1 is a flow chart of an intelligent control method of electric power engineering based on big data;

FIG. 2 is a flow chart of a method for computing differences between feature vectors in the present invention;

FIG. 3 is a flow chart of a method for evaluating the overall fitness of multi-source heterogeneous data and a deep learning model according to the present invention;

FIG. 4 is a flow chart of a method for computing risk of engineering management by a deep learning model according to the present invention.

Detailed Description

The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art.

Example 1

Referring to fig. 1, the invention discloses an intelligent management and control method for electric power engineering based on big data, which comprises the following steps:

the multi-source heterogeneous data in the monitoring area are collected, each multi-source heterogeneous data is compared with the historical multi-source heterogeneous data, and the multi-source heterogeneous data are divided into suspicious data and non-suspicious data according to the comparison result;

Example 2

On the basis of the embodiment 1, various types of multi-source heterogeneous data are collected from the monitoring area;

representing the characteristics of data among different independent sources as vectors, carrying out standardization processing on each characteristic rule, and setting a vector score of each rule, wherein each characteristic rule is respectively provided with a corresponding score;

specifically, please refer to fig. 2:

The historical data corresponding to x_n is expressed as a feature vector y_n, where y_n represents the feature vector of the multi-source heterogeneous data collected historically;

wherein, the Euclidean distance is calculated by the formula:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Wherein d(x, y) is the difference degree value between the historical data and the current data;

comparing the difference degree value between the data characteristic vector between different independent sources and the characteristic vector of the corresponding historical data with a difference degree threshold value;

If the difference degree value between the data feature vectors of the independent sources and the feature vectors of the corresponding historical data is larger than or equal to the difference degree threshold value, the multi-source heterogeneous data is marked as suspicious data;

If the difference degree value between the data feature vectors of the independent sources and the feature vectors of the corresponding historical data is smaller than the difference degree threshold value, marking the multi-source heterogeneous data as non-suspicious data;

it should be noted that the larger the difference degree value between the data feature vector between different independent sources and the feature vector of the corresponding historical data, the larger the difference between the data and the historical data between different independent sources, and the higher the data suspicious degree.

Example 3

Referring to fig. 3, on the basis of embodiment 2, when the multi-source heterogeneous data is non-suspicious data, performing a fitness analysis on each multi-source heterogeneous data, and evaluating the overall fitness of the multi-source heterogeneous data and the deep learning model specifically includes:

the conditional probability calculation formula is as follows:

$$P(Y_s \mid X_s) = \frac{P(Y_p, X_p)}{P(X_p)}$$

Wherein, the probability distribution calculation formula is:

the specific calculation formula is as follows:

wherein q is the number of data samples, and D_z is the fitness deviation value between the multi-source heterogeneous data and the deep learning model;

It should be noted that the smaller the fitness deviation value between the multi-source heterogeneous data and the deep learning model, the higher the fit degree between them, and the larger the fitness deviation value, the lower the fit degree.

When the multi-source heterogeneous data are non-suspicious data, performing coverage degree analysis on each multi-source heterogeneous data to obtain coverage fluctuation indexes in the non-suspicious data updating process, and evaluating the overall coverage of the multi-source heterogeneous data, wherein the coverage fluctuation indexes in the non-suspicious data updating process are obtained by the following steps:

Dividing the collected time series data into a series of fixed-length time windows, for example dividing one day of data into 24 time windows, each window representing one hour of data;

the coverage rate of the data source is calculated by the following formula:

$$R(t) = \frac{A(t)}{A_{total}}$$

wherein, the calculation formula of the coverage fluctuation index is:

Where V is the coverage fluctuation index, N is the number of time windows, n = 1, 2, ..., N, R_n is the coverage rate of the n-th time window, and R_y is the average coverage rate over all time windows;

It should be noted that the lower the coverage fluctuation index is, the more stable the data coverage is, and the higher the coverage fluctuation index is, the larger the fluctuation exists in the data coverage, and the fluctuation index can help us evaluate the overall coverage and stability of the data, so as to better support the operation of the power engineering management and control system.

Comprehensively analyzing the overall fit degree of the multi-source heterogeneous data and the deep learning model and the overall coverage of the multi-source heterogeneous data, calculating a data accuracy coefficient, and evaluating the accuracy of the multi-source heterogeneous data;

the data accuracy coefficient acquisition process comprises the following steps:

the calculation formula of the accuracy coefficient of the multi-source heterogeneous data is as follows:

Wherein SA is the accuracy coefficient of the multi-source heterogeneous data, a₁ and a₂ are preset proportional coefficients, and a₁ and a₂ are both larger than 0;

It should be noted that, the overall fitness deviation value D_z of the multi-source heterogeneous data and the deep learning model is in a direct proportion relationship with the coverage fluctuation index V and the accuracy coefficient SA of the multi-source heterogeneous data, and when the overall fitness deviation value D_z of the multi-source heterogeneous data and the deep learning model is larger, the accuracy coefficient SA of the multi-source heterogeneous data is larger and the accuracy of the data is higher.

Example 5

Based on embodiment 4, comparing the accuracy coefficient SA of the multi-source heterogeneous data with a preset accuracy coefficient threshold of the multi-source heterogeneous data;

if the accuracy coefficient SA of the multi-source heterogeneous data is larger than or equal to the accuracy coefficient threshold of the preset multi-source heterogeneous data, the multi-source heterogeneous data is accurate and recorded as accurate data;

If the accuracy coefficient SA of the multi-source heterogeneous data is smaller than the accuracy coefficient threshold of the preset multi-source heterogeneous data, the multi-source heterogeneous data is inaccurate, and the inaccurate data is marked;

and integrating accurate data of the multi-source heterogeneous data into a deep learning model so as to facilitate the subsequent prediction of engineering management and control.

Referring to fig. 4, based on inaccurate data of multi-source heterogeneous data, a risk of engineering management and control is calculated by a deep learning model, and a specific calculation process is as follows:

For each inaccurate data item L_j, the deviation from the model prediction is calculated and quantified using a loss function S, where j is the index of the inaccurate data item and is a positive integer;

wherein, the loss function expression is:

e_j = S(L_j, Ly_j);

The overall risk calculation formula is:

Where w_j is the weight factor of the j-th inaccurate data item, e_j is the prediction deviation of the j-th inaccurate data item, and R is the overall prediction risk score;

comparing the overall prediction risk score R with a preset threshold value;

If the overall prediction risk score R is greater than or equal to the preset threshold value, this indicates that the inaccurate data poses a high risk to the overall prediction of the deep learning model and that the overall prediction is unstable;

If the overall prediction risk score R is smaller than the preset threshold value, this indicates that the inaccurate data poses a low risk to the overall prediction of the deep learning model and does not make the overall prediction unstable.

It should be noted that, by calculating the risk introduced by the inaccurate data, the deep learning model can be further optimized or corresponding remedial measures can be taken.
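The risk scoring and threshold comparison can be sketched as follows. Two points are assumptions rather than statements of the method: the loss function S is taken to be the absolute error, and the overall score R is taken to be the weighted sum of the per-item deviations e_j. The function names, weights, observed and predicted values and the threshold are hypothetical.

```python
from typing import Callable, Sequence


def absolute_error(observed: float, predicted: float) -> float:
    """One possible loss S(L_j, Ly_j); the method does not fix a specific loss here."""
    return abs(observed - predicted)


def overall_risk(
    observed: Sequence[float],
    predicted: Sequence[float],
    weights: Sequence[float],
    loss: Callable[[float, float], float] = absolute_error,
) -> float:
    """Assumed form of R: sum over j of w_j * e_j, with e_j = S(L_j, Ly_j)."""
    if not (len(observed) == len(predicted) == len(weights)):
        raise ValueError("observed, predicted and weights must have equal length")
    return sum(w * loss(l, ly) for l, ly, w in zip(observed, predicted, weights))


def is_high_risk(score: float, threshold: float) -> bool:
    """High risk when the score is greater than or equal to the preset threshold."""
    return score >= threshold


# Hypothetical values: deviations are 2.0 and 3.0, each weighted 0.5.
score = overall_risk([10.0, 20.0], [12.0, 17.0], [0.5, 0.5])
print(score)                               # 2.5
print(is_high_risk(score, threshold=2.0))  # True
```

Passing the loss as a parameter keeps the sketch usable with a squared error or any other deviation measure.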

Example 6

An intelligent management and control system for electric power engineering based on big data, comprising:

the multi-source heterogeneous data acquisition module is used for acquiring multi-source heterogeneous data in the monitoring area;

The multi-source heterogeneous data comparison module is used for comparing each multi-source heterogeneous data with the historical multi-source heterogeneous data and dividing the multi-source heterogeneous data into suspicious data and non-suspicious data according to the comparison result;

The fitness analysis module is used for carrying out fitness analysis on each multi-source heterogeneous data based on the non-suspicious data and evaluating the overall fitness of the multi-source heterogeneous data and the deep learning model;

The coverage analysis module is used for carrying out coverage degree analysis on each multi-source heterogeneous data based on the non-suspicious data and evaluating the overall coverage of the multi-source heterogeneous data;

and the risk prediction module is used for calculating the risk of engineering management and control by the deep learning model based on inaccurate data in the multi-source heterogeneous data.

The technical scheme comprises a multi-source heterogeneous data acquisition module, a multi-source heterogeneous data comparison module, a fitness analysis module, a coverage analysis module, an accuracy dividing module and a risk prediction module. The multi-source heterogeneous data acquisition module acquires multi-source heterogeneous data from the monitoring area. The multi-source heterogeneous data comparison module judges how suspicious the data is by calculating the Euclidean distance between new data and historical data, and divides the data into suspicious data and non-suspicious data. The fitness analysis module performs fitness analysis on the non-suspicious data, calculating conditional probabilities through a Bayesian network model to evaluate the overall fit of the non-suspicious data with the deep learning model. The coverage analysis module evaluates the coverage rate of the non-suspicious data in different time windows and calculates a coverage fluctuation index. The accuracy dividing module combines the fitness deviation value and the coverage fluctuation index to calculate the accuracy coefficient of the data, and divides the data into accurate data and inaccurate data according to a preset threshold. The risk prediction module calculates the risk of engineering management and control through the deep learning model: it summarizes the prediction deviations of the inaccurate data and calculates an overall prediction risk score to evaluate the stability of the model's prediction. Through this multi-step data processing and evaluation flow, the technical scheme improves the data quality of the power engineering management and control system and enhances the accuracy and stability of prediction, thereby better supporting power engineering management, control and decision making.

The above formulas are all dimensionless formulas computed on numerical values. They were obtained by software simulation on a large amount of collected data so as to fit the real situation as closely as possible, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more sets of available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.

It should be understood that the term "and/or" merely describes an association relationship between associated objects and means that three relationships may exist. For example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone, where A and B may be singular or plural. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects, but may also indicate an "and/or" relationship, as can be understood from the context.

It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.

The foregoing describes one embodiment of the present invention in detail, but the description is only a preferred embodiment of the present invention and should not be construed as limiting the scope of the invention. All equivalent changes and modifications within the scope of the present invention are intended to be covered by the present invention.

Claims

1. A method for intelligent management and control of electric power engineering based on big data, characterized by comprising:

Collecting multi-source heterogeneous data within the control area, comparing each item of multi-source heterogeneous data with historical multi-source heterogeneous data, and classifying the multi-source heterogeneous data into suspicious data and non-suspicious data based on the comparison results;

When the multi-source heterogeneous data is non-suspicious data, performing a fit analysis on each item of multi-source heterogeneous data to evaluate the overall fit between the multi-source heterogeneous data and the deep learning model;

When the multi-source heterogeneous data is non-suspicious data, performing a coverage analysis on each item of multi-source heterogeneous data to evaluate the overall coverage of the multi-source heterogeneous data;

Comprehensively analyzing the overall fit between the multi-source heterogeneous data and the deep learning model and the overall coverage of the multi-source heterogeneous data to evaluate the accuracy of the multi-source heterogeneous data;

Based on the evaluation results, dividing the multi-source heterogeneous data into accurate data and inaccurate data, and integrating the accurate data into the deep learning model;

Based on the inaccurate data in the multi-source heterogeneous data, predicting the overall management and control risk score of the electric power engineering through the deep learning model.

2. The method for intelligent management and control of electric power engineering based on big data according to claim 1, characterized in that the step of comparing each item of multi-source heterogeneous data with the historical multi-source heterogeneous data specifically comprises:

The data feature vectors of the different independent sources and the feature vectors of the corresponding historical data are regarded as a set of data for comparison;

The new data collected at each time point is represented as a feature vector $x_n$, where n is the number of features and $x_n$ represents the feature vector of the currently collected multi-source heterogeneous data;

The historical data corresponding to $x_n$ is expressed as a feature vector $y_n$, where $y_n$ represents the feature vector of the historically collected multi-source heterogeneous data;

The Euclidean distance calculation formula is used to calculate the difference between the feature vectors;

The Euclidean distance calculation formula is:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Where d(x, y) is the difference between the historical data and the current data, and $x_i$ and $y_i$ are the i-th components of the current and historical feature vectors, respectively.

3. The method for intelligent management and control of electric power engineering based on big data according to claim 2, characterized in that the step of dividing the multi-source heterogeneous data into suspicious data and non-suspicious data according to the comparison result is as follows:

It is determined whether the difference between the data feature vectors of the independent sources and the feature vectors of the corresponding historical data is greater than or equal to the difference threshold. If so, it is recorded as suspicious data; if not, it is recorded as non-suspicious data.
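The comparison and classification steps of claims 2 and 3 can be illustrated with a minimal Python sketch. The function names, the sample vectors, and the threshold value are hypothetical and chosen only for illustration; the claims do not specify how the difference threshold is set.

```python
import math


def euclidean_distance(x, y):
    """Return d(x, y) = sqrt(sum_i (x_i - y_i)^2) for equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def classify(current, historical, threshold):
    """Label data 'suspicious' when the distance reaches the threshold."""
    d = euclidean_distance(current, historical)
    return "suspicious" if d >= threshold else "non-suspicious"


# Hypothetical feature vectors and threshold, for illustration only.
current = [3.0, 4.0]
historical = [0.0, 0.0]
print(euclidean_distance(current, historical))       # 5.0
print(classify(current, historical, threshold=5.0))  # suspicious
```

Because the claim uses "greater than or equal to", a distance exactly equal to the threshold is treated as suspicious in this sketch.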

4. The method for intelligent management and control of electric power engineering based on big data according to claim 3, characterized in that, when the multi-source heterogeneous data is non-suspicious data, performing the fit analysis on each item of multi-source heterogeneous data to evaluate the overall fit between the multi-source heterogeneous data and the deep learning model specifically includes:

Selecting, from the multi-source heterogeneous data, features that can reflect the relationship between the data and the deep learning model as nodes of the Bayesian network model;

For each node $Y_s$, calculating its conditional probability using the conditional probability calculation formula;

Wherein, the conditional probability calculation formula is:

$$P(Y_s \mid X_s) = \frac{P(Y_p, X_p)}{P(X_p)}$$

Where $P(Y_p, X_p)$ is the joint probability distribution of node $Y_p$ and all its parent nodes $X_p$, $P(X_p)$ is the marginal probability distribution of the parent nodes $X_p$, and $P(Y_s \mid X_s)$ is the conditional probability of each node $Y_s$;

For the target variable $X_s$ under the overall goal of the deep learning model, the probability distribution is calculated;

The probability distribution calculation formula is:

Where $P(X_s \mid Y_s)$ is the probability distribution of the target variable $X_s$ under the overall goal of the deep learning model;

From the conditional probability $P(Y_s \mid X_s)$ and the probability distribution $P(X_s \mid Y_s)$ of the target variable, the deviation value of the overall fit between the multi-source heterogeneous data and the deep learning model is calculated;

The calculation formula of the fit deviation value is:

Where q is the number of data samples, and $D_z$ is the deviation value of the fit between the multi-source heterogeneous data and the deep learning model.
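As an illustration of the conditional probability step in claim 4, the following Python sketch estimates the probability of a node state given its parent state as the ratio of a joint probability to a marginal probability. Estimating both from frequency counts is an assumption made here for illustration; the claim does not state how the distributions are obtained, and the function name, state labels, and sample data are hypothetical.

```python
from collections import Counter


def conditional_probabilities(samples):
    """Estimate P(node | parent) = P(node, parent) / P(parent) from counts.

    samples: list of (parent_state, node_state) tuples.
    Returns a dict keyed by (parent_state, node_state).
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    joint_counts = Counter(samples)
    parent_counts = Counter(parent for parent, _ in samples)
    table = {}
    for (parent, node), count in joint_counts.items():
        joint = count / n                      # empirical joint probability
        marginal = parent_counts[parent] / n   # empirical parent marginal
        table[(parent, node)] = joint / marginal
    return table


# Hypothetical observations: (parent load level, node equipment status).
samples = [("high", "fault"), ("high", "fault"), ("high", "ok"), ("low", "ok")]
for key, value in sorted(conditional_probabilities(samples).items()):
    print(key, round(value, 3))
```

For each parent state, the estimated conditional probabilities over the node states sum to 1, which is a quick sanity check on the table.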

5. The method for intelligent management and control of electric power engineering based on big data according to claim 4, characterized in that, when the multi-source heterogeneous data is non-suspicious data, performing the coverage analysis on each item of multi-source heterogeneous data to evaluate the overall coverage of the multi-source heterogeneous data specifically includes:

Dividing the collected time series data into a series of fixed-length time windows, each of which represents one hour of data;

Calculating the coverage rate of the data source in each time window, where the coverage rate is defined as the proportion of area or time actually covered by the data source in the time window;

The calculation formula for the coverage rate of the data source is:

$$R(t) = \frac{A(t)}{A_{total}}$$

Where R(t) represents the coverage rate in time window t, A(t) is the area or time actually covered in time window t, and $A_{total}$ is the total area or time;

Based on the coverage rate in each time window, the coverage fluctuation index is calculated, where the coverage fluctuation index reflects the changes in coverage rate between different time windows;

The calculation formula of the coverage fluctuation index is:

Where V is the coverage fluctuation index, N is the number of time windows, $R_N$ is the coverage rate of the Nth time window, and $R_y$ is the average coverage rate of all time windows.
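The coverage analysis of claim 5 can be sketched in Python as follows. The coverage rate follows the definition given in the claim (covered amount divided by total amount per window). The fluctuation index is computed here as the population standard deviation of the per-window coverage rates, which is an assumption: the text only states that the index reflects changes in coverage between windows and depends on N, the per-window rates, and their average. Function names and sample values are hypothetical.

```python
import math


def coverage_rate(covered, total):
    """Coverage rate of one window: covered area or time over the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return covered / total


def coverage_fluctuation(rates):
    """ASSUMED form of the fluctuation index: population standard deviation
    of the per-window coverage rates around their mean."""
    n = len(rates)
    if n == 0:
        raise ValueError("at least one time window is required")
    mean = sum(rates) / n
    return math.sqrt(sum((r - mean) ** 2 for r in rates) / n)


# Hypothetical minutes of valid data in four one-hour windows.
covered_minutes = [60, 30, 60, 30]
rates = [coverage_rate(c, 60) for c in covered_minutes]
print(rates)                        # [1.0, 0.5, 1.0, 0.5]
print(coverage_fluctuation(rates))  # 0.25
```

A data source with identical coverage in every window yields an index of 0 under this assumed form, so larger values indicate less stable coverage.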

6. The method for intelligent management and control of electric power engineering based on big data according to claim 5, characterized in that comprehensively analyzing the overall fit between the multi-source heterogeneous data and the deep learning model and the overall coverage of the multi-source heterogeneous data to evaluate the accuracy of the multi-source heterogeneous data specifically includes:

The overall fit deviation value $D_z$ between the multi-source heterogeneous data and the deep learning model and the coverage fluctuation index V are normalized, and the accuracy coefficient of the multi-source heterogeneous data is calculated from the normalized $D_z$ and V;

The accuracy coefficient calculation formula for multi-source heterogeneous data is as follows:

Wherein, SA is the accuracy coefficient of the multi-source heterogeneous data, $a_1$ and $a_2$ are preset proportional coefficients, and $a_1$ and $a_2$ are both greater than 0.

7. The method for intelligent management and control of electric power engineering based on big data according to claim 6 is characterized in that the step of dividing the multi-source heterogeneous data into accurate data and inaccurate data according to the evaluation results specifically comprises:

It is determined whether the accuracy coefficient SA of the multi-source heterogeneous data is greater than or equal to a preset accuracy coefficient threshold of the multi-source heterogeneous data. If so, it is recorded as accurate data; if not, it is recorded as inaccurate data.

8. The method for intelligent management and control of electric power engineering based on big data according to claim 7, characterized in that predicting the overall management and control risk score of the electric power engineering through the deep learning model based on the inaccurate data in the multi-source heterogeneous data specifically includes:

For each item of inaccurate data $L_j$, calculating the deviation between it and the model prediction result, and using the loss function S to quantify this deviation, where j is the index of the inaccurate data item and is a positive integer;

Wherein, the loss function expression is:

$$e_j = S(L_j, Ly_j)$$

In the formula, $L_j$ represents the actual observed value, $Ly_j$ represents the predicted value of the deep learning model, and $e_j$ is the prediction deviation;

The weighted prediction deviations of all inaccurate data are summed to calculate the overall prediction risk score;

The overall risk calculation formula is:

$$R = \sum_{j} w_j e_j$$

Where $w_j$ is the weight factor of each item of inaccurate data, $e_j$ is the prediction deviation value of the jth item of inaccurate data, and R is the overall prediction risk score.
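The risk scoring of claim 8 can be illustrated with a short Python sketch that computes a weighted sum of prediction deviations. The claim does not specify the loss function S, so absolute error is assumed here purely for illustration; the function names, weights, and sample values are likewise hypothetical.

```python
def prediction_deviation(observed, predicted):
    """ASSUMED loss S: absolute error between observation and prediction."""
    return abs(observed - predicted)


def overall_risk(observed, predicted, weights):
    """Weighted sum of the per-item prediction deviations."""
    if not (len(observed) == len(predicted) == len(weights)):
        raise ValueError("inputs must have the same length")
    return sum(
        w * prediction_deviation(obs, pred)
        for obs, pred, w in zip(observed, predicted, weights)
    )


# Hypothetical inaccurate data items, model predictions, and weights.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 20.0, 27.0]
weights = [0.5, 0.25, 0.25]
print(overall_risk(observed, predicted, weights))  # 1.75
```

Any other non-negative loss, such as squared error, could be substituted for the assumed one without changing the weighted-sum structure.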

9. An intelligent management and control system for electric power engineering based on big data, characterized in that it is used to implement the intelligent management and control method for electric power engineering based on big data as described in any one of claims 1 to 8, specifically comprising:

A multi-source heterogeneous data acquisition module, which is used to collect multi-source heterogeneous data in the control area;

A multi-source heterogeneous data comparison module, which is used to compare each multi-source heterogeneous data with the historical multi-source heterogeneous data, and classify the multi-source heterogeneous data into suspicious data and non-suspicious data according to the comparison result;

A fit analysis module, which performs fit analysis on each multi-source heterogeneous data based on non-suspicious data to evaluate the overall fit between the multi-source heterogeneous data and the deep learning model;

A coverage analysis module, which performs coverage analysis on each multi-source heterogeneous data based on non-suspicious data to evaluate the overall coverage of the multi-source heterogeneous data;

An accuracy division module, which divides the multi-source heterogeneous data into accurate data and inaccurate data according to the evaluation results, and integrates the accurate data into the deep learning model;

A risk prediction module, which predicts the overall management and control risk score of the electric power engineering through the deep learning model based on the inaccurate data in the multi-source heterogeneous data.