CN118536009A

CN118536009A - Power data model construction method and system based on generative artificial intelligence

Info

Publication number: CN118536009A
Application number: CN202410996320.0A
Authority: CN
Inventors: 徐文峰; 吴颖波; 陈莉娟; 夏勇军; 叶倩文; 阮羚; 郑立; 李明鹏; 周济; 陈瑞俊; 张庆敏; 王紫雯; 陆浩; 毛亚飞
Original assignee: Hubei Central China Technology Development Of Electric Power Co ltd
Current assignee: Hubei Central China Technology Development Of Electric Power Co ltd
Priority date: 2024-07-24
Filing date: 2024-07-24
Publication date: 2024-08-23
Anticipated expiration: 2044-07-24
Also published as: CN118536009B

Abstract

The invention relates to the technical field of artificial intelligence, and in particular to a method and system for constructing a power data model based on generative artificial intelligence. The method comprises: collecting power load data; determining the anomaly probability of each data point in each time period and clustering the data points to obtain a plurality of clusters; judging whether each cluster is an abnormal cluster according to the anomaly probabilities of the data points in the cluster; calculating a fitness coefficient; determining a target input length from the minimum value of the fitness coefficient; and processing the power load data with a power data processing model to replace the abnormal data. The input data length best suited to the model is obtained from the degree of fluctuation of the data and the length of the abnormal data identified by the anomaly detection algorithm, so that when the model completes the input data (that is, replaces the abnormal data with generated data), the resulting output data is more accurate.

Description

Power data model construction method and system based on generative artificial intelligence

Technical Field

The present invention relates to the field of artificial intelligence. More specifically, the present invention relates to a method and system for constructing a power data model based on generative artificial intelligence.

Background Art

Generative artificial intelligence uses AI techniques to generate content, for example natural language, images and audio. In natural language generation, it can be used for text summarization, machine translation, dialogue systems, story generation and so on. The core idea of generative AI is to train a model to learn the distribution of the data and then use the learned model to generate new data. For example, in natural language generation a model can be trained to learn sentence structure and grammatical rules and then used to generate new sentences. In image generation, a model can be trained to learn the features and styles of images and then used to generate new images.

In power systems, the accuracy and completeness of load data are crucial to the control and maintenance of the circuit system, so existing technologies use generative artificial intelligence to fill in missing or abnormal data. In practice, generative AI learns the distribution and patterns of the existing data and then uses those patterns to fill in missing data or repair abnormal data. For example, a generative adversarial network (GAN) can be used to generate new data that conforms to the real data distribution, thereby filling in missing or abnormal data points. Another common approach is to use a sequence-to-sequence (seq2seq) model or a similar recurrent neural network (RNN) model to predict the values of missing or abnormal data points from the existing data sequence.

Existing technologies usually identify strongly outlying data points in power load data and replace them using generative artificial intelligence. For example, application CN117150402A proposes a power data anomaly detection method and model based on a generative adversarial network. That invention processes power time series data with an anomaly detection model consisting of a generator and a discriminator, and uses adversarial training to continuously improve the data generation ability of the generator and the anomaly discrimination ability of the discriminator, thereby improving the anomaly detection ability of the model.

However, when data is input to the model, even if the abnormal data has been identified fairly accurately, the accuracy of the power load data after replacement may still be low because abnormal data makes up a large proportion of the input data.

Summary of the Invention

The present invention provides a method and system for constructing a power data model based on generative artificial intelligence, aiming to solve the problem in the related art that a large amount of abnormal data in the input data leads to low accuracy of the power load data after replacement.

In a first aspect, the present invention provides a method for constructing a power data model based on generative artificial intelligence, the method comprising: collecting power load data of power-consuming equipment over multiple consecutive time periods; for any time period, determining the anomaly probability of each data point, and clustering according to the anomaly probability to obtain a predetermined number of clusters; for any cluster, judging whether the cluster is an abnormal cluster according to the average of the anomaly probabilities of the data points in the cluster, wherein the data points in a cluster are consecutive in time; calculating a fitness coefficient $P_x$: $P_x = \exp(A_x + B_x)$, wherein $A_x$ is related to the input data length $x$ and the data length of each abnormal cluster, $B_x$ is related to the input length $x$, the values of the data points and their local entropy, and $\exp()$ is the exponential function with the natural constant $e$ as its base; calculating the minimum value of the fitness coefficient $P_x$ by an optimization algorithm to determine the corresponding input length $x$, recorded as the target input length $x_A$; and processing the power load data using a pre-trained power data processing model to obtain power load data in which the abnormal data has been replaced; wherein the length of the data input to the power data processing model is the target input length $x_A$, and the abnormal data are the data points in the power load data that belong to abnormal clusters.

The beneficial effects of the present invention are as follows:

The inventors found that, when abnormal data in power load data is replaced by a power data model, the length of the data input to the power data processing model affects the accuracy of the data obtained after replacement. For example, if the input data is short, even a small amount of abnormal data has a large impact on the accuracy of the data produced by the model; if the input data is long, the smoothing applied to the abnormal data may be weak. The inventors therefore propose to derive the input data length best suited to the model from the degree of fluctuation of the data itself and the length of the abnormal data identified by the anomaly detection algorithm, so that when the model completes the input data (that is, replaces the abnormal data with model-generated data), the output data is more accurate.

In one embodiment, the formula for calculating $A_x$ is: [formula not reproduced in the source text], where $u_i$ is the data length of all data in the $i$-th abnormal cluster in a time period, $M$ is the total number of abnormal clusters in a time period, and $u_T$ is a data length of predetermined size.

In one embodiment, calculating $B_x$ includes: for any time period, calculating the variance $\sigma_j^2$ of the power load value corresponding to the $j$-th data point in the power load data, and normalizing the variance of the $j$-th data point to obtain the weight coefficient $\alpha_j$ of the $j$-th data point of each time period; obtaining the local entropy $S_j$ of the $j$-th data point of each time period; for any time period, extracting from that period, in the order in which the data points were collected, multiple calculation data sets whose data length equals the model input length $x$; for any calculation data set, calculating the sum of the products of the weight coefficient and the local entropy of each data point in the set to obtain the fluctuation value of that calculation data set; the maximum of the fluctuation values over all calculation data sets is $B_x$.

In one embodiment, counting, over all time periods, whether the $j$-th data point of each period is abnormal, and thereby determining the probability that the $j$-th data point is abnormal, includes: obtaining the number $n_j$ of time periods whose $j$-th data point is abnormal; and obtaining the probability that the $j$-th data point is abnormal: $p_j = n_j / N$, where $p_j$ is the probability that the $j$-th data point of each time period is abnormal and $N$ is the number of time periods.

The beneficial effect is as follows: the inventors found that the power load data of one time period (for example, the data collected over a whole day) varies in a similar way to the power load data of other time periods, and that abnormal data tends to appear at the same time points within a period; for example, the power load data collected between 13:00 and 14:00 every day has a high probability of being abnormal. The present invention therefore collects data over multiple time periods and takes the proportion of periods in which the data collected at a given time point is abnormal as the probability that a fault occurs at that time point. Comparing data across multiple time periods makes it possible to identify anomalies at specific time points more accurately, avoids the limitations of a single period of data, and improves the accuracy and reliability of the anomaly probability.

In one embodiment, judging whether a cluster is an abnormal cluster according to the average of the anomaly probabilities of the data points in the cluster includes: for any cluster, calculating the average of the probabilities that each data point in the cluster is abnormal; in response to the average being greater than a threshold, determining that the cluster is an abnormal cluster; and in response to the average not being greater than the threshold, determining that the cluster is not an abnormal cluster.

The beneficial effect is as follows: the average of the anomaly probabilities of all data points in a cluster is taken as the probability that the cluster is an abnormal cluster, so the degree of abnormality is assessed over the whole cluster rather than point by point. In addition, using the average as the criterion reduces the influence of a single abnormal data point on the judgment, which lowers the risk of misjudgment and improves accuracy.

In one embodiment, processing the power load data using a pre-trained power data processing model includes: constructing a window whose length is the target input length $x_A$; for any abnormal cluster: in response to the window being longer than the abnormal cluster, making the window cover the abnormal cluster for processing; in response to the window being shorter than the abnormal cluster, making the window traverse the abnormal cluster, where on the first pass a portion of the window covers the head end of the abnormal cluster, and the window is slid during the traversal.

The beneficial effect is as follows: an abnormal cluster with a short data length can be covered directly by the window, so that the window contains both normal and abnormal data, and the abnormal data is replaced by the power data processing model. Because the window also covers non-abnormal (normal) data, the replaced data better reflects the actual situation and the model output is more accurate. An abnormal cluster with a long data length cannot be fully covered by the window. In that case the traversal starts from the end of the abnormal cluster that the window partially covers, and the window is slid during the traversal. This localized processing lets the data in the window better retain its original characteristics, preserving the integrity and continuity of the data.

In one embodiment, the anomaly detection algorithm is a random forest anomaly detection algorithm.

The beneficial effect is as follows: the power load data of the present invention is single-dimensional (current data, power data, etc.) and the data points have a temporal relationship, which suits the isolation forest anomaly detection algorithm. The reason is that the temporal relationship between data points can be taken into account when the random trees are built, so that temporal information helps identify anomalous points more effectively.

In one embodiment, the optimization algorithm is a PSO (particle swarm optimization) algorithm.

In a second aspect, the present invention provides a system for constructing a power data model based on generative artificial intelligence, comprising a processor and a memory, wherein the memory stores a computer program and the processor executes the computer program to implement the method for constructing a power data model based on generative artificial intelligence according to any one of the embodiments described above.

Brief Description of the Drawings

The above and other objects, features and advantages of the exemplary embodiments of the present invention will become readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present invention are shown by way of example and not limitation, and the same or corresponding reference numerals denote the same or corresponding parts, wherein:

FIG. 1 is a flowchart schematically illustrating the steps of a method for constructing a power data model based on generative artificial intelligence according to an embodiment of the present invention;

FIG. 2 is a flowchart schematically illustrating step S4 according to an embodiment of the present invention;

FIG. 3 is a block diagram schematically illustrating the structure of a system for constructing a power data model based on generative artificial intelligence according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative work fall within the scope of protection of the present invention.

Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a flowchart schematically illustrating the steps of the method for constructing a power data model based on generative artificial intelligence according to this embodiment.

As shown in FIG. 1, the method for constructing a power data model based on generative artificial intelligence includes steps S1 to S4.

Step S1: collect power load data of power-consuming equipment over multiple consecutive time periods.

The power load data is single-dimensional data: current data, voltage data or frequency data. Each item of power load data corresponds to one power load value, which is the current, voltage or frequency of the power-consuming equipment. The length of one time period equals one working cycle of the equipment.

In one embodiment, industrial electrical equipment operates from 10:00 to 16:00 every day, so one time period is 6 hours (10:00 to 16:00 of one day), and the power load data over multiple consecutive time periods consists of the current values of the equipment collected from 10:00 to 16:00 on multiple consecutive days. The power load data may also be the power of the industrial electrical equipment.

In another embodiment, a household appliance such as a refrigerator operates from 0:00 to 24:00 every day (around the clock), so one time period is 24 hours (0:00 to 24:00 of one day), and the power load data over multiple consecutive time periods consists of the current values of the appliance collected without interruption over multiple consecutive days. The power load data may also be the power of the household appliance.

Step S2: for any time period, determine the anomaly probability of each data point, and cluster according to the anomaly probability to obtain a predetermined number of clusters; for any cluster, judge whether the cluster is an abnormal cluster according to the average of the anomaly probabilities of the data points in the cluster.

In one embodiment, the data points are clustered with the DBSCAN clustering algorithm. In this embodiment the neighborhood radius is 0.05 and the minimum number of points in a neighborhood is 5. Before clustering, the time values of all time points within a single period and the anomaly probability of each data point are each normalized to remove the effect of differing units. The data points in each cluster obtained by DBSCAN are consecutive in time; for example, one cluster may contain the 10th to 20th data points collected in a time period.
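
The clustering step can be sketched as follows. This is a minimal pure-Python illustration of DBSCAN on min-max-normalized (time, anomaly probability) pairs, using the radius 0.05 and minimum point count 5 given above; the function names, the sample data, and the convention that a point counts as its own neighbor are assumptions, not details from the source.

```python
from math import hypot
from typing import List, Tuple


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dbscan(points: List[Tuple[float, float]], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; -1 marks noise."""
    unvisited, noise = -2, -1
    labels = [unvisited] * len(points)

    def neighbours(i: int) -> List[int]:
        xi, yi = points[i]
        return [k for k, (xk, yk) in enumerate(points)
                if hypot(xi - xk, yi - yk) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] != unvisited:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:      # not a core point (may become a border point later)
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [k for k in seeds if k != i]
        while queue:
            k = queue.pop()
            if labels[k] == noise:    # border point reached from a core point
                labels[k] = cluster
            if labels[k] != unvisited:
                continue
            labels[k] = cluster
            reachable = neighbours(k)
            if len(reachable) >= min_pts:
                queue.extend(reachable)
    return labels


# Illustrative data: 200 samples in one period, points 60..99 often abnormal.
times = [float(j) for j in range(200)]
probs = [0.8 if 60 <= j < 100 else 0.1 for j in range(200)]
points = list(zip(min_max(times), min_max(probs)))
labels = dbscan(points, eps=0.05, min_pts=5)
print(sorted(set(labels)))  # three time-contiguous clusters: [0, 1, 2]
```

Because the normalized time axis is one of the two features, points that are far apart in time cannot be density-connected, which is what keeps each cluster contiguous in this example.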

In one embodiment, the anomaly probability of each data point is determined as follows: obtain the number $n_j$ of time periods whose $j$-th data point is abnormal, where a random forest algorithm determines whether the $j$-th data point is abnormal; then obtain the probability that the $j$-th data point is abnormal: $p_j = n_j / N$, where $p_j$ is the probability that the $j$-th data point of each time period is abnormal and $N$ is the number of time periods.
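
As a small illustration of $p_j = n_j / N$, the sketch below assumes the per-point abnormal/normal decisions have already been produced by the detector and are supplied as a boolean table; the function name and the sample flags are hypothetical.

```python
from typing import List


def anomaly_probabilities(flags: List[List[bool]]) -> List[float]:
    """flags[k][j] is True when point j of period k was flagged abnormal.

    Returns p_j = n_j / N for every position j, where N is the number of periods.
    """
    n_periods = len(flags)
    n_points = len(flags[0])
    probs = []
    for j in range(n_points):
        n_j = sum(1 for period in flags if period[j])  # periods where point j is abnormal
        probs.append(n_j / n_periods)
    return probs


# Four periods (N = 4), four data points per period.
flags = [
    [False, True, False, False],
    [False, True, True, False],
    [False, False, True, False],
    [False, True, False, False],
]
print(anomaly_probabilities(flags))  # [0.0, 0.75, 0.5, 0.0]
```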

Furthermore, whether a cluster is an abnormal cluster is judged from the average of the anomaly probabilities of its data points as follows: for any cluster, calculate the average of the probabilities that each data point in the cluster is abnormal; in response to the average being greater than a threshold, determine that the cluster is an abnormal cluster; in response to the average not being greater than the threshold, determine that the cluster is not an abnormal cluster. In this embodiment, the threshold is 0.3.
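
The decision rule can be sketched in a few lines. The threshold 0.3 comes from this embodiment; the function name is hypothetical, and treating an average that does not exceed the threshold as non-abnormal is an interpretation of the text.

```python
from statistics import mean
from typing import Sequence


def is_abnormal_cluster(point_probs: Sequence[float], threshold: float = 0.3) -> bool:
    """A cluster is abnormal when the mean anomaly probability of its points exceeds the threshold."""
    return mean(point_probs) > threshold


print(is_abnormal_cluster([0.5, 0.4, 0.2]))  # mean is about 0.37, so True
print(is_abnormal_cluster([0.1, 0.3, 0.2]))  # mean is about 0.2, so False
```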

Step S3: calculate the fitness coefficient $P_x$: $P_x = \exp(A_x + B_x)$; calculate the minimum value of the fitness coefficient $P_x$ by an optimization algorithm to determine the corresponding input length $x$, recorded as the target input length $x_A$.

Here, $A_x$ is related to the input data length $x$ and the data length of each abnormal cluster, $B_x$ is related to the input length $x$, the values of the data points and their local entropy, and $\exp()$ is the exponential function with the natural constant $e$ as its base. The fitness coefficient $P_x$ is positively correlated with $A_x$ and positively correlated with $B_x$.
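
The search for the target input length can be sketched as below. This sketch uses an exhaustive scan over candidate integer lengths rather than an optimization algorithm such as PSO, and it takes $A_x$ and $B_x$ as caller-supplied functions. The two toy functions are purely illustrative stand-ins and are not the formulas of the invention.

```python
from math import exp
from typing import Callable, Iterable


def target_input_length(candidates: Iterable[int],
                        a_of_x: Callable[[int], float],
                        b_of_x: Callable[[int], float]) -> int:
    """Return the candidate x that minimizes P_x = exp(A_x + B_x)."""
    best_x, best_p = None, float("inf")
    for x in candidates:
        p = exp(a_of_x(x) + b_of_x(x))
        if p < best_p:
            best_x, best_p = x, p
    if best_x is None:
        raise ValueError("no candidate lengths supplied")
    return best_x


def toy_a(x: int) -> float:
    """Hypothetical stand-in for A_x: shrinks as the input gets longer."""
    return 64.0 / x


def toy_b(x: int) -> float:
    """Hypothetical stand-in for B_x: grows as the input gets longer."""
    return 0.25 * x


print(target_input_length(range(4, 65), toy_a, toy_b))  # 16
```

An exhaustive scan is practical when the candidate lengths are a modest range of integers; a population-based optimizer would be the natural replacement for larger search spaces.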

In one embodiment, the formula for calculating $A_x$ is: [formula not reproduced in the source text], where $u_i$ is the data length of all data in the $i$-th abnormal cluster in a time period, $M$ is the total number of abnormal clusters in a time period, and $u_T$ is a data length of predetermined size. $A_x$ is positively correlated with the square of the data length of all data in each abnormal cluster. The smaller $A_x$ is, the lower the proportion of abnormal-cluster data in the input data, and the more accurate the output data obtained by completing the input data.

Furthermore, $B_x$ is calculated as follows: calculate the variance $\sigma_j^2$ of the power load values corresponding to the $j$-th data point across all time periods of the power load data, and normalize the variance of the $j$-th data point to obtain the weight coefficient $\alpha_j$ of the $j$-th data point of each time period; obtain the local entropy $S_j$ of the $j$-th data point of each time period; for any time period, extract from that period, in the order in which the data points were collected, multiple calculation data sets whose data length equals the model input length $x$; for any calculation data set, calculate the sum of the products of the weight coefficient and the local entropy of each data point in the set to obtain the fluctuation value of that calculation data set; the maximum of the fluctuation values over all calculation data sets is $B_x$. The smaller $B_x$ is, the smaller the fluctuation of the input data fed to the model within the data collected in a time period (the products of the variance-based weights and the local entropies of the data points are small), and the more accurate the output data of the model.
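
The computation of $B_x$ described above can be sketched as follows. Two details are assumptions because the text does not fix them: the variances are normalized by dividing by their sum, and the calculation data sets are taken as all length-$x$ windows with a stride of one sample. The local entropies are passed in as a precomputed list, and all names are hypothetical.

```python
from statistics import pvariance
from typing import List


def weights_from_variance(periods: List[List[float]]) -> List[float]:
    """alpha_j: variance of position j across periods, scaled so the weights sum to 1."""
    n_points = len(periods[0])
    variances = [pvariance([p[j] for p in periods]) for j in range(n_points)]
    total = sum(variances)
    if total == 0:
        return [0.0] * n_points
    return [v / total for v in variances]


def fluctuation_b(weights: List[float], entropies: List[float], x: int) -> float:
    """B_x: the largest sum of alpha_j * S_j over every window of length x."""
    products = [a * s for a, s in zip(weights, entropies)]
    if not 1 <= x <= len(products):
        raise ValueError("x must be between 1 and the number of data points")
    return max(sum(products[s:s + x]) for s in range(len(products) - x + 1))


# Position 0 varies across the two periods, position 1 does not.
alpha = weights_from_variance([[1.0, 2.0], [3.0, 2.0]])
print(alpha)  # [1.0, 0.0]

# Window sums of alpha_j * S_j for x = 2 are about 0.5, 0.7 and 0.5.
b_2 = fluctuation_b([0.1, 0.2, 0.3, 0.4], [1.0, 2.0, 1.0, 0.5], x=2)
print(round(b_2, 6))
```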

Here, the local entropy $S_j$ of the $j$-th data point of each time period is obtained by selecting the power load data within a span of time length $t$ centered on the $j$-th data point and applying an entropy calculation method (for example, Shannon entropy) to it.
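
One way to compute such a local entropy is sketched below. The text only specifies a span centered on the $j$-th point and an entropy measure such as Shannon entropy; expressing the span as a half-width in samples, estimating the distribution with equal-width bins, and the default of 8 bins are all assumptions made for illustration.

```python
from collections import Counter
from math import log2
from typing import List


def local_entropy(values: List[float], j: int, half_width: int, n_bins: int = 8) -> float:
    """Shannon entropy (in bits) of the values in a window centered on index j."""
    lo_idx = max(0, j - half_width)
    hi_idx = min(len(values), j + half_width + 1)
    window = values[lo_idx:hi_idx]
    lo, hi = min(window), max(window)
    if hi == lo:
        return 0.0  # a constant window carries no uncertainty
    # Assign each value to one of n_bins equal-width bins over the window's range.
    bins = Counter(min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1) for v in window)
    total = len(window)
    return -sum(c / total * log2(c / total) for c in bins.values())


print(local_entropy([5.0] * 10, j=4, half_width=3))          # 0.0
print(local_entropy([0.0, 0.0, 1.0, 1.0], j=1, half_width=2))  # 1.0
```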

In summary, the smaller $A_x$ is, the smaller $P_x$ is and the more accurate the model output data; the smaller $B_x$ is, the smaller $P_x$ is and the more accurate the model output data. Conversely, the larger $A_x$ is, the larger $P_x$ is and the less accurate the model output data; the larger $B_x$ is, the larger $P_x$ is and the less accurate the model output data.
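
These relationships follow directly from the definition $P_x = \exp(A_x + B_x)$, because the exponential function is strictly increasing. Stated as a short derivation (a restatement, adding nothing beyond the definition):

```latex
\frac{\partial P_x}{\partial A_x} = \frac{\partial P_x}{\partial B_x} = \exp(A_x + B_x) > 0,
\qquad
x_A = \arg\min_x P_x = \arg\min_x \,(A_x + B_x).
```

In other words, a search for the target input length $x_A$ may equivalently minimize the exponent $A_x + B_x$.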

Step S4: process the power load data using a pre-trained power data processing model to obtain power load data in which the abnormal data has been replaced.

In one embodiment, the power data processing model augments the input data through a VAE network so that the output data has the same length as the input data. In another embodiment, the input data is augmented through a GAN network.

The length of the data input to the power data processing model is the target input length $x_A$, and the abnormal data are the data points in the power load data that belong to abnormal clusters.

As shown in FIG. 2, step S4 includes steps S401 to S403.

Step S401: construct a window whose length is the target input length $x_A$.

Step S402: for any abnormal cluster, in response to the window being longer than the abnormal cluster, make the window cover the abnormal cluster for processing.

Step S403: for any abnormal cluster, in response to the window being shorter than the abnormal cluster, make the window traverse the abnormal cluster. On the first pass, a portion of the window covers the head end of the abnormal cluster; during the traversal, the window is slid so that a portion of the slid window covers the part of the abnormal cluster not yet traversed.

In one embodiment, the data points belonging to abnormal clusters within a time period need to be replaced (they are deleted and new data is generated from the data points of non-abnormal clusters). When the data length of an abnormal cluster is less than the window length, the center of the window is aligned with the center of the data in the abnormal cluster, and the data in the window is input to the model. When the data length of an abnormal cluster is greater than the window length, a portion of the window is placed over the head end of the abnormal cluster, the data in the window is input to the model, and the original data in the window is replaced with the model output; the window is then slid so that a portion of it covers the not yet processed data of the abnormal cluster, and this is repeated until the window has traversed the entire abnormal cluster.

The portion of the window that covers the head end of the abnormal cluster is at most half of the total window length. This keeps the proportion of abnormal-cluster data in the window low, so the model output is more accurate.
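
The window placement described above can be sketched as follows. The generative model is replaced here by a hypothetical stand-in, `mean_fill`, which simply fills masked points with the mean of the unmasked points in the window; a real implementation would call the trained VAE or GAN instead. Other assumptions in this sketch: a cluster whose length equals the window length is handled by the centered window, each slide advances by half a window, windows are clipped to the series boundaries, and only the abnormal positions are written back.

```python
from typing import Callable, List

Model = Callable[[List[float], List[bool]], List[float]]


def mean_fill(window: List[float], mask: List[bool]) -> List[float]:
    """Hypothetical stand-in for the generative model: fill masked points with the window mean."""
    kept = [v for v, m in zip(window, mask) if not m]
    fill = sum(kept) / len(kept) if kept else 0.0
    return [fill if m else v for v, m in zip(window, mask)]


def repair_cluster(series: List[float], start: int, end: int,
                   window_len: int, model: Model = mean_fill) -> List[float]:
    """Replace the abnormal cluster series[start:end] in place and return the series."""
    n = len(series)
    if not 0 <= start < end <= n or not 1 <= window_len <= n:
        raise ValueError("invalid cluster bounds or window length")
    pending = [start <= i < end for i in range(n)]  # True = still abnormal

    def run(win_start: int) -> None:
        win_start = max(0, min(win_start, n - window_len))  # clip to the series
        win_end = win_start + window_len
        out = model(series[win_start:win_end], pending[win_start:win_end])
        for k in range(win_start, win_end):
            if pending[k]:
                series[k] = out[k - win_start]
                pending[k] = False

    length = end - start
    if length <= window_len:
        # Short cluster: align the window center with the cluster center.
        run(start + length // 2 - window_len // 2)
    else:
        # Long cluster: each window covers at most half a window of unprocessed points.
        step = max(1, window_len // 2)
        pos = start
        while pos < end:
            run(pos + step - window_len)
            pos += step
    return series


series = [10.0] * 20
series[8:16] = [99.0] * 8            # abnormal cluster longer than the window
print(repair_cluster(series, 8, 16, window_len=6))
```

With this placement, each window after the first contains already repaired values followed by at most half a window of abnormal values, which mirrors the stated goal of keeping the abnormal proportion in the window low.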

本发明还提供了一种基于生成式人工智能的电力数据模型构建系统。如图3所示，所述系统包括处理器和存储器，所述存储器存储有计算机程序指令，当所述计算机程序指令被所述处理器执行时实现根据本发明第一方面所述的一种基于生成式人工智能的电力数据模型构建方法及系统。The present invention also provides a system for constructing a power data model based on generative artificial intelligence. As shown in FIG3 , the system includes a processor and a memory, wherein the memory stores computer program instructions, and when the computer program instructions are executed by the processor, a method and system for constructing a power data model based on generative artificial intelligence according to the first aspect of the present invention are implemented.

本领域技术人员可以理解，图3中示出的结构，仅仅是与本发明方案相关的部分结构的框图，并不构成对本发明的计算机设备的限定，具体的计算机设备可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。Those skilled in the art will understand that the structure shown in FIG. 3 is merely a block diagram of a partial structure related to the solution of the present invention, and does not constitute a limitation on the computer device of the present invention. A specific computer device may include more or fewer components than those shown in the figure, or combine certain components, or have a different arrangement of components.

The system also includes other components well known to those skilled in the art, such as a communication bus and a communication interface. Their configuration and functions are known in the art and are therefore not described in detail here.

In the present invention, the aforementioned memory may be any tangible medium that contains or stores a program for use by, or in combination with, an instruction execution system, apparatus, or device. For example, the computer-readable storage medium may be any appropriate magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC), or any other medium that can be used to store the required information and can be accessed by an application, a module, or both. Any such computer storage medium may be part of the device, or accessible or connectable to the device. Any application or module described in the present invention may be implemented using computer-readable/executable instructions that are stored or otherwise held by such a computer-readable medium.

In the description of this specification, "a plurality of" and "several" mean at least two, for example two, three, or more, unless otherwise clearly and specifically defined.

The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described. However, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.

The embodiments described above express only several implementations of the present invention. Although they are described specifically and in detail, they shall not be construed as limiting the scope of the patent application. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention.

Claims

1. A power data model construction method based on generative artificial intelligence, characterized by comprising the following steps:

Collecting power load data of electric equipment in a plurality of continuous time periods;

For any period, determining the abnormal probability of each data point, and clustering according to the abnormal probability to obtain a preset number of clusters; judging whether any cluster is an abnormal cluster according to the average value of the abnormal probability of the data points in the cluster, wherein the data points in one cluster are continuous in time sequence;

Calculating a fitness coefficient P_x: P_x = exp(A_x + B_x), wherein A_x relates to the input data length x and the data length of each abnormal cluster, B_x relates to the input length x, the values of the data points and their local entropy, and exp() is the exponential function with the natural constant e as its base; calculating the minimum value of the fitness coefficient P_x through an optimization algorithm to determine the corresponding input length x, and recording that input length as the target input length x_A;
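As a non-limiting illustration, the selection of the target input length x_A may be sketched in Python as follows. The terms A_x and B_x are passed in as callables because their formulas are defined elsewhere; the function name `target_input_length`, the candidate range, and the use of an exhaustive search in place of a dedicated optimization algorithm are assumptions made for the sketch. Because exp() is strictly increasing, the x that minimizes P_x is also the x that minimizes A_x + B_x.

```python
import math
from typing import Callable, Iterable, Tuple


def target_input_length(candidates: Iterable[int],
                        a_of_x: Callable[[int], float],
                        b_of_x: Callable[[int], float]) -> Tuple[int, float]:
    """Return (x_A, P_min): the candidate length minimizing P_x = exp(A_x + B_x)."""
    best_x = None
    best_p = math.inf
    for x in candidates:
        p = math.exp(a_of_x(x) + b_of_x(x))
        if p < best_p:  # strict comparison keeps the smallest x on ties
            best_x, best_p = x, p
    if best_x is None:
        raise ValueError("no candidate input length was supplied")
    return best_x, best_p
```

For example, with the purely hypothetical terms `a_of_x = lambda x: abs(x - 8) / 8` and `b_of_x = lambda x: 0.01 * x`, searching `range(2, 33)` selects x_A = 8.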

Processing the power load data by using a pre-trained power data processing model to obtain power load data for replacing abnormal data; the length of the data input into the power data processing model is the target input length x_A, and the abnormal data are the data points belonging to an abnormal cluster in the power load data.

2. The method for constructing a power data model based on generative artificial intelligence according to claim 1, wherein the formula for calculating A_x is: where u_i is the data length of all data in the i-th abnormal cluster in one period, M is the total number of abnormal clusters in one period, and u_T is a data length of a predetermined size.

3. The method for constructing a power data model based on generative artificial intelligence as claimed in claim 1, wherein calculating B_x comprises:

Calculating the variance σ_j² of the power load values corresponding to the j-th data point across all periods in the power load data, and normalizing the variance of the j-th data point to obtain a weight coefficient α_j of the j-th data point in each period; obtaining the local entropy S_j of the j-th data point of each period;

for any period, sequentially extracting, in the chronological order in which the data points of the period were collected, a plurality of calculation data sets whose data length equals the model input data length x;

For any calculation data set, calculating the sum of products of the weight coefficients corresponding to all data points in the calculation data set and the local entropy to obtain a fluctuation degree value of the calculation data set;

the maximum of the fluctuation degree values over all calculation data sets is B_x.
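As a non-limiting illustration, the calculation of B_x may be sketched in Python as follows. Several details are assumptions made for the sketch and are not fixed by the claim: the variance is the population variance, the normalization divides each variance by the sum of all variances (with uniform weights as a fallback when every variance is zero), the calculation data sets are extracted with a stride of one data point, and the local entropy values are supplied by the caller. The function names are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def weight_coefficients(periods: Sequence[Sequence[float]]) -> List[float]:
    """periods[t][j] is the load value of the j-th data point in period t."""
    n_points = len(periods[0])
    variances = [pvariance([period[j] for period in periods])
                 for j in range(n_points)]
    total = sum(variances)
    if total == 0:
        # Assumed fallback: no point fluctuates, so weight all points equally.
        return [1.0 / n_points] * n_points
    return [v / total for v in variances]


def fluctuation_b(weights: Sequence[float],
                  entropies: Sequence[Sequence[float]], x: int) -> float:
    """entropies[t][j] is the local entropy S_j of the j-th point in period t."""
    values = []
    for entropy in entropies:
        # One calculation data set per start index (stride of 1 is an assumption).
        for s in range(len(entropy) - x + 1):
            values.append(sum(weights[j] * entropy[j] for j in range(s, s + x)))
    if not values:
        raise ValueError("input length x exceeds the period length")
    return max(values)
```

In this sketch a data set that contains strongly fluctuating points with high local entropy yields a large fluctuation degree value, so B_x grows when the input length x forces such points into the same window.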

4. The method for constructing a power data model based on generative artificial intelligence according to claim 1, wherein counting, over all periods, whether the j-th data point of each period is abnormal, and thereby determining the probability that the j-th data point is abnormal data, comprises:

obtaining the number n_j of periods in which the j-th data point is abnormal;

Obtaining the probability that the j-th data point is abnormal data: p_j = n_j / N, where p_j is the probability that the j-th data point of each period is abnormal data, and N is the number of periods.
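As a non-limiting illustration, the probability p_j = n_j / N may be computed as in the following Python sketch. The function name `abnormal_probabilities` and the representation of the anomaly detection result as one list of Boolean flags per period are assumptions made for the sketch.

```python
from typing import List, Sequence


def abnormal_probabilities(flags: Sequence[Sequence[bool]]) -> List[float]:
    """flags[t][j] is True if the j-th data point of period t was flagged abnormal."""
    n_periods = len(flags)      # N
    n_points = len(flags[0])
    probabilities = []
    for j in range(n_points):
        n_j = sum(1 for period in flags if period[j])  # periods where point j is abnormal
        probabilities.append(n_j / n_periods)
    return probabilities
```

For four periods in which the first data point is flagged three times and the second once, the sketch returns 0.75 and 0.25 for those two positions.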

5. The method for constructing a power data model based on generative artificial intelligence according to claim 1, wherein determining whether the cluster is an abnormal cluster according to the average value of the abnormal probabilities of the data points in the cluster comprises:

For any cluster, calculating the average value of the probability that each data point in the cluster is abnormal data;

Determining that the cluster is an abnormal cluster in response to the average value being greater than a threshold value;

And in response to the average value not being greater than the threshold value, determining that the cluster is not an abnormal cluster.
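As a non-limiting illustration, the threshold test on the cluster average may be sketched in Python as follows. The function name `abnormal_clusters`, the representation of each time-continuous cluster as a half-open index range (start, end), and the threshold value used in the example are assumptions made for the sketch.

```python
from typing import List, Sequence, Tuple


def abnormal_clusters(probabilities: Sequence[float],
                      clusters: Sequence[Tuple[int, int]],
                      threshold: float) -> List[Tuple[int, int]]:
    """Return the clusters whose mean abnormal probability exceeds the threshold."""
    selected = []
    for start, end in clusters:
        mean = sum(probabilities[start:end]) / (end - start)
        if mean > threshold:  # strictly greater, as in the claim
            selected.append((start, end))
    return selected
```

With probabilities `[0.0, 0.1, 0.8, 0.9, 0.7, 0.05]`, clusters `[(0, 2), (2, 5), (5, 6)]`, and a hypothetical threshold of 0.5, only the cluster `(2, 5)` is reported as abnormal.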

6. The method for constructing a power data model based on generative artificial intelligence according to claim 1, wherein processing the power load data by using a pre-trained power data processing model comprises:

Constructing a window whose length is the target input length x_A;

for any abnormal cluster:

In response to the length of the window being greater than the length of the abnormal cluster, covering the abnormal cluster with the window for processing; in response to the length of the window being less than the length of the abnormal cluster, traversing the abnormal cluster with the window, wherein at the start of the traversal a part of the window covers the head end of the abnormal cluster, and the window is slid during the traversal so that, after each slide, a part of the window covers a portion of the abnormal cluster that has not yet been traversed.

7. The power data model construction method based on generative artificial intelligence according to claim 1, wherein the anomaly detection algorithm is a random forest anomaly detection algorithm.

8. The power data model construction method based on generative artificial intelligence according to claim 1, wherein the optimization algorithm is a PSO optimization algorithm.

9. A power data model construction system based on generative artificial intelligence, comprising a processor and a memory, the memory storing a computer program, characterized in that the processor executes the computer program to implement the power data model construction method based on generative artificial intelligence as claimed in any one of claims 1-8.