CN111553811A - A method for identifying leakage areas in a water supply network based on iterative machine learning
- Publication number
- CN111553811A, CN202010369142.0A
- Authority
- CN
- China
- Prior art keywords
- nodes
- leakage
- leaking
- leak
- iteration
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
Description
Technical Field
The invention relates to a method for identifying leakage areas in a water supply network based on iterative machine learning, and belongs to the technical field of leak detection for water supply networks.
Background Art
The water supply network is important social infrastructure and plays a major role in economic development and daily life. Because of aging pipes and unsound design, water supply networks leak to varying degrees. Developed countries lose about 15% of their purified water from distribution systems each year, while developing countries lose 35% and in some cases as much as 60%. Accurately locating leaks is therefore one of the key problems in controlling the leakage rate of a water supply network.
With the arrival of the era of intelligent big data, locating leaks in water supply networks with data mining techniques has become an active research topic. Leak localization methods based on hydraulic models fall into two categories: (1) approximately locating the leaking node by exploiting the similar leakage characteristics of adjacent nodes; and (2) dividing the network into regions in advance according to leakage characteristics and then identifying the leaking region. Direct node localization first generates leakage samples for the network. Because some nodes have similar leakage characteristics, the identification accuracy of the model drops, and two remedies are commonly used: merging similar nodes in advance to reduce the number of indistinguishable nodes, or adding a time-delay pattern, that is, adding time steps to increase the amount of leakage information. Direct region localization clusters similar nodes in advance to reduce their number, and uses the leakage regions as classifier labels to identify the leaking region.
Analysis of previous methods shows that clustering nodes with similar leakage characteristics can improve the identification accuracy of the classifier. When clustering is applied to a water supply network, however, the number of clusters is set directly by the researcher and lacks a theoretical basis, and only a single leaking node is considered during localization. In real water supply networks, multiple leaks often occur at once. To address this, the invention adopts an iterative machine learning method that combines k-means clustering (k=2) with a random forest classifier, which provides a theoretical basis for the number of clusters and also offers a solution for identifying multiple leakage regions. For simultaneous leaks, all combination types of leaking nodes across the identified regions are analyzed in each iteration, and each combination type serves as one class label of the random forest classifier. The trained random forest classifier is used as the leakage region identification model to locate the leakage regions and identify the number of leaks contained in each.
Summary of the Invention
The purpose of the invention is to provide a method for identifying leakage areas in a water supply network based on iterative machine learning, so as to remove the uncertainty in the number of clusters when clustering is applied to a water supply network and to provide a solution for identifying the regions of multiple leaks.
The technical scheme adopted by the invention is as follows. A hydraulic model of the water supply network is built according to the principle of flow and pressure balance. Suppose l leaks occur simultaneously, with leakage feature $\Delta S_l$ (the difference between the sensor values before and after the leak). The same leakage coefficient C is added to each node of an already identified leakage region to generate a leakage change matrix, and k-means clustering splits the region into two clusters based on that matrix. Leakage events are then generated at random by hydraulic model simulation, and the simulation results serve as training samples for the classifier. The samples are used to train a feature selection model: the feature selection method Mean Decrease Accuracy (MDA) computes the importance of each feature, and the features are ranked by importance and filtered. Each leak combination is used as a class label of the classifier model, and a random forest classifier is trained with the selected features. If the final identification accuracy of the model (Acc, the product of the identification accuracies of the trained models over all iterations) is greater than 95%, the leakage feature $\Delta S_l$ is input to the trained random forest classifier and the next iteration begins; if it is less than 95%, the iteration stops. The identified leakage regions and the number of leaking nodes in each are output.
The method for identifying leakage areas in a water supply network based on iterative machine learning is carried out through the following steps:
(1) l nodes leak simultaneously, and the leakage feature is $\Delta S_l$;
(2) In iteration (β-1), w leakage regions have been identified, and the i-th leakage region contains a known number of leaking nodes, i = 1, 2, ..., w;
(3) One of the w leakage regions is selected, together with the number of leaking nodes it contains; the same leakage coefficient C is added to each node in the selected region to generate the leakage change matrix of that region;
(4) Based on the leakage change matrix, k-means clustering splits the selected leakage region into two sub-regions. The remaining (w-1) regions are not clustered and the number of leaking nodes in each is unchanged, so iteration β has (w+1) regions in total;
(5) The combination types of leaking nodes for iteration β are generated. For the (w-1) unclustered regions, the number of leaking nodes inside each region stays fixed. For the two sub-regions of the selected region, its leaking nodes can be distributed in every possible way, which gives one more combination than the number of leaking nodes it contains: zero leaking nodes in the first sub-region and all of them in the second; one in the first and the rest in the second; and so on, up to all of them in the first and none in the second. The number of distinct labels in this iteration equals the number of these combinations;
(6) The leakage samples for iteration β are generated. For a given combination type, the corresponding numbers of nodes are chosen at random from the two sub-regions, and from each of the (w-1) unclustered regions as many nodes are chosen as that region contains leaking nodes. This yields l simultaneously leaking nodes, which are recorded as one leakage sample. In total ε different leakage samples are generated. The set of leakage samples is denoted T, and the total number of features is $N_{PQ}$, from $N_P$ pressure sensors and $N_Q$ flow sensors, where $N_{PQ} = N_P + N_Q$;
(7) The hydraulic simulation results of the leakage events serve as training samples for the classifier. MDA is used to compute the importance of each feature of the classifier model; the features are then ranked by importance and removed in order from least to most important, as long as the accuracy of the trained model does not decrease. Computing MDA consists of two parts: training the random forest classifier, and computing the mean decrease in accuracy of each feature;
The random forest classifier is trained as follows:
(a) For iteration β, the w leakage regions identified in iteration (β-1) each contain a known number of leaking nodes, i = 1, 2, ..., w, and l is the total number of leaking nodes. According to the leakage change matrix of the selected region, that region is clustered into two sub-regions, so the set of regions now has (w+1) members. For the two sub-regions, every distribution of the selected region's leaking nodes is possible: zero in the first sub-region and all in the second, one in the first and the rest in the second, and so on, up to all in the first and none in the second. The number of leaking nodes in each of the other (w-1) unclustered regions is unchanged. Each of these combination types is one label of the random forest classifier in this iteration. For each label, the corresponding numbers of nodes are chosen at random from each region so that the counts sum to l, producing l simultaneously leaking nodes, which are recorded as one leakage sample. Suppose there are ε different leakage samples; the set of leakage samples is denoted T, and the total number of features is $N_{PQ} = N_P + N_Q$, from $N_P$ pressure sensors and $N_Q$ flow sensors;
(b) Training subsets $T_{tr1}, T_{tr2}, \ldots, T_{trM}$ are created by sampling with replacement from the leakage samples, where M is the number of classification trees. Each training subset $T_{tri}$ is drawn at random from T and contains ε samples, the same size as T, so a training subset contains repeated samples. The samples in T that are not drawn are called out-of-bag (OOB) samples and are used to evaluate the accuracy of each tree. Because the accuracy of a random forest classifier increases with the number of trees and levels off at a constant, the default value M = 500 is chosen;
(c) For each node of the tree grown on training subset $T_{tri}$, $1 \le i \le 500$, m sub-features are selected at random from the $N_{PQ}$ features to build the classification tree, $1 \le m \le N_{PQ}$; the default value of m is $\sqrt{N_{PQ}}$. The Gini index of each selected feature is then computed. Given a training subset $T_{tri}$ and a continuous feature $N_D$, $D = 1, 2, \ldots, m$, suppose $T_{tri}$ contains f classes of samples, with $|f_h|$ samples in class h, and that feature $N_D$ takes r distinct values. These values are sorted in ascending order and labelled $R = \{R_1, R_2, \ldots, R_r\}$. A split point t divides the samples into two subsets: $T_{tri}^{-}(t)$, containing the samples whose value is not greater than t, and $T_{tri}^{+}(t)$, containing the samples whose value is greater than t. For adjacent values $R_e$ and $R_{e+1}$, every t in $[R_e, R_{e+1})$ gives the same split, so there are (r-1) candidate split points. The Gini index of training subset $T_{tri}$ for feature $N_D$ at split point t is

$$\mathrm{Gini}(T_{tri}, N_D, t) = \frac{|T_{tri}^{-}(t)|}{|T_{tri}|}\,\mathrm{Gini}\big(T_{tri}^{-}(t)\big) + \frac{|T_{tri}^{+}(t)|}{|T_{tri}|}\,\mathrm{Gini}\big(T_{tri}^{+}(t)\big),$$

where

$$\mathrm{Gini}(T) = 1 - \sum_{h=1}^{f} p_h^{2}.$$

Here $p_h$ is the probability that a sample belongs to class h, and $\mathrm{Gini}(T)$ is the probability that a randomly selected sample is misclassified. The larger the Gini index, the more likely a sample is to be misclassified. The split point t and feature with the smallest Gini index are chosen for the split, and the procedure is repeated on each branch;
(d) Every classification tree contributes to the identification accuracy on the training subset $T_{tri}$; the mean identification accuracy of the 500 classification trees is taken as the identification accuracy of the trained model.
MDA is computed as follows:
For each classification tree of the random forest classifier model, grown on $T_{tri}$, $1 \le i \le 500$, the OOB error is computed on the OOB data and denoted $OOB_{b1}$. The values of feature F, $1 \le F \le N_{PQ}$, are then randomly permuted across the out-of-bag samples and the out-of-bag error is computed again, denoted $OOB_{b2}$. With 500 trees in the forest, the importance of feature F is

$$\mathrm{MDA}(F) = \frac{1}{500}\sum_{b=1}^{500}\left(OOB_{b2} - OOB_{b1}\right).$$
(8) The combination types of leaking nodes are used as the class labels of the classifier model, and the random forest classifier is trained with the selected features. If the final identification accuracy of iteration β, $Acc_\beta$, that is, the product of the identification accuracies of the trained models over all iterations so far, is greater than 95%, the leakage feature is input to the trained random forest classifier and the next iteration begins; if it is less than 95%, the iteration stops. The identified leakage regions and the number of leaking nodes in each are output.
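The Gini-based split selection described in step (7)(c) can be illustrated with a minimal pure-Python sketch. It is not the patent's implementation: the function names `gini` and `best_split` are hypothetical, and taking the midpoint between adjacent sorted values as the candidate threshold is an assumption, since any threshold between two adjacent values yields the same split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_h p_h^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split of one feature.

    Candidate thresholds are midpoints between adjacent distinct sorted values
    (an assumption), which gives r - 1 candidates for r distinct values.
    """
    distinct = sorted(set(values))
    n = len(values)
    best_t, best_score = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Size-weighted impurity of the two subsets, as in the formula above.
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy pressure-drop feature: small drops belong to label "A", large ones to "B".
threshold, score = best_split([0.1, 0.2, 0.8, 0.9], ["A", "A", "B", "B"])
```

In a full tree this search would run over the m randomly chosen features at every node, and the feature and threshold with the smallest weighted Gini index would define the split.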
The invention has the following beneficial effects:
In this iterative machine learning method for identifying leakage regions in a water supply network, each iteration selects one of the already identified leakage regions, splits it into two with k-means clustering, and uses all combination types of leaking nodes as labels of the random forest classifier model. Leakage coefficients are then added at random to nodes in the leakage regions according to the combination types to generate leakage samples, which are used to train the classifier model. Feature selection is applied during training to reduce the number of features the model needs. The leakage feature is input to the trained classifier model, which outputs the identified leakage regions and the number of leaking nodes in each. These steps are repeated until the final identification accuracy falls below 95%, at which point the iteration ends and the identified leakage regions and their leaking-node counts are output. For a single-node leak, binary clustering removes the uncertainty in the number of clusters that arises when clustering is applied to leak detection in a water supply network, provides a theoretical basis for the number of clusters, and reduces the number of trial calculations required. Combining clustering with a single-label classifier extends the method to multiple leak points, so that both the leakage regions and the number of leaking nodes in each can be identified.
Description of the Drawings
Figure 1 is a flow chart of the method of the invention.
Figure 2 is a diagram of the hydraulic model of the water supply network.
Figure 3 shows the locations of the pressure and flow measurement points.
Figure 4 shows the training process of the random forest classifier.
Figure 5 shows the identification process for the leakage region containing leaking node 101.
Figure 6 shows the regions identified for a single-node leak: (a) the leakage region determined by the traditional method; (b) the leakage region determined by the iterative machine learning method.
Figure 7 shows the identification process for the region containing leaking nodes 65 and 281.
Figure 8 shows the identification process for the regions containing leaking nodes 132 and 406.
Figure 9 shows the regions identified for two simultaneous leaks: (a) the leakage region containing nodes 65 and 281; (b) the leakage regions containing nodes 132 and 406.
Detailed Description
The technical solution of the invention is described clearly and completely below with reference to the accompanying drawings.
As shown in Figure 1, a hydraulic model of the water supply network is built according to the principle of flow and pressure balance. Suppose l nodes leak simultaneously, with leakage feature $\Delta S_l$ (the difference between the sensor values before and after the leak). The same leakage coefficient C is added to each node of an already identified leakage region to generate a leakage change matrix, and k-means clustering splits the region into two clusters based on that matrix. Leakage events are then generated at random by hydraulic model simulation, and the simulation results serve as training samples for the classifier. The samples are used to train a feature selection model, and the features are ranked by importance and filtered. Each leak combination is used as a class label of the classifier model, and a random forest classifier is trained with the selected features. If the final identification accuracy of the model (Acc, the product of the identification accuracies of the trained models over all iterations) is greater than 95%, the leakage feature $\Delta S_l$ is input to the trained random forest classifier, the leakage regions and the number of leaking nodes in each are output, and the next iteration begins; if it is less than 95%, the iteration stops.
Example
Step 1: A hydraulic model of the water supply network is built according to the principle of flow and pressure balance; the invention uses EPANET to build the model. As shown in Figure 2, the network consists of one reservoir (1), one storage tank (2), 375 nodes (3) and 469 pipe sections (4). As shown in Figure 3, the measurement points comprise 3 flow meters (Q1-Q3) and 18 pressure measurement points (P1-P18). Sensor measurements are assumed to be corrupted by uniform zero-mean Gaussian error whose amplitude is 1.5% of the mean residual of the sensor samples. The base water demand is 148 L/s, the maximum demand is 162.8 L/s and the minimum demand is 118.4 L/s. A leak that is too small produces no change at the sensors, while a leak that is too large is noticed by residents first, so the invention selects leakage coefficients between 0.5 and 1.2.
Step 2: The invention assumes two leakage types in the water supply network: a single leaking node, and two nodes leaking simultaneously.
Step 3: When a leak alarm is raised in the water supply network, the initial iteration starts from the entire network.
Step 4: The leakage regions are split by clustering, features are selected, and the multi-level random forest classifier models are trained. The random forest training process is shown in Figure 4.
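The sampling with replacement that underlies each classification tree, and the out-of-bag set it leaves behind, might be sketched as follows. The helper name `bootstrap_with_oob` and the fixed seed are illustrative assumptions; a real forest would repeat the draw once per tree (M = 500 in this document).

```python
import random


def bootstrap_with_oob(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag).

    in_bag has the same size as the full sample set and usually contains
    repeats; out_of_bag lists every index that was never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(42)  # fixed seed only so the sketch is reproducible
in_bag, oob = bootstrap_with_oob(1000, rng)
print(len(in_bag), len(set(in_bag)), len(oob))
```

For large sample sets roughly a third of the samples end up out of bag, which is what makes the OOB set usable as a per-tree validation set without holding data back.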
Single-node leakage
When a 4 L/s leak occurs at node 101, each leakage sample is generated by selecting only one node at a time from the leakage region identified in the previous iteration.
One of the already identified leakage regions is selected, and a leakage coefficient of C = 0.8 is randomly added to the nodes in that region to generate the leakage change matrix. k-means then clusters the region into two parts.
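A minimal sketch of splitting a region in two with k-means (k = 2) is given below, assuming each row of the leakage change matrix holds the sensor changes produced by a leak at one node. The function name `two_means`, the deterministic initialisation and the toy matrix are assumptions for illustration only; in practice a library implementation of k-means would normally be used.

```python
def two_means(rows, max_iter=100):
    """Assign each row (one node's sensor-change vector) to cluster 0 or 1."""

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def mean(points):
        return [sum(col) / len(points) for col in zip(*points)]

    # Deterministic start (an assumption): the first row and the row farthest from it.
    centers = [list(rows[0]), list(max(rows, key=lambda r: dist2(r, rows[0])))]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            0 if dist2(r, centers[0]) <= dist2(r, centers[1]) else 1 for r in rows
        ]
        if new_assignment == assignment:
            break  # converged: no row changed cluster
        assignment = new_assignment
        for k in (0, 1):
            members = [r for r, a in zip(rows, assignment) if a == k]
            if members:
                centers[k] = mean(members)
    return assignment


# Toy leakage change matrix: 4 nodes x 2 sensors (values are made up).
matrix = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
print(two_means(matrix))
```

Fixing k at 2 means no cluster count has to be guessed; finer regions emerge only from repeated splitting across iterations.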
In each iteration, 30 leakage samples are generated at random within the leakage coefficient range for every node of the leakage region identified in the previous iteration. In the first iteration the region is the whole network of 375 nodes, giving 11,250 independent leakage samples. As the iterations proceed, the candidate leakage region shrinks and the number of leakage samples used for model training decreases.
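The sample count above follows from 30 draws per candidate node. The sketch below assumes the leakage coefficient is drawn uniformly from the range 0.5 to 1.2; the document only says the draw is random within that range, and the function name is hypothetical. Each (node, coefficient) pair would then be passed to the hydraulic model to obtain one simulated sensor vector.

```python
import random


def generate_single_leak_events(nodes, rng, per_node=30, c_min=0.5, c_max=1.2):
    """Return per_node (node, leakage_coefficient) pairs for every candidate node."""
    return [
        (node, rng.uniform(c_min, c_max))  # uniform draw is an assumption
        for node in nodes
        for _ in range(per_node)
    ]


rng = random.Random(0)
first_iteration = generate_single_leak_events(range(1, 376), rng)  # whole network
later_iteration = generate_single_leak_events(range(1, 51), rng)   # a 50-node region
print(len(first_iteration), len(later_iteration))
```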
Feature selection reduces the number of features needed to identify the leak in each iteration. For example, iteration 2 in Figure 5 needs at most 9 sensors, and the whole iterative process needs 10 different sensors; both numbers are smaller than the total number of sensors.
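The permutation idea behind MDA can be sketched in a few lines, assuming each tree is available as a prediction function together with its out-of-bag rows and labels. The names `error_rate` and `mean_decrease_accuracy` and the single toy "tree" are hypothetical; the point is only that permuting a feature the model never uses leaves the error unchanged, so that feature (sensor) is a candidate for removal.

```python
import random


def error_rate(predict, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(rows)


def mean_decrease_accuracy(trees, feature, rng):
    """Average increase in OOB error after permuting one feature column.

    trees: list of (predict, oob_rows, oob_labels) triples, one per tree.
    """
    total = 0.0
    for predict, rows, labels in trees:
        oob_b1 = error_rate(predict, rows, labels)
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        permuted = [
            list(row[:feature]) + [v] + list(row[feature + 1:])
            for row, v in zip(rows, column)
        ]
        oob_b2 = error_rate(predict, permuted, labels)
        total += oob_b2 - oob_b1
    return total / len(trees)


# Toy "tree" that looks only at sensor 0; sensor 1 carries no information.
def toy_tree(row):
    return "A" if row[0] > 0.5 else "B"


rows = [[0.1, 5.0], [0.9, 5.0], [0.2, 5.0], [0.8, 5.0]]
labels = ["B", "A", "B", "A"]
rng = random.Random(0)
print(mean_decrease_accuracy([(toy_tree, rows, labels)], 0, rng))
print(mean_decrease_accuracy([(toy_tree, rows, labels)], 1, rng))
```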
When a 4 L/s leak occurs at node 101, the iterative process is as shown in Figure 5. The identification accuracies of the first four sub-iterations are 99.99%, 99.82%, 99.25% and 97.35%, so the final identification accuracy of the fourth iteration is 99.99% × 99.82% × 99.25% × 97.35% = 96.44%. The fifth sub-iteration has an accuracy of 96.67%, giving a final identification accuracy of 96.44% × 96.67% = 93.22% < 95%, so the iteration terminates. As shown in Figure 6(b), the region (Z1) containing leaking node 101 (5) is determined. The total number of iterations is 4.
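The stopping rule can be checked numerically. The sketch below multiplies the per-iteration accuracies quoted above and stops before the first iteration that would bring the product to 95% or below. The function name is hypothetical, and treating a product of exactly 95% as a stop is an assumption, since the document only covers the cases above and below the threshold.

```python
def accepted_iterations(accuracies, threshold=0.95):
    """Return (number of accepted iterations, cumulative accuracy after them)."""
    cumulative = 1.0
    accepted = 0
    for acc in accuracies:
        if cumulative * acc <= threshold:  # equality treated as stop (assumption)
            break
        cumulative *= acc
        accepted += 1
    return accepted, cumulative


# Per-iteration accuracies for the single leak at node 101, from the text above.
node_101 = [0.9999, 0.9982, 0.9925, 0.9735, 0.9667]
count, final_acc = accepted_iterations(node_101)
print(count, round(final_acc * 100, 2))           # accepted iterations and their product in %
print(round(final_acc * node_101[4] * 100, 2))    # product if the fifth were accepted
```

Because the product can only shrink, the rule bounds the total number of splits: each further iteration refines the region but spends part of the 5% accuracy budget.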
Compared with the traditional method, in which a total number of clusters is given in advance and the final number is settled by trial calculation, the present method needs markedly fewer trial calculations and analyzes the leakage region more specifically. With the traditional method and a 95% identification accuracy as the criterion for the trained model, 18 regions can be identified in total. If the traditional method starts from 1 and increases the number of regions by 1 per clustering trial, 17 trial calculations are needed. For the iterative method, Table 2 shows that 4 iterations suffice, fewer than the number of trials of the traditional algorithm. Although the traditional method also identifies the leakage region well, the region it identifies as containing leaking node 101 (5), Z2 in Figure 6(a), is larger than the region Z1 found by the present method. The proposed method therefore both reduces the total number of trial calculations and narrows the identified region by eliminating leak-free regions in each iteration.
Simultaneous leakage at two nodes
One two-node leak combination occurs at nodes 65 and 281, with a leak flow of 3.6 L/s at node 65 and 2.9 L/s at node 281. Another occurs at nodes 132 and 406, with a leak flow of 3.0 L/s at node 132 and 4.7 L/s at node 406.
In each iteration, one of the already identified leakage regions is selected, and a leakage coefficient of C = 0.8 is randomly added to the nodes in that region to generate the leakage change matrix. k-means clustering then splits the region into two parts.
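Once a region has been split in two, the class labels are simply the ways of distributing its leaks between the two sub-regions. A minimal sketch follows, with hypothetical function names and toy node IDs, assuming a label is stored as a pair (leaks in the first sub-region, leaks in the second).

```python
import random


def leak_labels(n_leaks):
    """All ways to place n_leaks leaks in two sub-regions: n_leaks + 1 labels."""
    return [(a, n_leaks - a) for a in range(n_leaks + 1)]


def draw_leak_sample(region_a, region_b, label, rng):
    """Pick leaking nodes at random according to one label (count_a, count_b)."""
    count_a, count_b = label
    return rng.sample(region_a, count_a) + rng.sample(region_b, count_b)


rng = random.Random(0)
sub_region_a = [1, 2, 3]   # toy node IDs, not taken from the patent's network
sub_region_b = [10, 11]
for label in leak_labels(2):
    print(label, draw_leak_sample(sub_region_a, sub_region_b, label, rng))
```

For two simultaneous leaks this gives three labels, (0, 2), (1, 1) and (2, 0), so a single-label classifier can report both where the leaks are and how many each sub-region holds.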
Each combination of leaking nodes produces 30 leakage-coefficient combinations within the leakage coefficient range. As before, the first iteration randomly generates 11,250 independent leakage samples. The sample size in each iteration decreases as the number of nodes in the leakage region decreases. For example, if the leakage region identified in the previous iteration has 375 nodes, 11,250 leakage samples are generated; scaling proportionally, if it has 50 nodes, 1,500 leakage samples are generated. The identification process of the iterative method for two simultaneously leaking nodes is shown in Figures 7 and 8, which give different identification results.
For two simultaneous leaks, feature selection also removes unnecessary features when the model is trained in each iteration, but its effect weakens as the iterations proceed, compared with the single-node case. In Figure 7, iteration 1 needs 5 sensors and iteration 4 needs 13; in Figure 8, iteration 1 needs 5 sensors and iteration 4 needs 12. Over the whole iterative process, Figure 7 needs 13 different sensors and Figure 8 needs 17. A leak combination whose nodes are close together needs fewer sensors in total than one whose nodes are far apart. Although the number of selected sensors is always smaller than the total number of sensors, the reduction is smaller than for a single-node leak.
如附图7所示,当节点65和281发生泄漏时,前四次各子迭代的识别准确率分别为99.41%、98.83%、98.57%、99.76%,得到的最终识别准确率为99.41%×98.83%×98.57%×99.76%=96.61%。对于第五个子迭代,精度为97.22%,最终识别准确率为96.61%×97.22%=93.92%<95%,迭代停止;确定了泄漏区域和泄漏区域内的泄漏数量。As shown in Figure 7, when the nodes 65 and 281 leak, the recognition accuracy rates of the first four sub-iterations are 99.41%, 98.83%, 98.57%, and 99.76%, respectively, and the final recognition accuracy is 99.41%× 98.83%×98.57%×99.76%=96.61%. For the fifth sub-iteration, the accuracy was 97.22%, the final recognition accuracy was 96.61% × 97.22% = 93.92% < 95%, the iteration was stopped; the leak area and the number of leaks within the leak area were determined.
As shown in FIG. 8, when nodes 132(9) and 406(8) leak, the accuracies of the first seven sub-iterations are 99.41%, 99.52%, 99.64%, 99.78%, 98.88%, 99.36%, and 99.67%, giving a cumulative identification accuracy of 96.32%. The eighth iteration has an identification accuracy of 97.12%, so the cumulative accuracy becomes 96.32% × 97.12% = 93.55% < 95% and the iteration stops; the leak areas and the number of leaks within each are thereby determined.
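The stopping rule used in both examples (multiply the per-iteration accuracies and stop once the product falls below 95%) can be sketched as follows. The function name `run_until_below` and its return layout are assumptions made for illustration; the accuracies and the 95% threshold are the ones quoted in the text.

```python
from typing import Iterable, List, Tuple


def run_until_below(accuracies: Iterable[float], threshold: float = 0.95) -> Tuple[int, float, List[float]]:
    """Accumulate the product of per-iteration accuracies.

    Returns (accepted_iterations, cumulative_accuracy_of_accepted_iterations,
    history), where history also holds the product of the rejected iteration,
    if any, as its last element.
    """
    cumulative = 1.0
    accepted = 0
    history: List[float] = []
    for accuracy in accuracies:
        candidate = cumulative * accuracy
        history.append(candidate)
        if candidate < threshold:
            break  # this iteration would push the product below the threshold
        cumulative = candidate
        accepted += 1
    return accepted, cumulative, history


fig7 = [0.9941, 0.9883, 0.9857, 0.9976, 0.9722]
fig8 = [0.9941, 0.9952, 0.9964, 0.9978, 0.9888, 0.9936, 0.9967, 0.9712]

for name, accs in (("FIG. 7", fig7), ("FIG. 8", fig8)):
    accepted, cumulative, history = run_until_below(accs)
    print(f"{name}: {accepted} iterations accepted, cumulative {cumulative:.2%}, "
          f"rejected step would give {history[-1]:.2%}")
```

Because the text multiplies intermediate values that were already rounded to two decimals, the unrounded product computed here may differ from the quoted figures in the last digit.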
As shown in FIG. 9(a), when nodes 65(6) and 281(7) leak, the leak area identified by the invention is Z3; as shown in FIG. 9(b), when nodes 132(9) and 406(8) leak, the identified leak areas are Z4 and Z5. Nodes 65(6) and 281(7) are close together, so the two simultaneous leaks are hard to tell apart. Nodes 132(9) and 406(8) are farther apart, so their leak characteristics are more distinct and easier to separate than those of nodes 65(6) and 281(7), which is also why more iterations are completed for nodes 132(9) and 406(8).
The invention proposes a method for identifying leak areas in a water supply network based on iterative machine learning. In each iteration, k-means first splits one of the already identified leak areas into two parts. The combination types of leak nodes for the iteration are then determined, and each combination type serves as one class label for training the random forest classifier. Leak coefficients are then randomly assigned to nodes in the leak area according to the combination type to generate leak samples, and these samples are used to train the classifier. If the trained model satisfies the iteration criterion, the leak features are input and the next iteration begins; otherwise the iteration ends and the method outputs the leak areas and the number of leak nodes each contains. The method was applied to a water supply network and its performance was evaluated; the results show that it identifies the leaking areas and the number of leak nodes they contain well, and that it improves the efficiency and accuracy of leak detection.
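The label and sample generation step described above might look roughly like the following sketch. Several things here are assumptions rather than details given in the text: a "combination type" is read as the way the simultaneous leaks are distributed over the two sub-areas produced by the k-means split, the coefficient range `(0.1, 1.0)` is a placeholder, and the names `combination_labels` and `generate_samples` are hypothetical. The step that turns each sample into leak features, and the random forest training itself, are omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple

Label = Tuple[int, int]  # (leaks in sub-area A, leaks in sub-area B)


def combination_labels(n_leaks: int) -> List[Label]:
    """Enumerate every way to distribute n_leaks leaks over two sub-areas."""
    return [(k, n_leaks - k) for k in range(n_leaks + 1)]


def generate_samples(
    area_a: Sequence[int],
    area_b: Sequence[int],
    n_leaks: int,
    per_label: int,
    coeff_range: Tuple[float, float] = (0.1, 1.0),  # placeholder range (assumption)
    seed: int = 0,
) -> List[Tuple[Dict[int, float], Label]]:
    """Draw random leak nodes and leak coefficients for each class label."""
    rng = random.Random(seed)
    samples: List[Tuple[Dict[int, float], Label]] = []
    for label in combination_labels(n_leaks):
        k_a, k_b = label
        if k_a > len(area_a) or k_b > len(area_b):
            continue  # a sub-area has too few nodes for this combination type
        for _ in range(per_label):
            nodes = rng.sample(list(area_a), k_a) + rng.sample(list(area_b), k_b)
            leaks = {node: rng.uniform(*coeff_range) for node in nodes}
            samples.append((leaks, label))
    return samples


# Hypothetical node ids for two sub-areas, with two simultaneous leaks.
samples = generate_samples(area_a=[1, 2, 3], area_b=[4, 5], n_leaks=2, per_label=3)
print(combination_labels(2))
print(len(samples), "samples generated")
```

With two sub-areas and two simultaneous leaks this reading gives three class labels, which matches the kind of output the method reports: the leak areas and how many leak nodes each contains.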
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010369142.0A CN111553811B (en) | 2020-05-02 | 2020-05-02 | Water supply pipe network leakage area identification method based on iterative machine learning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111553811A (en) | 2020-08-18 |
| CN111553811B (en) | 2022-09-20 |
Family
ID=72001791
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010369142.0A (Active), CN111553811B (en) | 2020-05-02 | 2020-05-02 | Water supply pipe network leakage area identification method based on iterative machine learning |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111553811B (en) |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111553811B (en) | 2022-09-20 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |