GB2628750A - Water ingress detection in wastewater networks
- Publication number
- GB2628750A (application GB2301386.5A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- wastewater
- asset
- data
- rainfall
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- E—FIXED CONSTRUCTIONS
- E03—WATER SUPPLY; SEWERAGE
- E03F—SEWERS; CESSPOOLS
- E03F1/00—Methods, systems, or installations for draining-off sewage or storm water
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- E—FIXED CONSTRUCTIONS
- E03—WATER SUPPLY; SEWERAGE
- E03F—SEWERS; CESSPOOLS
- E03F2201/00—Details, devices or methods not otherwise provided for
- E03F2201/20—Measuring flow in sewer systems
Abstract
A computer-implemented method for detecting water ingress (I&I, or infiltration and inflow) at a wastewater asset of a wastewater network comprises training an ensemble of machine learning (ML) models for the wastewater asset. Each model is configured to predict flow through the asset and is trained on a different training environmental and wastewater flow feature set, with at least one of the feature sets being associated with one or more types of water ingress. The models are scored based on one or more ML model performance metrics and the model with the best score is selected. The feature set used to train the selected model is identified, and ingress is detected based on whether that feature set is associated with one or more types of water ingress. Further disclosed are apparatuses and systems utilising the method and a model trained by the method.
Description
WATER INGRESS DETECTION IN WASTEWATER NETWORKS
Technical Field
[0001] The present application relates to apparatus, systems and method(s) for detecting water ingress in wastewater networks.
Background
[0002] Wastewater networks include a plurality of wastewater assets (e.g. manholes, wastewater pumping stations and the like) interconnected by a plurality of wastewater pipes (e.g. sewer pipes, storm water drains, and the like) that receive wastewater, which flows through the pipes under gravity to a wastewater treatment works and the like. Wastewater includes storm water, sewage, and/or any other wastewater run-off from roads, land, farms, homes and/or business premises that enters the wastewater network via gutters and/or private sewer/storm water drains.
[0003] The plurality of pipes in a wastewater network are meant to be sealed wastewater pipes so that no external water can get in and no wastewater can get out of the pipe. In reality, wastewater networks, wastewater assets, manholes and all connections between the individual small wastewater pipes that make up the wastewater network may not be perfectly sealed, may degrade over time, and components such as non-return valves and the like may fail. For example, wastewater pipe seals and/or wastewater pipes may crack over time due to ground movement and/or non-return valves may fail, allowing water ingress into the wastewater pipe network. In certain circumstances, the wastewater pipes can become perforated and permeable and start to let water seep in from the outside. This is typically referred to as infiltration and inflow (I&I) or water ingress.
[0004] Water ingress can occur, for example and without limitation, when ground water levels reach a certain height and ground water enters the wastewater network, or when portions of wastewater networks are close to the sea or rivers, where there may be overflow wastewater pipes with non-return valves/mechanisms that can malfunction or wear out, allowing sea or river water to enter the wastewater network. Water ingress is a major problem, especially sea or river water ingress, which increases the pH of the wastewater and makes it harder to treat, as well as eroding wastewater pipes, pumping and other equipment at wastewater treatment works (e.g. due to the saline properties of sea water). Similar issues occur with ground water and/or flood water ingress through cracked wastewater pipes and seals and the like.
[0005] Water ingress is therefore a large problem for wastewater and stormwater utilities around the world. Conventional wastewater networks may rely on an operator of the wastewater network to schedule routine maintenance of all wastewater pipes, seals, assets and other components/mechanisms throughout the year to minimise the occurrence of water ingress and damage to the wastewater network. However, it can be costly, slow and difficult to pinpoint water ingress sites and/or target areas where water ingress is occurring within a wastewater network. This can lead to significant maintenance costs and/or public works when wastewater pipes and/or wastewater components such as non-return valves, pumps and other wastewater components completely fail. Such occurrences require a major overhaul of sections of wastewater pipe networks such as, for example, pumping sealant into wastewater pipes to seal them from the inside, digging up and replacing wastewater pipes, tracking down overflow and/or non-return valve mechanisms going out to the sea/rivers that have failed or are leaking, and the like.
[0006] There is a desire for an improved wastewater management system that accurately, efficiently, and predictably detects water ingress and assists in pinpointing/targeting areas within wastewater networks and/or wastewater assets for the efficient management and maintenance of said networks and assets to reduce and/or prevent unnecessary water ingress and the resulting water ingress damage and the like.
Summary
[0007] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to determine the scope of the claimed subject matter; variants and alternative features which facilitate the working of the invention and/or serve to achieve a substantially similar technical effect should be considered as falling into the scope of the invention disclosed herein.
[0008] In a first aspect, the present disclosure provides a computer-implemented method of detecting water ingress at a wastewater asset of a wastewater network, the method comprising: training an ensemble of ML models for the wastewater asset, each ML model configured for predicting wastewater flow through the wastewater asset, wherein each ML model is trained based on a different training environmental and wastewater flow feature set from a plurality of training environmental and wastewater flow feature sets, wherein at least one of the plurality of training environmental and wastewater flow feature sets is associated with one or more types of water ingress; scoring each trained ML model of the ensemble of ML models based on one or more ML model performance metrics; selecting a trained ML model from the ensemble of trained ML models with the best score; identifying the training environmental and wastewater flow feature set used to train the selected trained ML model; detecting one or more types of water ingress for the wastewater asset based on whether the identified training environmental and wastewater flow feature set is associated with one or more types of water ingress.
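The train, score, select and detect steps of the first aspect can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the synthetic data, the feature-set labels, the hold-out split, and the use of an ordinary least-squares regressor as a stand-in for whichever ML algorithm is actually used. The function names (`fit_linear`, `detect_ingress`) are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; a stand-in for any regressor."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def detect_ingress(features, flow, feature_sets, split=0.75):
    """Train one model per feature set, score each on held-out data,
    select the best, and report the ingress types tied to its feature set.

    features: dict column name -> 1-D array
    feature_sets: dict label -> (list of column names, list of ingress types)
    """
    n_train = int(len(flow) * split)
    scores = {}
    for label, (columns, _) in feature_sets.items():
        X = np.column_stack([features[c] for c in columns])
        coef = fit_linear(X[:n_train], flow[:n_train])
        scores[label] = rmse(flow[n_train:], predict_linear(coef, X[n_train:]))
    best = min(scores, key=scores.get)  # lowest error is the best score
    return best, feature_sets[best][1], scores

# Synthetic example (assumed data): flow driven by rainfall and tide.
rng = np.random.default_rng(0)
n = 400
rain = rng.gamma(2.0, 1.0, n)
tide = np.sin(np.linspace(0, 40 * np.pi, n))
river = rng.normal(0, 1, n)
flow = 0.5 * rain + 0.8 * tide + rng.normal(0, 0.05, n)

features = {"rain": rain, "tide": tide, "river": river}
feature_sets = {
    "rain_only": (["rain"], []),            # no ingress type associated
    "rain_tide": (["rain", "tide"], ["tidal"]),
    "rain_river": (["rain", "river"], ["river"]),
}
best, ingress_types, scores = detect_ingress(features, flow, feature_sets)
print(best, ingress_types)
```

If the best-scoring model had been the rainfall-only one, the returned list of ingress types would be empty, which in this sketch corresponds to no ingress being detected.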
[0009] As an option, the computer-implemented method further including, wherein each of the plurality of training environmental and wastewater flow feature sets includes wastewater flow data associated with the wastewater asset, rainfall data associated with the wastewater asset and a unique set of one or more water ingress datasets from a plurality of water ingress datasets.
[0010] As another option, the computer-implemented method further including, wherein each set of water ingress datasets comprising at least one or more from the group of: river level data; tidal data; ground water level data; flood water level data; any other type of environmental data affecting wastewater flow through a wastewater asset; any other type of environmental data associated with water ingress affecting wastewater flow through a wastewater asset; and any other environmental data external to the wastewater asset affecting the flow through the wastewater asset.
[0011] As a further option, the computer-implemented method further including, wherein the one or more types of water ingress comprising at least one from the group of: river water ingress; tidal water ingress; ground water ingress; flood water ingress; and any other type of water ingress affecting wastewater flow through a wastewater asset.
[0012] As an option, the computer-implemented method further including, wherein scoring each trained ML model and selecting a trained ML model further comprising: measuring prediction accuracy for each trained ML model of the ensemble of trained ML models; scoring each trained ML model of the ensemble of ML models based on one or more ML model performance metrics associated with measurement prediction accuracy; ranking all the trained ML models based on the scoring; and selecting the highest or topmost ranked trained ML model.
[0013] As another option, the computer-implemented method further including, wherein scoring each trained ML model and selecting a trained ML model further comprising: scoring and ranking each of the trained ML models of the ensemble of trained ML models based on root mean squared error and mean squared error; selecting the highest or topmost ranked trained ML model.
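As a small illustration of scoring and ranking by mean squared error (MSE) and root mean squared error (RMSE), the sketch below computes both metrics for a set of candidate predictions and sorts them best first. The model names and values are made up for the example. Because RMSE is the square root of MSE, the two metrics always produce the same ordering.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rank_models(y_true, predictions):
    """predictions: dict model name -> predicted values.
    Returns a list of (name, mse, rmse) tuples, lowest error first."""
    rows = []
    for name, y_pred in predictions.items():
        m = mse(y_true, y_pred)
        rows.append((name, m, math.sqrt(m)))
    return sorted(rows, key=lambda row: row[2])

# Hypothetical observed flow and predictions from three candidate models.
observed = [1.0, 2.0, 3.0, 4.0]
ranking = rank_models(observed, {
    "model_a": [1.1, 2.1, 2.9, 4.2],
    "model_b": [1.5, 2.5, 3.5, 4.5],
    "model_c": [1.0, 2.0, 3.0, 4.6],
})
top_model = ranking[0][0]  # the "highest or topmost ranked" model
```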
[0014] As a further option, the computer-implemented method further including, wherein the training environmental and wastewater flow feature set comprises rainfall data associated with the wastewater asset, wherein the rainfall data associated with the wastewater asset is calculated based on a combination of first rainfall data corresponding to a first rainfall area the wastewater asset is located within, and one or more other rainfall data corresponding to rainfall areas adjacent to the first rainfall area.
[0015] As an option, the computer-implemented method further including, wherein calculating the rainfall data associated with the wastewater asset further comprising: calculating a hyper-local rainfall data at the location of the wastewater asset based on a weighted combination of the first rainfall estimate and the one or more other rainfall data in relation to the location of the wastewater asset within the first rainfall area and the relative location of the wastewater asset to each of the one or more other rainfall areas.
[0016] As another option, the computer-implemented method further including, further comprising calculating the hyper-local rainfall data associated with the wastewater asset further comprising performing at least one of: a multivariate interpolation in relation to at least the first rainfall data and the one or more other rainfall data and location of the wastewater asset in relation to the first and other rainfall areas; a three-dimensional interpolation in relation to at least the first rainfall data and the one or more other rainfall data and location of the wastewater asset in relation to the first and other rainfall areas; a tri-linear interpolation in relation to at least the first rainfall data and the one or more other rainfall data and location of the wastewater asset in relation to the first and other rainfall areas; a tri-cubic interpolation in relation to at least the first rainfall data and the one or more other rainfall data and location of the wastewater asset in relation to the first and other rainfall areas; or any other numerical, estimation or interpolation method for estimating the hyper-local rainfall data at the location of the wastewater asset based on at least the first rainfall data and the one or more other rainfall data and location of the wastewater asset in relation to the first and other rainfall areas.
[0017] As a further option, the computer-implemented method further including, wherein calculating the hyper-local rainfall estimate associated with the wastewater asset based on an interpolation method further comprising: dividing a rainfall grid area in which the wastewater asset is located into quadrants; identifying the grid area quadrant of the rainfall grid area that the wastewater asset is located within; selecting at least three rainfall grid areas adjacent to the identified grid area quadrant the wastewater asset is located within; calculating a rectangle formed from the centers of the at least three rainfall grid areas and the rainfall grid area the wastewater asset is located within, wherein the wastewater asset is located within said rectangle; projecting the location of the wastewater asset onto each of the line segments or edges of the rectangle based on orthogonally projecting lines from the wastewater asset to each line segment or edge to form intersection locations on each line segment or edge for estimating intersection rainfall dataset estimates for said line segments or edges; calculating, for each line segment or edge of the rectangle, an intersection rainfall estimate dataset based on a linear interpolation using distances between each center of the grid areas corresponding to said each line segment or edge and the intersection location for said each line segment and the corresponding rainfall datasets associated with said centers of the grid areas; calculating, for each projection line, an intermediate rainfall estimate dataset based on a linear interpolation using distances between each pair of intersection locations on said each projection line and said wastewater asset and the corresponding intersection estimate rainfall datasets associated with said intersection locations on said each projection line; and calculating a hyper-local rainfall dataset for said wastewater asset based on averaging the intermediate rainfall estimate datasets.
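The interpolation steps above can be sketched as follows, assuming the quadrant step has already selected the four grid cells whose centres form an axis-aligned rectangle around the asset. The function name, argument layout and sample values are illustrative assumptions. In this axis-aligned case both intermediate estimates coincide with standard bilinear interpolation, so their average does too.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between a and b at fraction t in [0, 1]."""
    return a + t * (b - a)

def hyperlocal_rainfall(asset_xy, x0, x1, y0, y1, r00, r10, r01, r11):
    """Estimate rainfall at asset_xy inside the rectangle whose corners are
    grid-cell centres (x0, y0), (x1, y0), (x0, y1), (x1, y1) carrying the
    rainfall datasets r00, r10, r01, r11 (scalars or equal-length arrays)."""
    x, y = asset_xy
    tx = (x - x0) / (x1 - x0)  # fractional position along the x edges
    ty = (y - y0) / (y1 - y0)  # fractional position along the y edges

    # Orthogonal projections onto the four edges: intersection estimates.
    bottom = lerp(r00, r10, tx)
    top = lerp(r01, r11, tx)
    left = lerp(r00, r01, ty)
    right = lerp(r10, r11, ty)

    # One intermediate estimate per projection line through the asset.
    vertical = lerp(bottom, top, ty)
    horizontal = lerp(left, right, tx)

    # Average the intermediate estimates.
    return (vertical + horizontal) / 2.0

# Hypothetical two-step rainfall time series at each of the four centres.
r00 = np.array([0.0, 1.0])
r10 = np.array([2.0, 1.0])
r01 = np.array([4.0, 1.0])
r11 = np.array([6.0, 1.0])
centre_estimate = hyperlocal_rainfall((0.5, 0.5), 0.0, 1.0, 0.0, 1.0, r00, r10, r01, r11)
offset_estimate = hyperlocal_rainfall((0.25, 0.75), 0.0, 1.0, 0.0, 1.0, r00, r10, r01, r11)
```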
[0018] As an option, the computer-implemented method further including, wherein training each ML model of the ensemble of ML models further comprising performing training of said each ML model based on using an ML algorithm to train model parameters defining said each ML model for predicting mean wastewater flow through the wastewater asset based on the corresponding training environmental and wastewater flow feature set, wherein the corresponding training environmental and wastewater flow feature set comprises data representative of corresponding historical wastewater measurement data for the wastewater asset and historical environmental data comprising either: a) historical rainfall data associated with the wastewater asset; or b) both historical rainfall data associated with the wastewater asset and one or more water ingress datasets.
[0019] As another option, the computer-implemented method further including normalising the historical wastewater measurement data based on the maximum and minimum capacity of the wastewater asset.
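The text does not give a formula for this normalisation. One plausible reading, shown here purely as an assumption, is min-max scaling of each measurement against the asset's minimum and maximum capacity, so that values lie between 0 and 1 when the asset operates within capacity. The function name is hypothetical.

```python
def normalise_by_capacity(values, min_capacity, max_capacity):
    """Scale measurements so min_capacity maps to 0.0 and max_capacity to 1.0."""
    span = max_capacity - min_capacity
    if span <= 0:
        raise ValueError("max_capacity must be greater than min_capacity")
    return [(v - min_capacity) / span for v in values]

# Hypothetical level readings for an asset with capacity range 0.0 to 5.0.
normalised = normalise_by_capacity([0.0, 2.5, 5.0], 0.0, 5.0)
```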
[0020] As an option, the computer-implemented method further including, processing the timeseries normalised historical wastewater measurement data for the wastewater asset to be synchronised with either: a) the timeseries rainfall data of the historical environmental data associated with the wastewater asset; or b) the timeseries rainfall data of the historical environmental data associated with the wastewater asset and one or more timeseries water ingress datasets of the historical environmental data associated with the wastewater asset.
[0021] As another option, the computer-implemented method further including, wherein training further comprising: performing hyperparameter tuning using the ML algorithm based on training a plurality of sets of ML models using different combinations of hyperparameters, each set of ML models comprising a mean ML model, wherein: the mean ML model is trained and configured for predicting the time series mean values in the normalised historical wastewater measurement data based on at least rainfall data and/or water ingress data as input; scoring and ranking each of the trained ML models of the plurality of sets of ML models based on root mean squared error and mean squared error; selecting the best ranked trained ML model; and generating the final trained ML model for predicting mean wastewater flow using the selected mean trained ML model.
[0022] As a further option, the computer-implemented method further including, wherein training each ML model of the ensemble of ML models further comprising performing training of said each ML model based on using an ML algorithm to train model parameters defining said each ML model for predicting minimum and maximum thresholds associated with wastewater flow through the wastewater asset based on the corresponding training environmental and wastewater flow feature set comprising data representative of historical wastewater measurement data for the wastewater asset and historical environmental data comprising either: a) historical rainfall data associated with the wastewater asset; or b) both historical rainfall data associated with the wastewater asset and one or more water ingress datasets.
[0023] As an option, the computer-implemented method further including, further comprising normalising the historical wastewater measurement data based on the maximum and minimum capacity of the wastewater asset.
[0024] As another option, the computer-implemented method further including, further comprising processing the timeseries normalised historical wastewater measurement data for the wastewater asset to be synchronised with either: a) the timeseries rainfall data of the historical environmental data associated with the wastewater asset; or b) the timeseries rainfall data of the historical environmental data associated with the wastewater asset and one or more timeseries water ingress datasets of the historical environmental data associated with the wastewater asset.
[0025] As an option, the computer-implemented method further including, wherein training further comprising: performing hyperparameter tuning using the ML algorithm based on training a plurality of sets of ML models using different combinations of hyperparameters, each set of ML models comprising a mean ML model, a minimum ML model and a maximum ML model, wherein: the mean ML model is trained and configured for predicting the time series mean values in the normalised historical wastewater measurement data based on at least rainfall data as input; the minimum ML model is trained and configured for predicting the time series minimum values in the normalised historical wastewater measurement data based on at least rainfall data as input; and the maximum ML model is trained and configured for predicting the time series maximum values in the normalised historical wastewater measurement data based on at least rainfall data as input; scoring and ranking each of the trained ML models of the plurality of sets of ML models based on root mean squared error and mean squared error; selecting the best ranked trained ML model; selecting the corresponding minimum and maximum trained ML models from the set of ML models that the selected best ranked trained ML model belongs; generating the final trained ML model for predicting minimum and maximum wastewater thresholds based on using the selected minimum and maximum trained ML models.
[0026] As an option, the computer-implemented method further including, wherein training further comprising: performing hyperparameter tuning of the ML algorithm based on training a plurality of ML models using different combinations of hyperparameters associated with the ML algorithm and training dataset, wherein each ML model comprises a mean ML model trained and configured for predicting the time series mean values in the normalised historical wastewater measurement data based on at least rainfall data as input; scoring and ranking each of the trained mean ML models of the plurality of ML models based on root mean squared error and mean squared error model performance metrics; selecting the best ranked trained mean ML model; using the hyperparameters of the selected best ranked trained mean ML model to generate corresponding minimum and maximum trained ML models, wherein: the minimum ML model is trained and configured for predicting the time series minimum values in the normalised historical wastewater measurement data based on at least rainfall data as input; and the maximum ML model is trained and configured for predicting the time series maximum values in the normalised historical wastewater measurement data based on at least rainfall data as input; generating the final trained ML model for predicting minimum and maximum wastewater thresholds based on using the corresponding minimum and maximum trained ML models.
[0027] As another option, the computer-implemented method further including, wherein the hyperparameters associated with the training dataset include a set of rainfall data time windows, wherein each rainfall data time window corresponds to, for each current rainfall data instance, inputting during training or inference the current rainfall data instance and a plurality of preceding rainfall data instances within said each rainfall data time window.
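A rainfall data time window of this kind can be sketched as a lag-feature builder: for each current rainfall instance, the model input is that instance together with the preceding instances inside the window. The function name and the choice to express the window as a number of datapoints are assumptions for illustration; the window length itself would be one of the tuned hyperparameters.

```python
def windowed_rainfall(rain, window):
    """For each index i with a full window available, return
    [rain[i], rain[i-1], ..., rain[i-window+1]] (current value first)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        list(reversed(rain[i - window + 1 : i + 1]))
        for i in range(window - 1, len(rain))
    ]

# Hypothetical rainfall series and a window of three datapoints.
rows = windowed_rainfall([0.0, 1.0, 2.0, 3.0], window=3)
```

Trying several window lengths and keeping the one whose model scores best mirrors how the window is treated as a hyperparameter of the training dataset.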
[0028] As an option, the computer-implemented method further including, wherein the historical rainfall data is a timeseries dataset with a time interval M between datapoints, and the historical wastewater measurement data is a timeseries dataset with a time interval N between datapoints, where M>=N, further comprising generating a synchronised historical wastewater measurement dataset that forms a timeseries dataset with a time interval M between datapoints based on calculating the mean, minimum and maximum for each i-th datapoint from those datapoints of the historical wastewater measurement data falling between the (i-1)-th datapoint and the i-th datapoint within said each time interval M, wherein the training dataset comprises the mean, minimum and maximum values of the synchronised historical wastewater measurement dataset.
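The synchronisation described above amounts to downsampling the finer-grained wastewater series onto the rainfall timestamps while keeping the mean, minimum and maximum of each interval. The sketch below assumes timestamps are plain numbers (for example minutes) and that each rainfall timestamp closes an interval of length M; the function name and sample data are hypothetical.

```python
def synchronise(flow_samples, rain_times, interval_m):
    """flow_samples: list of (time, value) pairs at the finer interval N.
    rain_times: rainfall timestamps spaced interval_m apart.
    Returns (time, mean, minimum, maximum) per rainfall timestamp, using the
    flow samples that fall in the half-open interval (t - interval_m, t]."""
    rows = []
    for t in rain_times:
        window = [v for ts, v in flow_samples if t - interval_m < ts <= t]
        if window:  # skip intervals with no measurements
            rows.append((t, sum(window) / len(window), min(window), max(window)))
    return rows

# Hypothetical flow readings every 15 minutes, rainfall every 60 minutes.
flow_samples = [(15, 1.0), (30, 2.0), (45, 3.0), (60, 4.0),
                (75, 5.0), (90, 5.0), (105, 7.0), (120, 7.0)]
synced = synchronise(flow_samples, rain_times=[60, 120], interval_m=60)
```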
[0029] As an option, the computer-implemented method further including, further comprising: performing a first data clean-up of the normalised synchronised historical wastewater measurement dataset based on: performing statistical analysis of the normalised synchronised historical wastewater measurement dataset for identifying blocks of outlier datapoints; generating a first clean wastewater measurement dataset based on removing the identified outlier datapoints from the normalised synchronised historical wastewater measurement dataset; and generating a first rainfall dataset based on removing the corresponding rainfall datapoints associated with the identified outlier datapoints from the historical rainfall data; performing second data clean-up of the first clean wastewater measurement dataset based on: performing further statistical analysis to analyse long and short-term average behaviour of the first clean wastewater measurement dataset for identifying, based on a ruleset, inaccurate or discontinuous measurement data for interpolation or removal; generating a second clean wastewater measurement dataset based on filtering the identified measurement data using interpolation or removal; and generating a second rainfall dataset based on removing the corresponding rainfall datapoints associated with the removed datapoints from the first clean wastewater measurement dataset from the historical rainfall dataset; performing a third data clean-up of the second clean wastewater measurement dataset based on: identifying from the second clean wastewater measurement dataset exclusion events comprising one or more of: a) blockage and sensor fault events; b) rainfall events; c) dry weather events; and/or d) other feature events causing noisy or spurious data; generating a clean wastewater measurement dataset based on removing the blockage and sensor fault events and other feature events causing noise or spurious data from the second clean wastewater measurement dataset; and generating a clean rainfall dataset based on removing the corresponding rainfall datapoints associated with the removed identified outlier datapoints from the historical rainfall data; and generating the training dataset based on the clean wastewater measurement dataset and the clean third rainfall dataset.
[0030] As another option, the computer-implemented method further including, further comprising generating a dry weather dataset for the wastewater asset based on removing identified rainfall events from the clean wastewater measurement dataset.
[0031] As an option, the computer-implemented method further including, further comprising: training a dry weather ML model based on using an ML algorithm to train model parameters defining the dry weather ML model for predicting minimum and maximum dry weather thresholds associated with wastewater flow through the wastewater asset for use in water ingress detection based on a training dry weather dataset comprising data representative of the generated dry weather dataset; and training a wet weather ML model based on using the ML algorithm to train model parameters defining the wet weather ML model for predicting minimum and maximum wet weather thresholds associated with wastewater flow through the wastewater asset for use in water ingress detection based on a training dataset comprising data representative of the clean wastewater measurement dataset and the clean third rainfall dataset associated with the wastewater asset; forming a trained ML model based on the trained dry weather ML model and trained wet weather ML model, wherein the trained ML model is configured to predict minimum and maximum wastewater thresholds, where the predicted minimum wastewater threshold comprises a combination of the predicted minimum dry weather threshold and the predicted minimum wet weather threshold, and the predicted maximum wastewater threshold comprises a combination of the predicted maximum dry weather threshold and the predicted maximum wet weather threshold.
[0032] As another option, the computer-implemented method further including, performing statistical analysis of the normalised synchronised historical wastewater measurement dataset for identifying blocks of outlier datapoints further comprising: generating a histogram dispersion graph for the normalised synchronised historical wastewater measurement dataset; identifying the outlier blocks, if any, in the histogram dispersion graph based on comparing the histogram dispersion graph with an ideal histogram data pattern associated with the wastewater asset; generating the first clean wastewater dataset based on removing any identified outlier blocks from the normalised synchronised historical wastewater measurement dataset.
[0033] As an option, the computer-implemented method further including, wherein the ML algorithm comprising at least one from the group of: regression learning algorithm; neural network; extreme gradient boost regressor algorithm; Adaptive Boosting algorithm; Gradient boosting algorithm; any other statistical classification meta-algorithm; any other ML algorithm suitable for training model parameters of an ML model for tracking the behaviour of wastewater flow through a wastewater asset and for predicting data representative of a minimum wastewater threshold and maximum wastewater threshold for said wastewater asset.
[0034] As another option, the computer-implemented method further including, wherein the ML algorithm comprises a regression learning algorithm based on one or more of: extreme gradient boost regressor algorithm; Adaptive Boosting algorithm; Gradient boosting algorithm; any other statistical classification meta-algorithm, boosting algorithm or regression algorithm suitable for training model parameters of an ML model for tracking the behaviour of wastewater flow through a wastewater asset and for predicting data representative of a minimum wastewater threshold and maximum wastewater threshold for said wastewater asset.
[0035] As an option, the computer-implemented method further including, further comprising: performing water ingress detection at each of a plurality of wastewater assets of the wastewater network according to the computer-implemented method of any preceding claim; determining a set of wastewater assets connected together in a daisychain having a correlation above a threshold correlation with said each type of water ingress; ranking the correlation of the type of water ingress at each of the wastewater assets in the set of wastewater assets in the daisychain; and selecting the wastewater asset having the highest correlation with the type of water ingress within the set of wastewater assets in the daisychain.
[0036] As another option, the computer-implemented method further including, wherein pinpointing one or more areas associated with one or more wastewater assets having the highest correlation for said each type of water ingress further comprising: determining a set of wastewater assets connected together in a daisychain; ranking the correlation of the type of water ingress at each of the wastewater assets in the set of wastewater assets in the daisychain; and selecting the wastewater asset having the highest correlation with the type of water ingress within the set of wastewater assets in the daisychain.
[0037] As an option, the computer-implemented method further including, wherein the set of wastewater assets in the daisychain comprise a first wastewater asset upstream of the other wastewater assets in the daisychain and a last wastewater asset in the daisychain downstream of all the other wastewater assets in the daisychain.
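The daisychain pinpointing steps can be sketched as a filter, rank and select operation. How the per-asset correlation with a given ingress type is computed is not specified in this passage, so the sketch simply takes those correlations as given inputs; the asset identifiers, values and function name are hypothetical.

```python
def pinpoint(chain, correlations, threshold):
    """chain: asset identifiers ordered from upstream to downstream.
    correlations: dict asset -> correlation with one type of water ingress.
    Returns (selected asset or None, ranked list of (asset, correlation))."""
    # Keep only assets in the chain whose correlation exceeds the threshold.
    candidates = [
        (asset, correlations[asset])
        for asset in chain
        if correlations.get(asset, 0.0) > threshold
    ]
    # Rank by correlation, highest first.
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    selected = ranked[0][0] if ranked else None
    return selected, ranked

# Hypothetical chain of four assets and their tidal-ingress correlations.
chain = ["MH1", "MH2", "PS3", "MH4"]
correlations = {"MH1": 0.2, "MH2": 0.7, "PS3": 0.9, "MH4": 0.6}
selected, ranked = pinpoint(chain, correlations, threshold=0.5)
```

In this example the selected asset marks the area of the chain to inspect first for that ingress type.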
[0038] In a second aspect of this specification, there is disclosed a water ingress detection apparatus for detecting water ingress at one or more of a plurality of wastewater assets of a wastewater network, the water ingress detection apparatus comprising: a water ingress machine learning, ML, unit configured for: training an ensemble of ML models for each wastewater asset, each ML model configured for predicting wastewater flow through said each wastewater asset, wherein each ML model is trained based on a different training environmental and wastewater flow feature set from a plurality of training environmental and wastewater flow feature sets associated with said each wastewater asset, wherein at least one of the plurality of training environmental and wastewater flow feature sets is associated with one or more types of water ingress; scoring each trained ML model of the ensemble of ML models based on one or more ML model performance metrics; selecting a trained ML model from the ensemble of trained ML models having the best score; and identifying the training environmental and wastewater flow feature set used to train the selected trained ML model for said each wastewater asset; a water ingress detection unit configured for: detecting one or more types of water ingress for said each wastewater asset based on whether the identified training environmental and wastewater flow feature set for said each wastewater asset is associated with one or more types of water ingress.
[0039] As an option, the water ingress detection apparatus further including, wherein the ML unit and detection unit are configured for implementing the corresponding steps of the computer-implemented method according to the first aspect and/or any of the features and/or options in relation to the first aspect.
[0040] As another option, the water ingress detection apparatus further including, wherein the water ingress detection apparatus is further configured for: performing water ingress detection at each of a plurality of wastewater assets of the wastewater network by applying the water ingress ML detection unit to each wastewater asset; determining for each wastewater asset of the plurality of wastewater assets the correlation of the type of water ingress affecting said each wastewater asset; and for each type of water ingress detected in the plurality of wastewater assets, pinpointing one or more areas associated with one or more wastewater assets having the highest correlation for said each type of water ingress.
[0041] As a further option, the water ingress detection apparatus further including a pinpointing analysis unit configured for: pinpointing one or more areas associated with one or more wastewater assets having the highest correlation for said each type of water ingress by: determining a set of wastewater assets connected together in a daisychain having a correlation above a threshold correlation with said each type of water ingress; ranking the correlation of the type of water ingress at each of the wastewater assets in the set of wastewater assets in the daisychain; and selecting the wastewater asset having the highest correlation with the type of water ingress within the set of wastewater assets in the daisychain.
[0042] As another option, the water ingress detection apparatus further including, wherein the set of wastewater assets in the daisychain comprise a first wastewater asset upstream of the other wastewater assets in the daisychain and a last wastewater asset in the daisychain downstream of all the other wastewater assets in the daisychain.
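The daisychain pinpointing option described above can be illustrated with a minimal Python sketch. It assumes that a correlation value for one type of water ingress has already been determined for each wastewater asset in the daisychain. The function name, the asset identifiers and the correlation values are hypothetical and are not taken from the specification.

```python
def pinpoint_in_daisychain(chain, correlations, threshold=0.0):
    """Rank assets in one daisychain by correlation with an ingress type.

    chain: asset identifiers ordered from upstream to downstream.
    correlations: asset identifier -> correlation with the ingress type
                  (illustrative values, higher means stronger correlation).
    threshold: assets at or below this correlation are ignored.
    """
    # Keep only the assets whose correlation exceeds the threshold.
    candidates = [
        (asset, correlations[asset])
        for asset in chain
        if asset in correlations and correlations[asset] > threshold
    ]
    # Rank the remaining assets from highest to lowest correlation.
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    # Select the asset with the highest correlation, if any remain.
    best = ranked[0][0] if ranked else None
    return best, ranked


# Hypothetical daisychain, first asset upstream and last asset downstream.
chain = ["104c", "104b", "104a"]
corr = {"104c": 0.91, "104b": 0.74, "104a": 0.42}
best, ranked = pinpoint_in_daisychain(chain, corr, threshold=0.5)
print(best, ranked)
```

In this sketch the selected asset marks the area to inspect first; an empty ranking indicates that no asset in the daisychain exceeded the threshold for that type of water ingress.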
[0043] In a third aspect of this specification, there is disclosed an apparatus comprising one or more processors, a memory and a communication interface, the one or more processors connected to the memory and communication interface, wherein the apparatus is configured to implement the computer-implemented method according to the first aspect and/or any of the features and/or options in relation to the first aspect and/or second aspect.
[0044] In a fourth aspect of this specification, there is disclosed a wastewater management system comprising: a wastewater network comprising a plurality of wastewater assets, wherein each wastewater asset comprises a sensor for measuring data representative of wastewater passing through said each wastewater asset; and a water ingress detection apparatus according to the second aspect and/or any of the features and/or options in relation to the first aspect and/or second aspect; wherein: the water ingress detection apparatus is configured for detecting water ingress at one or more wastewater assets of the plurality of wastewater assets of said wastewater network.
[0045] In a fifth aspect of this specification, there is disclosed a computer-readable medium comprising data or instruction code, which when executed on a processor, causes the processor to implement the computer-implemented method according to the first aspect and/or any of the features and/or options in relation to the first aspect.
[0046] In a sixth aspect of this specification, there is disclosed a machine learning model configured for predicting wastewater flow for a wastewater asset of a wastewater network given rainfall data and/or one or more water ingress data as input and obtained according to the computer-implemented method according to the first aspect and/or any of the features and/or options in relation to the first aspect.
[0047] According to a seventh aspect of this specification, there is disclosed a non-transitory tangible computer-readable medium comprising data or instruction code, which when executed on one or more processor(s), causes at least one of the one or more processor(s) to perform the steps of the method of: detecting water ingress at a wastewater asset of a wastewater network, the method further comprising: training an ensemble of ML models for the wastewater asset, each ML model configured for predicting wastewater flow through the wastewater asset, wherein each ML model is trained based on a different training environmental and wastewater flow feature set from a plurality of training environmental and wastewater flow feature sets, wherein at least one of the plurality of training environmental and wastewater flow feature sets is associated with one or more types of water ingress; measuring prediction accuracy for each trained ML model of the ensemble of trained ML models; selecting a trained ML model from the ensemble of trained ML models having the best prediction accuracy; identifying the training environmental and wastewater flow feature set used to train the selected trained ML model; detecting one or more types of water ingress for the wastewater asset based on whether the identified training environmental and wastewater flow feature set is associated with one or more types of water ingress.
[0048] As an option, the non-transitory tangible computer-readable medium comprising data or instruction code, which when executed on one or more processor(s), causes at least one of the one or more processor(s) to perform the steps of the computer-implemented method according to the first aspect and/or any of the features and/or options in relation to the first aspect.
Brief Description of the Drawings
[0049] Embodiments of the invention will be described, by way of example, with reference to the following drawings, in which:
[0050] Figure 1a illustrates an example ML wastewater management system according to some embodiments of the invention;
[0051] Figure 1b illustrates an example ML water ingress detection apparatus for detecting water ingress in ML wastewater management system of figure 1a according to some embodiments of the invention;
[0052] Figure 1c illustrates an example water ingress detection process according to some embodiments of the invention;
[0053] Figure 1d illustrates an example water ingress target area process for use with water ingress detection process of figure 1c according to some embodiments of the invention;
[0054] Figure 1e illustrates an example water ingress pinpointing process for use with water ingress target area process of figure 1d according to some embodiments of the invention;
[0055] Figure 1f illustrates an example ML model for use in ML wastewater management system of figure 1a, ML water ingress apparatus of figure 1b and/or processes of figures 1c to 1e according to some embodiments of the invention;
[0056] Figure 1g illustrates another example ML model for use in ML wastewater management system of figure 1a, ML water ingress apparatus of figure 1b and/or processes of figures 1c to 1e according to some embodiments of the invention;
[0057] Figure 1h illustrates yet another example ML model for use in ML wastewater management system of figure 1a, ML water ingress apparatus of figure 1b and/or processes of figures 1c to 1e according to some embodiments of the invention;
[0058] Figure 1i illustrates a further example ML model for use in ML wastewater management system of figure 1a, ML water ingress apparatus of figure 1b and/or processes of figures 1c to 1e according to some embodiments of the invention;
[0059] Figure 2 illustrates an example data processing pipeline according to some embodiments of the invention;
[0060] Figure 3 illustrates an example ML model training and generation process according to some embodiments of the invention;
[0061] Figure 4 illustrates an example ML water ingress detection process according to some embodiments of the invention;
[0062] Figure 5 illustrates an example histogram plot of wastewater measurement data according to some embodiments of the invention;
[0063] Figure 6 illustrates another example histogram plot of wastewater measurement data according to some embodiments of the invention;
[0064] Figure 7 illustrates an example exclusion event detection process according to some embodiments of the invention;
[0065] Figure 8a illustrates an example hyper-local rainfall calculation diagram according to some embodiments of the invention;
[0066] Figures 8b-8e illustrate an example hyper-local rainfall calculation according to some embodiments of the invention;
[0067] Figure 9 illustrates an example plot illustrating an example ML model output for normal wastewater flow or level for a wastewater asset of a wastewater network according to some embodiments of the invention;
[0068] Figure 10a illustrates an example of ML wastewater management system of figure 1a for water ingress detection according to some embodiments of the invention;
[0069] Figure 10b illustrates an example plot representing wastewater flow through a wastewater treatment works pumping station in the ML wastewater management system of figure 10a according to some embodiments of the invention;
[0070] Figure 10c illustrates an example scoring and ranking of multiple ML model ensembles for wastewater treatment works trained against different types of possible water ingress datasets according to some embodiments of the invention;
[0071] Figure 10d illustrates example river level flow plots that may provide possible sources of water ingress to the wastewater treatment works of ML wastewater management system of figure 10a according to some embodiments of the invention;
[0072] Figure 10e illustrates an example plot of the output of an ML model trained against a first river source in relation to wastewater flow through wastewater treatment works of ML wastewater management system of figure 10a;
[0073] Figure 10f illustrates an example plot of the output of an ML model trained against a second river source in relation to wastewater flow through wastewater treatment works of ML wastewater management system of figure 10a;
[0074] Figure 11a illustrates another example scoring and ranking of multiple ML model ensembles for wastewater treatment works of ML wastewater management system of figure 10a when trained, at another period of time, against different types of possible water ingress datasets according to some embodiments of the invention;
[0075] Figure 11b illustrates an example plot of the output of an ML model trained against rainfall data only in relation to wastewater flow through wastewater treatment works of ML wastewater management system of figure 11a;
[0076] Figure 11c illustrates an example plot of the output of an ML model trained against a tidal source in relation to wastewater flow through wastewater treatment works of ML wastewater management system of figure 11a;
[0077] Figure 12 illustrates a computing system according to some embodiments of the invention;
[0078] Figure 13 illustrates a computer readable medium according to some embodiments of the invention.
[0079] Common reference numerals are used throughout the figures to indicate similar features.
Detailed Description
[0080] Figure 1a illustrates an example machine learning (ML) wastewater management system 100 according to some embodiments of the invention. The ML wastewater management system 100 includes a wastewater network 102 including a plurality of wastewater assets 104a-104m and 105a-105p (also known as sewer assets or sites) connected together via wastewater pipes 106a-106n (also known as sewer/storm water pipes and/or drains), which form the wastewater network 102. The plurality of wastewater assets 104a-104m each include at least one sensor of a plurality of sensors 108a-108m. Not every wastewater asset in the wastewater network 102 is necessarily sensored; in this example, the group of wastewater assets 105a-105p of the plurality of wastewater assets 104a-104m and 105a-105p is shown to be unsensored. This is by way of example only and the invention is not so limited; it is to be appreciated by the skilled person that wastewater network 102 may be configured to include sensored and unsensored wastewater assets, or to include wastewater assets that are all sensored, as the application demands.
[0081] Each of the plurality of wastewater assets 104a-104m and 105a-105p is connected to one or more others of the plurality of wastewater assets 104a-104m and 105a-105p via one or more wastewater pipes 106a-106n. For example, wastewater asset 104a is connected to unsensored wastewater asset 105a via a wastewater pipe 106a, wastewater asset 104d is connected to wastewater asset 104b via a wastewater pipe 106c, wastewater asset 104c is connected to wastewater asset 104b via a wastewater pipe 106b, wastewater asset 104i is connected to another wastewater asset 104j via a wastewater pipe 106i, and so on. In this example, the wastewater network 102 has a plurality of sensored wastewater assets: each wastewater asset 104i of the plurality of wastewater assets 104a-104m includes at least one sensor 108i of the plurality of sensors 108a-108m, each sensor 108i of the corresponding wastewater asset 104i being configured to provide sensor measurements in relation to wastewater flowing through the corresponding wastewater asset 104i. The at least one sensor 108i of a wastewater asset 104i is configured for performing time series data measurements associated with wastewater flow (e.g. Sewer Level Measurement (SLM) data, Sewer Flow data, Sewer Flow Velocity Data), or an amount of wastewater, passing through the wastewater asset 104i. Each of the time series data measurements associated with wastewater flow produced by each sensor 108i may be timestamped and stored as historical wastewater flow measurements/data for use in training one or more ML models for predicting wastewater flows and the like.
[0082] Each of the sensors 108a-108m may comprise or represent any type of sensor configured for measuring an amount or flow of wastewater at the corresponding wastewater asset. For example, a sensor may comprise at least one sensor from the group of: a wastewater level sensor; a wastewater flow sensor; a wastewater pressure sensor; a current pumping sensor; and/or any other sensor configured for performing measurements associated with the wastewater flow through a wastewater asset.
[0083] In this example, the wastewater network 102 receives wastewater comprising storm water, sewer water, and/or any other wastewater run-off from roads, land, farms, homes and/or business premises (not shown) that enters via gutters and/or sewer/storm water drains (not shown) into the wastewater pipes 106a-106n of wastewater network 102. Each wastewater asset 104a-104k may be a manhole and/or human accessible section or site of the wastewater network 102 and/or a wastewater pumping station 108m for directing the wastewater via pipes 106a-106n to water treatment works for treatment. One or more of the wastewater assets 104c or 104j may have an overflow mechanism/pipe 107a or 107b to prevent wastewater from flooding out of one or more wastewater assets 104a-104m in the event of excessive wastewater flow caused by environmental events such as, without limitation, for example storms and/or excessive rainfall or manmade events such as, without limitation, for example burst drinking water mains/pipes and the like, where the wastewater may flood and contaminate land, homes, businesses, and the like. In this example, wastewater asset 104c has an overflow pipe 107a that allows overflow wastewater to exit the wastewater network 102 via a river 116 and wastewater asset 104j has an overflow pipe 107b that allows overflow wastewater to exit wastewater network 102 via the sea 118.
[0084] The overflow mechanisms/pipes 107a, 107b are only meant to be used in emergencies or extreme events where the wastewater network 102 may be overwhelmed. The overflow mechanisms/pipes 107a, 107b also have various components (not shown) such as, without limitation, for example non-return valves or mechanisms that allow wastewater to egress from the overflow mechanisms/pipes 107a, 107b into the river 116 and sea 118 but prevent river water and/or sea water from entering the overflow mechanisms/pipes 107a, 107b. These overflow mechanisms/pipes 107a, 107b need to be maintained to prevent the flow of river water and/or sea water into wastewater network 102 via one or more wastewater assets 104a-104m, wastewater pipes 106a-106p, and/or flowing towards wastewater asset 104m, which may be a wastewater treatment works with pumps and other machinery and the like. However, the non-return valves or mechanisms may fail or deteriorate over time, causing seepage and/or a flood of river water / sea water to flow through the one or more wastewater assets 104a-104m of wastewater network 102 via overflow mechanisms / pipes 107a, 107b. The water ingress detection apparatus 110 may be configured for detecting water ingress from outside the wastewater network 102 into the wastewater pipes 106a-106p by monitoring the wastewater sensors 108a-108m of each of the wastewater assets 104a-104m, so that one or more types of water ingress can be detected and a target area of one or more wastewater assets 104a-104m and/or wastewater pipes 106a-106p, in which each type of water ingress is likely to be occurring, can be identified. Thus, maintenance crews may be sent to one or more target areas to inspect and/or maintain/repair the corresponding wastewater pipes 106a-106p and/or overflow mechanisms 107a, 107b within each target area. This provides the advantage of minimising water ingress into the wastewater network 102 and extending the lifetime of the wastewater assets 104a-104m, wastewater pipes 106a-106p, overflow mechanisms 107a, 107b, and/or pumps and other machinery of wastewater treatment works of wastewater asset 104m and the like.
[0085] The wastewater network 102 further includes an ML water ingress detection apparatus 110 including an ML unit 110a, a data ingestion unit 110b, and a water ingress detection (WIDT) unit 110c connected together and configured for receiving and processing environmental data 112 (e.g. rainfall, river levels, tidal levels, and/or ground water levels) and wastewater measurement data 114 from the plurality of wastewater assets 104a-104m. Environmental data 112 includes river level data, groundwater level data, flood water data, tidal data and the like, which may be generated from corresponding river level sensors/services 109a-109b, groundwater sensors/services 109c-109d, floodwater sensors/services 109e, tidal sensors/services 109f-109h and the like located in and around the area of the wastewater network 102 and one or more of wastewater assets 104a-104m of the wastewater network 102. The data ingestion unit 110b receives wastewater measurements 114 from wastewater assets 104a-104m and environmental data 112 from the rainfall sensors/services and/or other types of environmental data sensors/services 109a-109h.
[0086] The wastewater measurements 114 for each wastewater asset 104a of the plurality of wastewater assets 104a-104m that are received by the data ingestion unit 110b may be stored as historical wastewater measurements associated with each wastewater asset 104a of the plurality of wastewater assets 104a-104m. Similarly, the environmental data 112 received from the rainfall sensors/services and/or other types of environmental data sensors/services 109a-109h associated with the wastewater network 102 and/or wastewater assets 104a-104m may be used to generate historical environment data including rainfall data and other types of historical environmental data associated with each of the wastewater assets 104a-104m.
[0087] The historical wastewater measurements associated with each wastewater asset 104a of the plurality of wastewater assets 104a-104m, and the historical environment data including rainfall data and other types of historical environmental data associated with each of the wastewater assets 104a-104m, may be used to generate a different training environmental and wastewater flow feature set for each of a plurality of training environmental and wastewater flow feature sets for said each wastewater asset 104a, where all of the plurality of training environmental and wastewater flow feature sets include rainfall data associated with the wastewater asset 104a, but at least one of the plurality of training environmental and wastewater flow feature sets includes at least one type of environmental dataset (e.g. river level data, groundwater data, flood water data, tidal level data, and other external water ingress datasets) that is associated with one or more types of water ingress. Each of the plurality of training environmental and wastewater flow feature sets is different and includes a unique combination of one or more historical environmental data associated with wastewater asset 104a.
[0088] For example, a first or ground truth training environmental and wastewater flow feature set for wastewater asset 104a may include historical wastewater measurements measured from the sensor 108a of wastewater asset 104a and historical rainfall data associated with wastewater asset 104a. A second training environmental and wastewater flow feature set for wastewater asset 104a may include historical wastewater measurements measured from the sensor 108a of wastewater asset 104a, historical rainfall data associated with wastewater asset 104a, and historical river level data associated with a river source/sensor location. A third training environmental and wastewater flow feature set for wastewater asset 104a may include historical wastewater measurements measured from the sensor 108a of wastewater asset 104a, historical rainfall data associated with wastewater asset 104a, and historical groundwater level data. A fourth training environmental and wastewater flow feature set for wastewater asset 104a may include historical wastewater measurements measured from the sensor 108a of wastewater asset 104a, historical rainfall data associated with wastewater asset 104a, and historical flood level data associated with a flood water source. A fifth training environmental and wastewater flow feature set for wastewater asset 104a may include historical wastewater measurements measured from the sensor 108a of wastewater asset 104a, historical rainfall data associated with wastewater asset 104a, and historical tidal level data associated with a tidal source/sensor location. A sixth training environmental and wastewater flow feature set for wastewater asset 104a may include historical wastewater measurements measured from the sensor 108a of wastewater asset 104a, rainfall data associated with wastewater asset 104a, and the combination of historical river level data and historical tidal level data. Thus, each of the plurality of training environmental and wastewater flow feature sets is different and includes a unique combination of one or more types of historical environmental data including, without limitation, historical rainfall data, historical river level data, historical groundwater data, historical flood data, historical tidal level data and other types of water ingress datasets associated with wastewater asset 104a.
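The generation of unique feature set combinations described above may be sketched in Python as follows. The column names (`rainfall`, `river_level` and so on), the function name and the cap of two combined water ingress sources are assumptions made for illustration only; the specification lists six example feature sets and does not prescribe how the combinations are enumerated. The historical wastewater measurements are treated here as the prediction target shared by every feature set, so they are not listed among the environmental inputs.

```python
from itertools import combinations

# Hypothetical names for the water ingress datasets named in the text.
INGRESS_SOURCES = ["river_level", "groundwater_level", "flood_level", "tidal_level"]


def build_feature_sets(ingress_sources, max_combined=2):
    """Enumerate unique environmental input combinations for one asset.

    Every feature set contains rainfall; all but the first add one or
    more water ingress datasets, up to max_combined sources at a time.
    """
    feature_sets = [("rainfall",)]  # rainfall-only baseline ("ground truth") set
    for k in range(1, max_combined + 1):
        for combo in combinations(ingress_sources, k):
            feature_sets.append(("rainfall",) + combo)
    return feature_sets


feature_sets = build_feature_sets(INGRESS_SOURCES)
for fs in feature_sets:
    print(fs)
```

With four water ingress sources and at most two combined, this yields one rainfall-only set, four single-source sets and six two-source sets, including the river level plus tidal level combination given as the sixth example above.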
[0089] ML unit 110a may make use of the generated plurality of training environmental and wastewater flow feature sets associated with each wastewater asset for use in training a corresponding plurality of ML models for each of the ML model ensembles 120a-120m associated with each wastewater asset 104a-104m. For example, the ML unit 110a may receive a generated plurality of training environmental and wastewater flow feature sets associated with wastewater asset 104a, each of which may be used to train one of the ML models of the ML model ensemble 120a of wastewater asset 104a. Each ML model may be trained to predict wastewater flow through wastewater asset 104a based on a particular training environmental and wastewater flow feature set of the plurality of training environmental and wastewater flow feature sets for wastewater asset 104a. Thus, the ML unit 110a is configured for training a plurality of ML models in an ML model ensemble 120a for each wastewater asset 104a of the plurality of wastewater assets 104a-104m. Each ML model is trained for predicting wastewater flow (e.g. predicted mean wastewater flow levels, and/or minimum and maximum wastewater thresholds/levels) through said wastewater asset 104a in response to environmental data 112 as input. Each ML model in the ML model ensemble 120a for said wastewater asset 104a is trained on a different training environmental and wastewater flow feature set of the plurality of training environmental and wastewater flow feature sets for wastewater asset 104a.
[0090] The data representative of the resulting trained ML model ensembles 120a-120m for each of the wastewater assets 104a-104m are passed to WIDT unit 110c for assessment and identification of one or more types of water ingress that may be occurring for each of the wastewater assets 104a-104m. The assessment for each wastewater asset 104a-104m may be based on scoring the trained ML models in each of the corresponding ML model ensembles 120a-120m using, for example, ML model performance metrics or prediction accuracy (e.g. root mean square error (RMSE), mean square error (MSE), or mean absolute error (MAE)) and the like. The best scoring or topmost ranked trained ML model in each corresponding ML model ensemble may be selected for the corresponding wastewater asset. Then, for each selected trained ML model for each wastewater asset, the corresponding particular training environmental and wastewater flow feature set used to train that selected trained ML model is analysed to identify whether any types of environmental datasets associated with water ingress were used in training the trained ML model, and if so, water ingress is detected for that wastewater asset and also the types of water ingress are identified based on those water ingress datasets used in the training environmental and wastewater flow feature set.
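The performance metrics named above have standard definitions, which the following Python sketch computes for a pair of measured and predicted wastewater flow series. The function names and the sample values are illustrative assumptions and are not data from the specification.

```python
import math


def mse(actual, predicted):
    """Mean square error: average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error: average of the absolute prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up measured and predicted flow values, for illustration only.
actual = [2.0, 4.0, 6.0]
predicted = [1.0, 4.0, 8.0]
print(mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

All three are error measures, so under any of them the best scoring model is the one with the lowest value.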
[0091] For example, for wastewater asset 104a, each of the plurality of ML models in the ML model ensemble 120a for wastewater asset 104a is scored and ranked, where the topmost ranked or best scoring ML model of the wastewater asset 104a is selected. Then the selected ML model of the ML model ensemble 120a is analysed to identify whether water ingress is likely to be occurring, and if so, the type of water ingress that is detected to be occurring in relation to wastewater asset 104a. The particular training environmental and wastewater flow feature set used to train the selected ML model for wastewater asset 104a is analysed to identify those historical environmental datasets that were used to train said selected ML model. If the historical environmental dataset only included rainfall data for wastewater asset 104a, then it may be determined that water ingress is not occurring. However, if the historical environmental dataset includes any other type of historical environmental datasets such as, without limitation, for example one or more of historical river level, historical groundwater level, historical flood water level, historical tidal level and/or other historical water ingress datasets, then the WIDT unit 110c may identify and detect that water ingress is occurring for wastewater asset 104a. The types of water ingress datasets in the training environmental and wastewater flow feature set for the selected ML model may then be indicated and reported accordingly. This type of analysis, scoring and ranking is performed for each of the ML model ensembles 120a-120m of each of the wastewater assets 104a-104m, or for a subset of wastewater assets of the plurality of wastewater assets 104a-104m depending on the application. Thus, for each wastewater asset 104a of the plurality of wastewater assets 104a-104m and/or a subset thereof, one or more types of water ingress may be detected and identified by WIDT unit 110c based on the best performing ML model of each of the ML model ensembles 120a-120m associated with each of the wastewater assets 104a-104m.
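The score, rank, select and detect logic described above for a single wastewater asset can be sketched in Python as below. The sketch assumes each trained ML model has already been given an error score (lower is better, as with RMSE or MAE) and is keyed by the feature set it was trained on. The function name, feature names and scores are hypothetical.

```python
def detect_ingress(model_scores):
    """Detect water ingress for one asset from its scored model ensemble.

    model_scores: mapping of feature-set tuple -> error score, where a
                  lower score means a better performing model (assumed).
    """
    # Rank the ensemble from best (lowest error) to worst.
    ranked = sorted(model_scores.items(), key=lambda item: item[1])
    best_features, _best_score = ranked[0]
    # Any dataset other than rainfall in the winning feature set is
    # treated as an identified type of water ingress.
    ingress_types = [name for name in best_features if name != "rainfall"]
    return {
        "ranking": ranked,
        "best_features": best_features,
        "ingress_detected": bool(ingress_types),
        "ingress_types": ingress_types,
    }


# Made-up error scores for three models of one asset's ensemble.
scores = {
    ("rainfall",): 0.42,
    ("rainfall", "river_level"): 0.31,
    ("rainfall", "tidal_level"): 0.18,
}
result = detect_ingress(scores)
print(result["ingress_detected"], result["ingress_types"])
```

If the rainfall-only model scores best, no water ingress is reported for that asset; otherwise the non-rainfall datasets of the winning feature set name the type or types of ingress.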
[0092] In another example, the ML models in each ML model ensemble 120a may be trained for each wastewater asset 104a based on a different training environmental and wastewater flow feature set. Thus, each wastewater asset 104a has a plurality of ML models built and scored with rainfall data only, and/or rainfall data and river level data, rainfall data and tidal levels, rainfall data and ground water levels for that particular wastewater asset 104a or site, and the like. By measuring the prediction accuracy or scoring each of the ML models in the ML model ensemble 120a, each of the trained ML models may then be ranked based on the scoring. This means there is a listing of trained ML models that have been trained on various types of environmental data including water ingress data, i.e. ML models trained with / without river level data, ML models trained with / without tidal patterns, and ML models trained with / without ground water. The best ML model may thus be identified as the topmost scoring ML model, which is the ML model that most accurately predicts or correlates with its training environmental and wastewater flow feature set. Thus, by looking at the training environmental and wastewater flow feature set of the best performing ML model in the list or ranking of ML models, the water ingress may be detected and also the type of water ingress may be identified based on whether water ingress datasets are included in the training environmental and wastewater flow feature set of the best ML model.
[0093] The WIDT unit 110c may then use the identified types of water ingress for each of the plurality of wastewater assets 104a-104m to pinpoint those wastewater assets 104a-104m having the best trained ML model or ML model with the highest correlation (e.g. best scoring or ML performance metric) in relation to each type of water ingress. Thus, one or more wastewater assets 104a-104b may be identified to have the best scoring trained ML model associated with one or more types of water ingress. This information may be used to pinpoint a target water ingress area comprising one or more wastewater assets 104a-104b that are likely to be affected by a particular type of water ingress or one or more of the same types of water ingress. These target water ingress areas may be used to assist maintenance personnel to narrow down and identify the wastewater pipes / wastewater assets / components / pumps etc. associated with the target water ingress area that may require maintenance to reduce and/or prevent the identified type of water ingress.
[0094] Figure 1b is a schematic diagram illustrating an example ML water ingress detection apparatus 110 according to some embodiments. Reference is made to wastewater network 102 of figure 1a. The ML water ingress detection apparatus 110 includes ML unit 110a, data ingestion unit 110b, and water ingress detection unit 110c, which may be communicatively coupled together. The ML water ingress detection apparatus 110 may also be implemented using one or more processing units, memory and/or communication interfaces, which are connected together and configured to implement the functionality of ML unit 110a, data ingestion unit 110b, and water ingress detection (WIDT) unit 110c. The ML unit 110a is configured for using a plurality of trained ML model ensembles 120a-120m for predicting wastewater flow through each of the corresponding wastewater assets 104a-104m in response to environmental data 112 including, without limitation, for example one or more types of environmental data from the group of: time series rainfall data 112a, time series river level data 112b, time series tidal level data 112c, time series ground water level data 112d, time series flood water data, and/or any other time series water ingress data associated with other sources of water ingress and the like. The trained ML model ensembles 120a-120m each include a plurality of corresponding ML models 120a-a to 120a-s, 120i-a to 120i-s, 120m-a to 120m-s for corresponding wastewater assets 104a-104m. Each of the plurality of ML models 120i-a to 120i-s in an ML model ensemble 120i of wastewater asset 104i may be trained on a different training environmental and wastewater measurement data feature set associated with wastewater asset 104i.
[0095] The data ingestion unit 110b may receive environmental data from the group of: time series rainfall data 112a, time series river level data 112b, time series tidal level data 112c, time series ground water level 112d, time series flood water data, and/or any other time series water ingress data associated with other sources of water ingress and the like, and form a set of historical environmental data for use by ML unit 110a in training one or more ML models and the like. In addition, the data ingestion unit 110b may receive wastewater measurement data 114a-114m from corresponding wastewater sensors 108a-108m of wastewater assets 104a-104m, which may be stored as historical wastewater measurement data 114a-114m associated with each of the wastewater assets 104a-104m. The training environmental and wastewater measurement data feature sets may be generated based on the stored historical environmental data 112 and the stored historical wastewater measurement data associated with each of the wastewater assets 104a-104m. Each of the training environmental and wastewater measurement data feature sets includes historical wastewater data 114i associated with wastewater asset 104i (e.g., historical minimum, maximum and/or mean wastewater flow measurements/data), and a different combination of the historical rainfall data set 112a associated with wastewater asset 104i and one or more types of historical environmental data instances 112b-112e associated with water ingress.
[0096] For example, the ML unit 110a may be configured to train an ML model 120i-a of the plurality of ML models 120i-a to 120i-s of ML model ensemble 120i by iteratively applying the corresponding training environmental and wastewater measurement data feature set as a training dataset to an ML algorithm configured to predict, in each iteration, mean wastewater flow or minimum and maximum wastewater thresholds of wastewater flowing through wastewater asset 104i. In each iteration, these predictions are compared with corresponding historical mean, minimum and/or maximum wastewater measurement data 114i, and the parameters/weights and the like of the ML algorithm are updated based on the comparison. This process is repeated until the predictions substantially match the corresponding historical wastewater measurement data for wastewater asset 104i (e.g. matching within an error threshold) or until a maximum number of iterations is reached.
The final updated ML algorithm may be scored to determine the prediction accuracy (e.g. based on RMSE or MSE) of the trained ML model 120i-a configured for predicting wastewater mean or minimum and maximum thresholds in relation to wastewater asset 104i.
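By way of illustration only, the iterative training described above might be sketched as follows. This is a minimal sketch, assuming a simple linear model fitted by batch gradient descent; the function name train_linear_model, the learning rate and the stopping values are hypothetical and are not taken from the described embodiments, which may use any suitable ML algorithm.

```python
import math


def train_linear_model(features, targets, lr=0.05, error_threshold=0.05, max_iters=5000):
    """Fit targets ~ weights . x + bias by batch gradient descent (illustrative only).

    features: list of rows of environmental inputs (e.g. rainfall, river level).
    targets:  list of historical wastewater flow values for the same timestamps.
    """
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    rmse = float("inf")
    for _ in range(max_iters):
        # Predict with the current parameters and compare with historical data.
        preds = [sum(w * x for w, x in zip(weights, row)) + bias for row in features]
        errors = [p - t for p, t in zip(preds, targets)]
        rmse = math.sqrt(sum(e * e for e in errors) / n)
        # Stop once predictions match the historical data within the threshold.
        if rmse <= error_threshold:
            break
        # Update the parameters from the comparison (gradient of the mean squared error).
        for k in range(d):
            grad = 2.0 / n * sum(e * row[k] for e, row in zip(errors, features))
            weights[k] -= lr * grad
        bias -= lr * 2.0 / n * sum(errors)
    return weights, bias, rmse
```

The returned RMSE corresponds to the score of the final updated model; training also stops when the assumed maximum number of iterations is reached.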
[0097] In operation, the ML water ingress detection apparatus 110 is configured to receive via data ingestion unit 110b data representative of at least: a) wastewater measurements 114 associated with the wastewater flow from each of the sensors 108a-108n of wastewater assets 104a-104m, and b) environmental data 112 associated with each of the wastewater assets 104a-104m. The environmental data 112 may be timestamped and provided at regular time intervals (e.g. 5 minutes, 10 minutes, 15 minutes, 30 minutes, hourly, and the like) and may include, without limitation, for example current rainfall data 112a, river level data 112b, groundwater level data 112c, flood water data 112d, tidal data 112e and/or any other type of water ingress dataset and the like. The environmental data 112 includes rainfall data 112a, river level data 112b, groundwater level data 112c, flood water data 112d, tidal data 112e and/or any other type of water ingress data and the like. The river level data 112b, groundwater level data 112c, flood water data 112d and tidal data 112e may be generated from corresponding river level sensors/services 109a-109b, groundwater sensors/services 109c-109d, floodwater sensors/services 109e and tidal sensors/services 109f-109h from corresponding sources and the like located in and around the area of the wastewater network 102 and one or more of wastewater assets 104a-104m of the wastewater network 102.
[0098] Each of the wastewater assets 104a-104m is associated with a corresponding trained ML model ensemble 120a-120m trained by ML unit 110a. Each ML model ensemble 120i of the ML model ensembles 120a-120m includes a plurality of ML models 120i-a to 120i-s, each of which is trained to predict wastewater flow through the corresponding wastewater asset 104i using a corresponding plurality of unique training environmental and wastewater measurement data feature sets associated with wastewater asset 104i. Each training environmental and wastewater measurement data feature set includes a unique set of historical time series timestamped environmental data 112 (e.g. a unique combination or permutation of rainfall data, tidal data, ground water data, flood water data, and/or river level data and the like) and historical time series timestamped wastewater measurement data 114 associated with the wastewater asset 104i (or historical minimum, maximum, and/or mean time series timestamped wastewater measurement data derived from historical time series timestamped wastewater measurement data), which may be retrieved from data ingestion unit 110b.
In particular, each training environmental and wastewater measurement data feature set includes wastewater measurement data associated with the wastewater asset 104i and an environmental dataset that includes either: a) rainfall data associated with the wastewater asset 104i; or b) rainfall data associated with the wastewater asset 104i and one or more other environmental datasets associated with types of water ingress from the group of: river level data 112b, groundwater level data 112c, flood water data 112d, tidal data 112e and/or any other type of water ingress dataset and the like. Each of the plurality of training environmental and wastewater measurement data feature sets for wastewater asset 104i is unique.
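As an illustrative sketch only, the unique feature sets described above could be enumerated as rainfall alone plus rainfall combined with every non-empty combination of the other water ingress datasets. The dataset labels and the function name build_feature_sets are hypothetical and are used here only for illustration.

```python
from itertools import combinations

# Hypothetical labels for the water ingress datasets named in the description.
INGRESS_TYPES = ["river_level", "groundwater_level", "flood_water", "tidal"]


def build_feature_sets(ingress_types=INGRESS_TYPES):
    """Return the unique environmental dataset combinations, one per ML model.

    Every combination contains rainfall; the first contains rainfall only.
    """
    feature_sets = [("rainfall",)]
    for size in range(1, len(ingress_types) + 1):
        for combo in combinations(ingress_types, size):
            feature_sets.append(("rainfall",) + combo)
    return feature_sets
```

With four ingress datasets this yields 1 + 15 = 16 feature sets, so an ensemble built this way would hold sixteen models for each wastewater asset.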
[0099] Each of the plurality of trained ML models 120i-a to 120i-s of trained ML model ensemble 120i for wastewater asset 104i is trained and configured for predicting wastewater flow through wastewater asset 104i. The predicted wastewater flow may include, without limitation, for example mean wastewater flow, minimum and maximum wastewater thresholds associated with the expected wastewater flow through the corresponding wastewater asset 104i, and/or any other type of wastewater flow that is capable of being predicted when environmental data 112 is input to the corresponding ML model 120i-a of the plurality of ML models 120i-a to 120i-s of ML model ensemble 120i for the wastewater asset 104i.
[00100] For each of the plurality of wastewater assets 104a-104m, the ML unit 110a may send the corresponding trained ML model ensembles 120a-120m to the water ingress detection unit 110c. For each trained ML model ensemble 120i of the trained ML model ensembles 120a-120m, the water ingress detection unit 110c may determine which of the plurality of ML models 120i-a to 120i-s of the trained ML model ensemble 120i has the highest prediction accuracy. The ML model 120i-a of the plurality of ML models 120i-a to 120i-s of the trained ML model ensemble 120i having the highest prediction accuracy is selected. The ML model with the best prediction accuracy may be determined by scoring each of the plurality of trained ML models 120i-a to 120i-s of the ML model ensemble 120i for predicting wastewater flow through wastewater asset 104i using the corresponding training environmental and wastewater measurement data feature sets. Scoring may include scoring each of the plurality of trained ML models 120i-a to 120i-s of the ML model ensemble 120i using one or more ML performance metrics (e.g. root mean square error (RMSE), mean square error (MSE), or mean absolute error (MAE)). The plurality of trained ML models 120i-a to 120i-s of the ML model ensemble 120i are then ranked according to their scores. The water ingress detection unit 110c then selects the trained ML model of the ML model ensemble 120i for said wastewater asset 104i that has the best or topmost ranked score; e.g. for RMSE or MSE performance metrics, the best scores are the lowest scores. The water ingress detection unit 110c then analyses the training environmental and wastewater measurement data feature set used to train the selected trained ML model of ML model ensemble 120i for wastewater asset 104i to identify whether the historical environmental dataset included any environmental datasets associated with one or more types of water ingress.
The water ingress detection unit 110c determines from the training environmental and wastewater measurement data feature set whether water ingress is affecting the wastewater asset 104i, and also the type of water ingress based on the type of water ingress datasets used to train the selected ML model 120i-a of wastewater asset 104i. This is performed for each of the trained ML model ensembles 120a-120m of the wastewater assets 104a-104m. Thus, the water ingress detection unit 110c may output data representative of an indication of whether one or more wastewater assets are affected by water ingress, and also an indication of the types of water ingress in relation to those wastewater assets determined to be affected by water ingress.
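The selection and detection logic described above can be sketched, purely as an illustration, as follows. The sketch assumes each trained model has already been scored with an error metric where lower is better (e.g. RMSE); the function name detect_ingress, the dataset labels and the returned dictionary layout are hypothetical.

```python
def detect_ingress(scored_models):
    """Pick the best-scoring model and read ingress types off its feature set.

    scored_models: list of (feature_set, error) pairs for one wastewater asset,
    where feature_set is a tuple of dataset labels that always includes
    "rainfall" and error is a lower-is-better score such as RMSE.
    """
    best_features, best_error = min(scored_models, key=lambda model: model[1])
    # Any dataset other than rainfall in the winning feature set indicates
    # a type of water ingress affecting the asset.
    ingress_types = [name for name in best_features if name != "rainfall"]
    return {
        "affected": bool(ingress_types),
        "types": ingress_types,
        "error": best_error,
    }
```

If the rainfall-only model scores best, the asset is reported as unaffected; otherwise the extra datasets in the winning feature set name the likely ingress types.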
[00101] When training each ML model 120i-i of the plurality of ML models 120i-a to 120i-s of the ML model ensemble 120i for wastewater asset 104i, the ML unit 110a may generate each trained ML model 120i-i by performing a grid search or ML model search over a plurality of sets of hyperparameters used by the ML algorithm or process selected for training the model parameters (e.g. weights/coefficients) for the resulting ML model 120i-i for the ML model ensemble 120i of wastewater asset 104i. Thus, when training each ML model 120i-i, multiple ML models may be generated based on different sets of hyperparameters using the same training environmental and wastewater measurement data feature set, where the ML model from these multiple ML models resulting in a minimum error (e.g. root mean squared error (RMSE), mean square error (MSE), or mean absolute error (MAE) or other appropriate loss function) is selected as the ML model 120i-i for inclusion into the ML model ensemble 120i for that wastewater asset 104i.
[00102] The hyperparameters are those parameters, settings and coefficients used by the ML algorithm that are selected and set prior to training the model parameters that make up an ML model. Each set of hyperparameters may include, without limitation, for example: 1) a particular selected time windowing of the environmental data 112 of the training environmental and wastewater measurement data feature set, the time window representing the amount of historical environmental data up to the present environmental data 112 used for that training instance, which will be input or applied to the ML algorithm used for training the model parameters of the ML model; and 2) depending on the type of ML algorithm (e.g. regression, neural network, and/or other ML algorithm) used for generating the ML model, the ML algorithm hyperparameters such as, without limitation, for example the base estimator, maximum number of estimators, train-test split ratio, learning rate in optimization algorithms (e.g. gradient descent, etc.), choice of optimization algorithm (e.g., gradient descent, stochastic gradient descent, or Adam optimizer, etc.), choice of activation function in a neural network (NN) layer (e.g. Sigmoid, ReLU, Tanh, etc.), choice of cost or loss function the model will use (e.g. RMSE, MSE, etc.), number of hidden layers in a NN, number of activation units in each layer, drop-out rate/probability in NN, number of iterations (epochs) in training, number of clusters in a clustering task, kernel or filter size in convolutional layers, pooling size, batch size, and/or any other parameter or value that is decided before training begins and whose values or configuration do not change when training ends.
[00103] Selecting an appropriate set of hyperparameters (or hyperparameter tuning) may be performed using various optimisation and search algorithms as is well known by a skilled person such as, without limitation, for example, grid search (e.g. testing all possible combinations of hyperparameters), randomized search (e.g. testing as many combinations of hyperparameters as possible), informed search (e.g. testing the most promising combinations of hyperparameters), and/or evolutionary algorithms such as genetic algorithms (e.g. using evolution and natural selection concepts to select hyperparameters) and/or any other hyperparameter tuning algorithm as is well known by the skilled person. The resulting hyperparameters may be used for training the final ML model 120i-i used in the trained ML model ensemble 120i of wastewater asset 104i for detecting water ingress and the like.
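A grid search of the kind mentioned above can be sketched, for illustration only, as an exhaustive loop over every combination of candidate hyperparameter values. The function name grid_search, the hyperparameter names and the train_and_score callable are hypothetical stand-ins; in practice train_and_score would train a model on the feature set and return its error (e.g. RMSE).

```python
from itertools import product


def grid_search(grid, train_and_score):
    """Try every hyperparameter combination and keep the lowest-error one.

    grid: dict mapping hyperparameter name -> list of candidate values.
    train_and_score: callable taking a dict of hyperparameters and returning
    a lower-is-better error for the model trained with them.
    """
    names = sorted(grid)
    best_params, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = train_and_score(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error
```

The same training feature set is reused for every combination, so only the hyperparameters differ between the candidate models.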
[00104] Figure 1c illustrates a water ingress detection process 130 for use by ML water ingress detection apparatus 110 of figures 1a and 1b in detecting water ingress at a wastewater asset 104a of a wastewater network 102 of ML wastewater management system 100. Reference numerals of figures 1a and 1b may be reused in figure 1c for similar or the same features and/or components and the like. The water ingress detection process 130 may be performed by ML unit 110a and WIDT unit 110c for each of the plurality of wastewater assets 104a-104m. The water ingress detection process 130 may be performed in parallel and/or in series for each wastewater asset 104a-104m for determining whether water ingress is occurring, and the type of water ingress, at each of wastewater assets 104a-104m and the like. The water ingress detection process 130 for detecting water ingress at each wastewater asset 104a may include the following steps of:
[00105] In step 131, training a plurality of ML models to form an ML model ensemble 120a for the wastewater asset 104a, where each ML model is configured for predicting wastewater flow through the wastewater asset 104a. Each ML model of the plurality of ML models is trained based on a different training environmental and wastewater flow feature set from a plurality of training environmental and wastewater flow feature sets, where at least one of the plurality of training environmental and wastewater flow feature sets is associated with one or more types of water ingress.
[00106] As described with reference to figures 1a and 1b, the plurality of training environmental and wastewater flow feature sets includes a first training environmental and wastewater flow feature set comprising historical wastewater flow data associated with the wastewater asset 104a and historical rainfall data associated with the wastewater asset 104a, and each of the remaining training environmental and wastewater flow feature sets of the plurality of training environmental and wastewater flow feature sets comprises historical wastewater flow data associated with the wastewater asset 104a, historical rainfall data associated with the wastewater asset 104a, and a unique set of one or more historical water ingress datasets from a plurality of water ingress datasets.
[00107] For example, each set of one or more historical water ingress datasets may include at least one or more from the group of: historical river level data; historical tidal data; historical ground water level data; historical flood water level data; any other type of historical environmental data affecting wastewater flow through a wastewater asset; any other type of historical environmental data associated with water ingress affecting wastewater flow through a wastewater asset; and/or any other historical environmental data external to the wastewater asset affecting the flow through the wastewater asset.
[00108] Each of the training environmental and wastewater flow feature sets comprises historical rainfall data associated with the wastewater asset 104a. The historical rainfall data associated with the wastewater asset 104a may be calculated based on a combination of first historical rainfall data corresponding to a first rainfall area the wastewater asset 104a is located within, and one or more other historical rainfall data corresponding to rainfall areas adjacent to the first rainfall area. Alternatively or additionally, calculating the historical rainfall data associated with the wastewater asset 104a may further include calculating hyper-local historical rainfall data at the location of the wastewater asset 104a based on a weighted combination of the first historical rainfall data and the one or more other historical rainfall data in relation to the location of the wastewater asset 104a within the first rainfall area and the relative location of the wastewater asset 104a to each of the one or more other rainfall areas.
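The description does not specify how the weighted combination is formed. As one possible sketch only, inverse-distance weighting is assumed here, in which rainfall areas whose centres lie closer to the wastewater asset contribute more. The function name hyper_local_rainfall, the use of area centre coordinates and the power parameter are all assumptions made for illustration.

```python
import math


def hyper_local_rainfall(asset_xy, area_readings, power=2.0):
    """Estimate rainfall at an asset from surrounding rainfall areas.

    asset_xy: (x, y) location of the wastewater asset.
    area_readings: list of ((x, y) area centre, rainfall value) pairs for the
    asset's own rainfall area and its adjacent rainfall areas.
    Uses inverse-distance weighting (an assumed scheme, not from the source).
    """
    weighted_sum, weight_total = 0.0, 0.0
    for centre, rainfall in area_readings:
        distance = math.dist(asset_xy, centre)
        if distance == 0.0:
            # The asset sits exactly on an area centre: use that reading.
            return rainfall
        weight = 1.0 / distance ** power
        weighted_sum += weight * rainfall
        weight_total += weight
    return weighted_sum / weight_total
```

Applying the same function to each timestamp of the historical rainfall series would give a hyper-local historical rainfall series for the asset.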
[00109] In step 132, scoring each trained ML model of the ML model ensemble 120a based on one or more ML model performance metrics. The ML model performance metrics may be associated with measurement prediction accuracy. The ML model performance metrics may include, without limitation, for example root mean squared error (RMSE), mean squared error (MSE), and/or mean absolute error (MAE), and any other model performance metric or statistic used for evaluating the performance or prediction accuracy of the ML model.
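For reference, the three named metrics can be computed as in the following minimal sketch, which compares measured wastewater flow values with the values predicted by a trained model. The function names are chosen here for illustration.

```python
import math


def mse(actual, predicted):
    """Mean squared error between measured and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

All three are error measures, so a lower value indicates a better prediction accuracy.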
[00110] In step 133, selecting a trained ML model from the trained ML model ensemble 120a with the best score.
[00111] In step 134, identifying the training environmental and wastewater flow feature set used to train the selected trained ML model.
[00112] In step 135, detecting one or more types of water ingress for the wastewater asset 104a based on whether the identified training environmental and wastewater flow feature set is associated with one or more types of water ingress.
[00113] For example, the steps 132 and 133 of scoring each trained ML model and selecting a trained ML model may further include: measuring prediction accuracy for each trained ML model of the trained ML model ensemble; scoring each trained ML model of the ML model ensemble based on one or more ML model performance metrics associated with measurement prediction accuracy; ranking all the trained ML models based on the scoring; and selecting the highest or topmost ranked trained ML model.
[00114] Alternatively or additionally, the steps 132 and 133 of scoring each trained ML model and selecting a trained ML model may further include: scoring and ranking each of the trained ML models of the ensemble of trained ML models based on RMSE, MSE and/or MAE; and selecting the highest or topmost ranked trained ML model.
[00115] Figure 1d illustrates a water ingress targeting process 140 for use by water ingress apparatus 110 of figures 1a and 1b in pinpointing areas of a wastewater network 102 affected by water ingress. Reference numerals of figures 1a and 1b may be reused in figure 1c for similar or the same features and/or components and the like.
The water ingress targeting process 140 includes the following steps of:
[00116] In step 141, performing the water ingress detection process 130 at each of a plurality of wastewater assets 104a-104m of the wastewater network 102. For example, the water ingress ML detection unit/apparatus 110 of wastewater network 102 may perform the water ingress detection process 130 for each of the wastewater assets 104a-104m.
[00117] In step 142, determining for each wastewater asset of the plurality of wastewater assets the correlation of the type of water ingress affecting said each wastewater asset.
[00118] In step 143, for each type of water ingress detected in the plurality of wastewater assets 104a-104m, pinpointing one or more target areas associated with one or more wastewater assets 104a-104m having the highest correlation for said each type of water ingress.
[00119] Figure 1e illustrates a water ingress pinpointing process 145 for use by step 143 of water ingress targeting process 140 of figure 1d in pinpointing areas of a wastewater network 102 affected by water ingress. Reference numerals of figures 1a and 1b may be reused in figure 1c for similar or the same features and/or components and the like. The water ingress pinpointing process 145 includes, for each type of water ingress, the following steps of:
[00120] In step 146, determining a set of wastewater assets connected together in a daisychain having a correlation above a threshold correlation associated with said each type of water ingress. For example, the set of wastewater assets in the daisychain may include a first wastewater asset upstream of the other wastewater assets in the daisychain and a last wastewater asset in the daisychain downstream of all the other wastewater assets in the daisychain.
[00121] In step 147, ranking the correlation of the type of water ingress at each of the wastewater assets in the set of wastewater assets in the daisychain.
[00122] In step 148, selecting one or more wastewater assets having the highest correlation in relation to said type of water ingress within the set of wastewater assets in the daisychain.
[00123] In step 149, determining a target area for said type of water ingress covering said selected one or more wastewater assets of the set of wastewater assets in the daisychain, and outputting data representative of said determined target area for said type of water ingress.
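Steps 146 to 149 can be sketched, for one type of water ingress, roughly as follows. This is an illustrative sketch only: it assumes each asset in the daisychain already has a correlation value for the ingress type where higher means stronger, and the function name pinpoint_target_area, the asset labels and the top_n parameter are hypothetical.

```python
def pinpoint_target_area(chain, threshold, top_n=1):
    """Select the assets in a daisychain most strongly linked to an ingress type.

    chain: list of (asset_id, correlation) pairs ordered from the most
    upstream asset to the most downstream asset.
    threshold: minimum correlation for an asset to be considered (step 146).
    top_n: how many of the highest-ranked assets form the target area.
    """
    # Step 146: keep only assets whose correlation exceeds the threshold.
    candidates = [(asset, corr) for asset, corr in chain if corr > threshold]
    # Step 147: rank the remaining assets by correlation, highest first.
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    # Steps 148 and 149: the top-ranked assets define the target area.
    return [asset for asset, _ in ranked[:top_n]]
```

The returned asset identifiers stand in for the data representative of the target area; an empty list means no asset in the chain exceeded the threshold.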
[00124] Figure 1f illustrates an example j-th trained ML model 120i-j for the plurality of trained ML models of ML model ensemble 120i of wastewater asset 104i of wastewater network 102. In this case, the ML model 120i-j may have been trained to jointly predict the minimum and maximum wastewater thresholds 122i,j-1 and 122i,j-2 based on applying current environmental data 112 associated with wastewater asset 104i as input to the ML model 120i-j. The ML model 120i-j has been trained based on historical environmental training data instances and corresponding historical wastewater flow data (e.g., historical minimum, mean and maximum wastewater flow data) for predicting minimum and maximum wastewater thresholds 122i,j-1 and 122i,j-2.
[00125] Figure 1g illustrates another example j-th trained ML model 120i-j for the plurality of trained ML models of ML model ensemble 120i of wastewater asset 104i of wastewater network 102. In this case, the ML model 120i-j may have been trained to predict the mean wastewater flow 122i,j-3 based on applying current environmental data 112 associated with wastewater asset 104i as input to the ML model 120i-j. The ML model 120i-j has been trained based on historical environmental training data instances and corresponding historical wastewater flow data associated with historical mean wastewater flow data for predicting the mean wastewater flow 122i,j-3.
[00126] Figure 1h illustrates another example j-th trained ML model 120i-j for the plurality of trained ML models of ML model ensemble 120i of wastewater asset 104i of wastewater network 102. In this case, the ML model 120i-j may be built or formed from multiple ML models 120i-j-1 and 120i-j-2 in which each of the ML models 120i-j-1 and 120i-j-2 is trained separately to predict the minimum and maximum wastewater thresholds 122i,j-1 and 122i,j-2, respectively, based on applying current environmental data 112 associated with wastewater asset 104i as input to the ML models 120i-j-1 and 120i-j-2.
[00127] Figure 1i illustrates a further example trained ML model system 150 that includes a further example j-th trained ML model 120i-j for the plurality of trained ML models of ML model ensemble 120i of wastewater asset 104i of wastewater network 102. In this case, the ML model 120i-j may be built or formed from multiple ML models 152i and 154i, where ML model 152i is configured to be a dry weather ML model 152i and ML model 154i is configured to be a wet weather ML model 154i. The dry weather ML model 152i includes ML models 152i-j-a and 152i-j-b that have been trained separately on historical dry weather environmental training data instances and corresponding wastewater levels for predicting minimum and maximum dry weather wastewater thresholds 153i,j-1 and 153i,j-2, respectively, based on environmental data 112 associated with wastewater asset 104i as input. In this example, ML model 154i is a wet weather ML model 154i that includes ML models 154i-j-a and 154i-j-b that are trained separately on historical wet weather environmental data instances for predicting minimum and maximum wet weather wastewater thresholds 155i,j-1 and 155i,j-2, respectively, based on current environmental data 112 associated with wastewater asset 104i as input.
[00128] The historical environmental data may be analysed to identify those portions of the historical environmental data that correspond with rainfall data indicating dry weather conditions; these identified dry weather portions may be extracted from the historical environmental data to form the historical dry weather environmental training data instances. The historical dry weather environmental training data instances along with corresponding historical wastewater measurements (e.g., historical minimum, mean and maximum dry weather wastewater flow data) may be used in training the ML models 152i-j-a and 152i-j-b. Similarly, the historical environmental data may be analysed to identify those portions of the historical environmental data that correspond with rainfall data excluding dry weather conditions; these identified wet weather portions may be extracted from the historical environmental data to form the historical wet weather environmental training data instances. The historical wet weather environmental training data instances along with corresponding historical wastewater measurements (e.g. historical minimum, mean and maximum wastewater flow data) may be used for training the ML models 154i-j-a and 154i-j-b.
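The dry and wet weather partitioning described above might be sketched as follows. This is a minimal illustration that assumes a single rainfall threshold decides whether a data instance counts as dry weather; the function name split_dry_wet, the record layout and the default threshold are hypothetical, and a practical definition of dry weather may be more involved.

```python
def split_dry_wet(records, dry_rainfall_mm=0.0):
    """Partition historical data instances into dry and wet weather sets.

    records: list of dicts, each holding at least a "rainfall_mm" value for
    one timestamp alongside any other environmental and wastewater data.
    dry_rainfall_mm: assumed rainfall level at or below which an instance is
    treated as dry weather.
    """
    dry = [r for r in records if r["rainfall_mm"] <= dry_rainfall_mm]
    wet = [r for r in records if r["rainfall_mm"] > dry_rainfall_mm]
    return dry, wet
```

Each instance falls into exactly one of the two sets, which would then be used to train the dry weather and wet weather models respectively.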
[00129] The predicted minimum dry weather wastewater threshold 153i,j-1 is combined with the predicted minimum wet weather wastewater threshold 155i,j-1 to form the predicted minimum wastewater threshold 122i,j-1, which is output from ML model 120i-j. In this example, the predicted minimum dry weather wastewater threshold 153i,j-1 is added to the predicted minimum wet weather wastewater threshold 155i,j-1 to form the predicted minimum wastewater threshold 122i,j-1. The predicted maximum dry weather wastewater threshold 153i,j-2 is combined with the predicted maximum wet weather wastewater threshold 155i,j-2 to form the predicted maximum wastewater threshold 122i,j-2, which is output from ML model 120i-j. In this example, the predicted maximum dry weather wastewater threshold 153i,j-2 is added to the predicted maximum wet weather wastewater threshold 155i,j-2 to form the predicted maximum wastewater threshold 122i,j-2.
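The additive combination described in this example can be written out as a short sketch. The function name combined_thresholds and the tuple layout are assumptions made for illustration; the dry and wet values would come from the separately trained dry weather and wet weather models.

```python
def combined_thresholds(dry, wet):
    """Add dry and wet weather threshold predictions element by element.

    dry: (minimum, maximum) thresholds predicted by the dry weather model.
    wet: (minimum, maximum) thresholds predicted by the wet weather model.
    Returns the overall (minimum, maximum) wastewater thresholds.
    """
    dry_min, dry_max = dry
    wet_min, wet_max = wet
    return dry_min + wet_min, dry_max + wet_max
```

Under this scheme the wet weather prediction acts as an increment on top of the dry weather baseline for each threshold.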
[00130] Although several example ML model structures have been described with reference to the figures, one of these may be used as the basis structure for the ML models of each of the ML model ensembles 120a-120m of each of the wastewater assets 104a-104m to ensure accuracy in the water ingress detection based on the scoring and ranking of the ML models from each of the ML model ensembles 120a-120m. It is to be appreciated that the ML models of each of the ML model ensembles 120a-120m may be based on the same ML model structure that can be configured for predicting wastewater flow through the corresponding wastewater assets 104a-104m. Although several example ML model structures have been described with reference to figures 1a to 1i, this is by way of example only and the invention is not so limited; it is to be appreciated by the skilled person that any combination of ML model structures may be used as the basis structure for generating the required ML model ensembles 120a-120m for each of the wastewater assets 104a-104m.
[00131] Examples of ML algorithms or processes that may be used include or may be based on, by way of example only but not limited to, any ML algorithm or process that can train model parameters on labelled and/or unlabelled time series datasets for generating a trained ML model that can track the behaviour of time series data for making predictions thereon. Some examples of ML algorithms may include or be based on, by way of example only but not limited to, one or more ML algorithms associated with regression learning or ensemble meta-algorithms, Adaptive Boosting (AdaBoost), Gradient boosting, extreme Gradient boosting (XGBoost), bootstrap aggregating, CoBoost, BrownBoost, random forests, decision tree learning, association rule learning, data mining algorithms/methods, artificial neural networks (NNs), deep NNs, deep learning, deep learning ANNs, convolutional NNs, support vector machines (SVMs), one or more combinations thereof or modifications thereto and the like and/or any other suitable ML algorithm as the application demands.
[00132] For example, an ML model for the ML model ensemble 120i of wastewater asset 104i may be generated using an ML algorithm associated with regression learning or ensemble meta-algorithms, AdaBoost, Gradient boosting, extreme Gradient boosting and the like. Such a chosen ML algorithm along with carefully selected hyperparameters such as, without limitation, for example, the base estimator, maximum number of estimators, train-test split ratio, learning rate in optimization algorithms (e.g. gradient descent, etc.), choice of optimization algorithm and any other suitable hyperparameter may be used along with a corresponding training environmental and wastewater flow data feature set comprising data representative of historical wastewater measurement data 114i for wastewater asset 104i and corresponding environmental data 112 associated with wastewater asset 104i to train model parameters for the ML model for the ML model ensemble 120i of wastewater asset 104i.
[00133] Referring back to figures 1a and 1b, the data ingestion unit 110b is configured for receiving environmental data instances 112 (e.g. rainfall data, river level data, ground water level data, flood level data, tidal data) and wastewater measurements 114a-114m from each of the corresponding wastewater assets 104a-104m. The data ingestion unit 110b includes a communication interface (CI) for receiving the environmental data 112 when it is available (e.g. periodic or aperiodic rainfall, river level, ground water level, flood level and/or tidal level measurements) and the wastewater measurements 114a-114m from the sensors 108a-108m of each of the wastewater assets 104a-104m. This data is fed to the ML unit 110a and water ingress detection unit 110c as historical data for training and/or as required.
[00134] The water ingress detection unit 110c is configured for detecting water ingress at one or more of the wastewater assets 104a-104m based on identifying whether the environmental dataset in the training environmental and wastewater measurement data feature set for the corresponding best performing ML model of each of ML model ensembles 120a-120m of the wastewater assets 104a-104m has one or more types of water ingress datasets. When the water ingress detection unit 110c detects that water ingress is occurring at a wastewater asset 104i of the wastewater network 102 such as, for example, one or more of river water ingress, ground water ingress, flood water ingress, tidal water ingress and the like, the water ingress detection unit 110c may send a notification or alert to an operator apparatus or console for alerting an operator monitoring the wastewater network 102 of the water ingress at the one or more wastewater assets 104a-104m and/or highlight the likely areas or pinpoint areas of wastewater network 102 in which the water ingress may be occurring. This provides the advantage of early scheduling and deployment of maintenance personnel for repairing and removing the water ingress and thus restoring or returning said wastewater asset 104a back to normal behaviour. Alternatively or additionally, the water ingress detection unit 110c may automatically communicate with a maintenance network/system for scheduling and deploying maintenance personnel for restoring or returning said wastewater asset 104a back to normal behaviour.
[00135] Figure 2 illustrates a data cleaning pipeline process 200 for performing data clean-up and processing for use in generation of each of the plurality of trained ML models 120a-a to 120a-s, 120i-a to 120i-s, 120m-a to 120m-s of corresponding ML model ensembles 120a-120m of ML wastewater management system 100. It is important to be able to train each of these ML models 120a-a to 120a-s, 120i-a to 120i-s, 120m-a to 120m-s for predicting mean and/or min/max wastewater thresholds for each of the corresponding wastewater assets 104a-104m using the same type of ML model structure and also individualised training environmental and wastewater data feature sets. An individualised training environmental and wastewater data feature set for a wastewater asset 104i includes timestamped historical wastewater measurements 114i for that wastewater asset 104i and also corresponding timestamped historical environmental data 114i associated with that wastewater asset 104i. However, the raw historical wastewater measurements 114i from the sensor 108i of a wastewater asset 104i can include spurious or noisy data associated with, without limitation, for example sensor failure, misaligned sensors, uncalibrated sensors, change of sensors, blockages and other sensor or water asset / water ingress anomalies and the like. In addition, raw historical wastewater measurements 114i may also have a higher time resolution, with sensing time intervals/instances of the sensor 108i being in the region of 1 min or 5 min between sensing measurements, as compared with environmental data 114i associated with the wastewater asset 104i, in which environmental data instances of the various environmental datasets (e.g. rainfall, river level, ground water level, flood level, tidal level, and other water ingress datasets) may be received in the region of every 10 min, 15 min, 1/2 hour, hourly, daily or weekly, or typically at any other time interval greater than that of the measurement sensor 108i.
[00136] Given this, the raw timestamped historical wastewater measurements need to be normalised, cleaned (e.g. spurious data removed) and also synchronised in time with the timestamped environmental data associated with the wastewater asset 104i. This ensures that each of the resulting trained ML models 120i-a to 120i-s of ML model ensemble 120i may track the normal behaviour of the wastewater asset 104i over time for predicting wastewater flow (e.g. appropriate mean and/or minimum and maximum wastewater thresholds) in relation to the environmental data instances corresponding to each of the trained ML models 120i-a to 120i-s of ML model ensemble 120i, for use in scoring and ranking said trained ML models 120i-a to 120i-s of ML model ensemble 120i for detecting and/or pinpointing whether water ingress is affecting, and the types of water ingress affecting, wastewater asset 104i. For each wastewater asset 104i of the plurality of wastewater assets 104a-104m, a data cleaning pipeline process 200 is performed to generate suitable historical data for use in creating a unique training environmental and wastewater measurement data feature set for use in training the ML models 120i-a to 120i-s of ML model ensemble 120i of said each wastewater asset 104i. The data cleaning pipeline process 200 may include the following steps of:
[00137] In operation 202, the historical wastewater measurements 114i from the sensor of the wastewater asset 104i are processed to:
[00138] a) normalise the magnitude of the historical wastewater measurements 114i to form normalised historical wastewater measurements (e.g. convert each wastewater measurement in the time series data to a percentage or fractional value in the range [0, 1] or other appropriate value range based on the maximum and minimum measurement range the sensor 108i may be calibrated to perform); and
[00139] b) synchronise the time resolution of the time series normalised historical wastewater measurement data to the time resolution of a selected type of time series environmental data 112a-112e (e.g. rainfall data 112a). For example, for each time interval between environmental data instances of the selected type of environmental data 112a, generating a mean/maximum/minimum wastewater data instance for those normalised historical wastewater measurement data instances falling within the environmental data time interval to form a synchronised set of normalised historical wastewater instances, each instance having a mean, minimum and maximum normalised value for that time interval.
[00140] In operation 204, processing the synchronised set of normalised historical wastewater data instances to remove noisy data, spurious sensor measurements, blockages and the like and identify other events within the data based on:
[00141] a) identify, using statistical analysis, and remove outlier blocks of data from the synchronised set of normalised historical wastewater data instances. For example, a dispersion graph may be formed to identify the outlier blocks for removal. For example, this may be performed based on performing statistical analysis on the synchronised set of normalised historical wastewater data instances such as, without limitation, for example generating a histogram of the synchronised set of normalised historical wastewater data instances for the wastewater asset 104i and identifying the statistical outlier blocks of the histogram data based on comparison with an idealised historical data pattern. The identified outlier blocks may be removed from the synchronised set of normalised historical wastewater data to form a first clean synchronised set of normalised historical wastewater data.
[00142] b) filter the first clean synchronised set of normalised historical wastewater data using long-term and short-term statistical averages and a ruleset for identifying inaccuracies or discontinuities in the measurements. For example, null values between data instances may be interpolated or long series of null values may form a discontinuity for removal. This may form a second clean synchronised set of normalised historical wastewater data.
[00143] c) identify exclusion events from the second clean synchronised set of normalised historical wastewater data that affect the accuracy or continuity of the measurements from sensor 108i, such as, without limitation, for example: i) noisy data, spurious sensor measurements, blockages and the like; ii) rainfall events; iii) dry weather events; and/or iv) other feature events. Remove the noisy data, spurious sensor measurements, blockages and the like from the second clean synchronised set of normalised historical wastewater data to form a clean synchronised set of normalised historical wastewater data.
[00144] d) generating a dry weather dataset for the wastewater asset based on removing the portions of the clean synchronised set of normalised historical wastewater data associated with rainfall events from the clean synchronised set of normalised historical wastewater data to form the dry weather dataset. Alternatively or additionally, generating the dry weather dataset for the wastewater asset based on including those portions of the clean synchronised set of normalised historical wastewater data associated with dry weather events into the dry weather dataset. That is, the dry weather dataset comprises the clean synchronised set of normalised historical wastewater data excluding rainfall events and/or including dry weather events.
[00145] In operation 206, updating environmental data instances to correspond to the clean synchronised set of normalised historical wastewater data instances by removing those environmental data instances that do not coincide with the timestamps of the clean synchronised set of normalised historical wastewater data instances. The other types of environmental data instances 112b-112e (e.g. river level data 112b, ground water level data 112c, flood level data 112d, tidal level data 112e, and/or other water ingress datasets) may also be updated to be synchronised with the time intervals of the selected type of environmental data 112a (e.g. rainfall). In addition, updating environmental data instances to correspond to the clean synchronised set of normalised historical wastewater data instances may further include generating dry weather environmental data instances by including only those environmental data instances that coincide with the timestamps of the corresponding synchronised set of normalised historical wastewater data instances contained within the dry weather dataset. Furthermore, updating the environmental data may include further processing various types of environmental data, such as rainfall data 112a, to estimate a more accurate or hyper-local rainfall dataset based on the location of the wastewater asset 104i within the area associated with the rainfall data and a plurality of other rainfall datasets from areas adjacent to the rainfall area the wastewater asset 104i is located within. For example, this may be based on performing a multivariate interpolation (e.g. three dimensional, tri-linear interpolation or nearest neighbour interpolation) to determine the hyper-local rainfall dataset at the location of the wastewater asset 104i based on the rainfall dataset covering the area the wastewater asset 104i is located in and other rainfall datasets associated with rainfall areas adjacent to the rainfall area the wastewater asset 104i is located within. In another example, the rainfall dataset for a wastewater asset 104i may be updated to a hyper-local rainfall dataset for said wastewater asset 104i based on identifying the three rainfall areas, adjacent to the rainfall area the wastewater asset 104i is located within, that are closest to the wastewater asset 104i, and performing an interpolation and averaging process using the rainfall data of the identified three rainfall areas and the rainfall data of the rainfall area the wastewater asset 104i is located within to estimate a hyper-local rainfall dataset for wastewater asset 104i. Thus, the determined hyper-local rainfall dataset for each wastewater asset 104i may be used in place of the rainfall dataset associated with the rainfall area said each wastewater asset 104i is located within. The updated environmental data instances may form a set of historical environmental training data instances. The dry weather environmental data instances may form a set of historical dry weather environmental training data instances.
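By way of illustration only, the hyper-local rainfall estimate of operation 206 might be sketched as follows. The specification states only that an interpolation and averaging process is applied to the rainfall area containing the asset and its three closest adjacent areas; the inverse-distance weighting, the use of area centroids, and the names `hyper_local_rainfall`, `centroid`, `rain` and `contains_asset` are assumptions made for this sketch.

```python
import math


def hyper_local_rainfall(asset_xy, areas, k_adjacent=3, power=2.0):
    """Estimate rainfall at an asset location (assumed inverse-distance weighting).

    areas: list of dicts with keys
      'centroid'        (x, y) of the rainfall area (assumed representation)
      'rain'            rainfall values, one per time interval
      'contains_asset'  True for the area the asset is located within
    """
    own = [a for a in areas if a["contains_asset"]]
    adjacent = [a for a in areas if not a["contains_asset"]]
    # Keep only the k adjacent areas closest to the asset.
    adjacent.sort(key=lambda a: math.dist(asset_xy, a["centroid"]))
    chosen = own + adjacent[:k_adjacent]

    # Closer areas contribute more; the small floor avoids division by zero.
    weights = [
        1.0 / max(math.dist(asset_xy, a["centroid"]), 1e-9) ** power
        for a in chosen
    ]
    total = sum(weights)
    n_intervals = len(chosen[0]["rain"])
    return [
        sum(w * a["rain"][t] for w, a in zip(weights, chosen)) / total
        for t in range(n_intervals)
    ]
```

With this weighting, an asset equidistant from two area centroids receives the plain average of the two rainfall series, while an asset close to one centroid is dominated by that area's series.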
[00146] The clean synchronised set of normalised historical wastewater data instances (which include normalised mean, min and max data instances) for a wastewater asset 104i and the updated environmental data instances (or set of historical environmental training data instances) may be used to form one or more individualised training environment and wastewater measurement data feature sets for the wastewater asset 104i.
[00147] For example, individualised training environment and wastewater measurement data feature sets for the wastewater asset 104i may be based on creating a plurality of different training environmental and wastewater measurement feature sets, each of which includes the clean synchronised set of normalised historical wastewater data for said wastewater asset 104i and corresponding rainfall data or hyper-local rainfall data for said wastewater asset 104i, and further includes either: a) one water ingress dataset from the group of: river level data, ground water data, flood water data, tidal level data, and other types of water ingress datasets; or b) a combination of two or more water ingress datasets from the group of: river level data, ground water data, flood water data, tidal level data, and other types of water ingress datasets. At least one of the plurality of training environmental and wastewater measurement feature sets includes only the clean synchronised set of normalised historical wastewater data for said wastewater asset 104i and corresponding rainfall data or hyper-local rainfall data for said wastewater asset 104i. Each of the plurality of training environmental and wastewater measurement feature sets is unique. That is, there are no duplicate combinations of rainfall data, wastewater measurement data, and types of water ingress datasets. All of the plurality of training environmental and wastewater measurement feature sets for wastewater asset 104i include rainfall data and the clean synchronised normalised wastewater measurements for wastewater asset 104i. The number of training environmental and wastewater measurement feature sets in the plurality of training environmental and wastewater measurement feature sets is equal to the number of ML models 120i-a to 120i-s of ML model ensemble 120i for wastewater asset 104i. Thus, each of the ML models 120i-a to 120i-s of the ML model ensemble 120i of wastewater asset 104i has a corresponding unique training environmental and wastewater measurement feature set. This is performed for each of the wastewater assets 104a-104m.
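As a minimal sketch of how such a plurality of unique feature sets might be enumerated, the following assumes that every non-empty subset of the available water ingress datasets is used, in addition to the rainfall-only baseline. The specification does not require every subset to be generated; the dataset labels and the function name `build_feature_sets` are hypothetical.

```python
from itertools import combinations

# Hypothetical labels for the water ingress datasets named in the text.
INGRESS_TYPES = ("river_level", "ground_water", "flood_level", "tidal_level")


def build_feature_sets(ingress_types=INGRESS_TYPES):
    """Return one tuple of dataset names per ML model of the ensemble."""
    base = ("wastewater", "rainfall")  # present in every feature set
    feature_sets = [base]              # rainfall-only baseline
    for size in range(1, len(ingress_types) + 1):
        for combo in combinations(ingress_types, size):
            feature_sets.append(base + combo)
    return feature_sets
```

Under this assumption four ingress datasets give 1 + 15 = 16 feature sets, so the ensemble for the asset would hold 16 ML models, one per feature set.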
[00148] As an option, the dry weather dataset (which includes normalised mean, min and max data instances from the set of normalised historical wastewater data instances corresponding to dry weather) and the updated dry weather environmental data instances (or set of historical dry weather environmental data instances) may be similarly used to form an individualised dry weather training dataset for the wastewater asset 104i.
[00149] In operation 208, generating for wastewater asset 104i a plurality of trained ML models 120i-a to 120i-s of the ML model ensemble 120i for the wastewater asset 104i, where each ML model 120i-a of the plurality of trained ML models 120i-a to 120i-s is configured to predict mean or minimum and maximum thresholds for the wastewater asset based on: training model parameters using an ML algorithm (e.g. regression based algorithm) and the corresponding unique training environmental and wastewater measurement feature set for a plurality of ML models associated with said each ML model 120i-a by performing a hyperparameter grid search, where the model parameters for each ML model are trained by the ML algorithm for a particular set of hyperparameters for predicting mean, maximum and/or minimum wastewater thresholds based on the individualised training dataset for the wastewater asset 104i. The clean synchronised normalised wastewater measurements for wastewater asset 104i included in the training environmental and wastewater measurement feature set for ML model 120i-a include at least one of the mean data instances, maximum data instances, or minimum data instances of the cleaned synchronised set of normalised historical wastewater data instances for wastewater asset 104i and the corresponding updated environmental data instances.
[00150] Alternatively or additionally, generating the trained ML model 120i-a may further include training a wet weather ML model and training a dry weather ML model to predict minimum and maximum thresholds for the wastewater asset 104i.
The wet weather ML model may be trained based on: training model parameters using an ML algorithm (e.g. regression based algorithm) and the corresponding unique training environmental and wastewater measurement feature set for a plurality of ML models associated with said each ML model 120i-a by performing a hyperparameter grid search, where the model parameters for each ML model are trained by the ML algorithm for a particular set of hyperparameters for predicting mean, maximum and/or minimum wet weather wastewater thresholds based on the corresponding unique training environmental and wastewater measurement feature set associated with rainfall/wet weather. The corresponding unique training environmental and wastewater measurement feature set associated with rainfall/wet weather includes at least one of the mean data instances, maximum data instances, or minimum data instances of the cleaned synchronised set of normalised historical wastewater data instances and the corresponding updated environmental data instances (including rainfall). The dry weather ML model may be trained based on: training model parameters using an ML algorithm (e.g. regression based algorithm) and the corresponding unique dry weather training environmental and wastewater measurement feature set associated with said each ML model 120i-a for a plurality of ML models by performing a hyperparameter grid search, where the model parameters for each ML model are trained by the ML algorithm for a particular set of hyperparameters for predicting mean, maximum and/or minimum dry weather wastewater thresholds based on the individualised dry weather training dataset for the wastewater asset 104i. 
The corresponding unique dry weather training environmental and wastewater measurement feature set includes data representative of the dry weather dataset (which includes normalised mean, minimum and maximum data instances from the set of normalised historical wastewater data instances corresponding to dry weather) and corresponding dry weather environmental data instances (or set of historical dry weather environmental data instances). The trained ML model 120i-a includes the trained dry weather ML model and trained wet weather ML model, where the minimum predicted wastewater threshold for the trained ML model 120i-a is formed based on a combination of the minimum predicted wet weather wastewater threshold and minimum predicted dry weather threshold, and the maximum predicted wastewater threshold for the trained ML model 120i-a is formed based on a combination of the maximum predicted wet weather wastewater threshold and maximum predicted dry weather threshold.
[00151] Figure 3 illustrates an ML model generation process 300 for use in operation 208 of data processing pipeline 200 for building/generating an ML model 120i-a for the ML model ensemble 120i of wastewater asset 104i that is capable of tracking and predicting the behaviour of wastewater flow through the wastewater asset 104i given environmental data as input to the ML model 120i-a. The ML model 120i-a is configured for predicting data representative of mean or minimum and maximum wastewater thresholds for the wastewater asset 104i. It is assumed that the ML algorithm for training the model parameters of all of the ML models 120i-a to 120i-s of ML model ensemble 120i of wastewater asset 104i has already been chosen (e.g. regression, AdaBoost, Gradient Boost, extreme Gradient Boost and/or NN and the like). The ML model generation process 300 for building/generating an ML model 120i-a of ML model ensemble 120i includes the following steps of:
[00152] In step 302, selecting a set of hyperparameter ranges for use with the ML algorithm for performing a hyperparameter grid search, where a plurality of ML models are trained over the various combinations of hyperparameters in the set of hyperparameter ranges.
[00153] In step 304, training model parameters for a plurality of ML models using the chosen ML algorithm and various combinations of hyperparameters of the set of hyperparameter ranges and the corresponding unique training environmental and wastewater measurement feature set for the ML model 120i-a of ML model ensemble 120i generated in operations 202-206 and/or as described herein. A hyperparameter grid search (or any other hyperparameter tuning algorithm) may be performed for generating, by the ML algorithm, model parameters for a plurality of ML models using all combinations of hyperparameters of the set of hyperparameter ranges. Each of the plurality of ML models may be trained to predict the mean, maximum and/or minimum wastewater thresholds based on the corresponding unique training environmental and wastewater measurement feature set for the ML model 120i-a. The corresponding unique training environmental and wastewater measurement feature set for ML model 120i-a includes at least one of the mean data instances, maximum data instances, or minimum data instances of the cleaned synchronised set of normalised historical wastewater data instances and the corresponding updated environmental data instances associated with rainfall data and also any one or more water ingress datasets that may be included. It is noted that when the hyperparameter grid search is configured to try every combination of hyperparameters in the sets of hyperparameter ranges, this will identify the best performing combination of hyperparameter values, among those in the searched ranges, that may be used to train the ML model 120i-a. Other hyperparameter tuning algorithms may be faster, but at the expense of reducing the likelihood that the optimal combination of hyperparameters is found for training the resulting ML model 120i-a. For the application of water ingress detection, it is important to determine the optimal hyperparameters for use in training the ML model 120i-a, which should reduce inaccuracies in predicting data representative of the mean or minimum and maximum wastewater thresholds, which in turn can impact how accurately water ingress in the network is detected (e.g. by reducing false positives etc.).
[00154] In step 306, scoring and ranking the plurality of trained ML models based on ML model performance statistics such as, without limitation, for example minimising RMSE and/or MSE or other ML performance metric. This orders the plurality of ML models according to ML model performance.
[00155] In step 308, selecting the best performing trained ML model from the ranked ML models based on minimising RMSE, MSE and/or MAE.
[00156] In step 310, building the final ML model 120i-a for predicting data representative of the mean or minimum and/or maximum wastewater thresholds using the hyperparameters of the selected trained ML model and the corresponding unique training environmental and wastewater measurement feature set for the ML model 120i-a.
[00157] In step 312, the final trained ML model 120i-a for the ML model ensemble 120i of wastewater asset 104i may be included into the ML model ensemble 120i along with the corresponding unique training environmental and wastewater measurement feature set. The ML model performance statistics for the final trained ML model 120i-a may also be determined based on the corresponding unique training environmental and wastewater measurement feature set. The ML model performance statistics may be based on, without limitation, for example minimising RMSE, MSE, and/or MAE or other ML performance metric associated with model prediction accuracy. This may be used by the water ingress detection unit 110c.
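Steps 302 to 310 may be illustrated with the following minimal sketch, which uses only the Python standard library. It is not the claimed implementation: the one-feature ridge regression stands in for whichever ML algorithm is chosen, the hyperparameters `alpha` (penalty) and `lag` (rainfall delay in time intervals) and the held-out scoring split `train_fraction` are assumptions, and all function names are hypothetical.

```python
import itertools
import math


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit of y = w * x + b for one input feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / (sxx + alpha)  # assumes sxx + alpha > 0
    return w, my - w * mx


def rmse(model, xs, ys):
    w, b = model
    return math.sqrt(sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def lagged(rain, level, lag):
    """Pair level[t] with rain[t - lag]."""
    return rain[: len(rain) - lag], level[lag:]


def grid_search(rain, level, grid, train_fraction=0.75):
    results = []
    # Steps 302-304: train one model per hyperparameter combination.
    for alpha, lag in itertools.product(grid["alpha"], grid["lag"]):
        xs, ys = lagged(rain, level, lag)
        cut = int(len(xs) * train_fraction)
        model = fit_ridge_1d(xs[:cut], ys[:cut], alpha)
        results.append(
            {"alpha": alpha, "lag": lag, "rmse": rmse(model, xs[cut:], ys[cut:])}
        )
    results.sort(key=lambda r: r["rmse"])  # step 306: score and rank
    best = results[0]                      # step 308: select best performer
    # Step 310: rebuild the final model on the whole feature set.
    xs, ys = lagged(rain, level, best["lag"])
    return best, fit_ridge_1d(xs, ys, best["alpha"]), results
```

Because every combination in `grid` is tried, the selected hyperparameters are the best among those searched, at a cost that grows with the product of the range sizes.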
[00158] Figure 4 illustrates a water ingress detection process 400 for use in water ingress detection unit 110c and/or step 312 of ML model generation process 300 for detecting whether water ingress occurs, and the type of water ingress that occurs, at wastewater asset 104i of wastewater network 102. The water ingress detection process 400 may be used for each of the wastewater assets 104a-104m for detecting any water ingress at each of those wastewater assets 104a-104m. The ML model ensemble 120i of wastewater asset 104i includes a plurality of trained ML models 120i-a to 120i-s, each of which has been trained using ML model generation process 300 using a corresponding unique training environmental and wastewater measurement feature set of a plurality of unique training environmental and wastewater measurement feature sets associated with the plurality of trained ML models 120i-a to 120i-s and wastewater asset 104i. Each ML model 120i-a of the plurality of ML models 120i-a to 120i-s has been trained to predict data representative of mean or minimum and maximum wastewater thresholds for the wastewater asset 104i given corresponding environmental data that is input for each time instance. The water ingress detection process 400 includes the following steps of:
[00159] In step 402, receiving a plurality of trained ML models 120i-a to 120i-s of an ML model ensemble 120i for the i-th wastewater asset 104i. The plurality of ML models 120i-a to 120i-s may have been built using ML model generation process 300 and/or other processes as described herein using the corresponding plurality of unique training environmental and wastewater measurement feature sets created therefor.
[00160] In step 404, scoring each of the plurality of trained ML models 120i-a to 120i-s using the corresponding unique training environmental and wastewater measurement feature set to generate an ML model performance statistic for each of the plurality of trained ML models 120i-a to 120i-s. The ML model performance statistic may be, without limitation, for example minimising RMSE, MSE and/or MAE or other ML performance metric/statistic associated with prediction accuracy of an ML model.
[00161] In step 406, ranking the scored plurality of trained ML models 120i-a to 120i-s. For example, generating an ordered list of the plurality of trained ML models 120i-a to 120i-s that is ordered based on the corresponding score of each of the trained ML models 120i-a to 120i-s.
[00162] In step 408, selecting the best scoring ML model of the ML model ensemble 120i for the i-th wastewater asset 104i and determining whether the i-th wastewater asset 104i is affected by water ingress based on the corresponding training dataset of the best scoring ML model. For example, selecting the trained ML model from the ranked ML models or ordered list of trained ML models 120i-a to 120i-s that has the best prediction accuracy or is the topmost ranked ML model in the list, e.g. the model that minimises RMSE, MSE, and/or MAE. Whether the i-th wastewater asset 104i is affected by water ingress is determined by examining the environmental training datasets used in the unique training environmental and wastewater measurement feature set used to train the selected trained ML model. Data is then stored indicative of whether water ingress occurs for the i-th wastewater asset and, if so, the one or more types of water ingress, based on the types of environmental datasets associated with water ingress within the unique training environmental and wastewater measurement feature set used to train said selected ML model.
[00163] In step 410, checking whether another wastewater asset should be assessed for water ingress. If another wastewater asset is to be analysed for water ingress (e.g. Y), then proceed to step 412; otherwise (e.g. N) proceed to step 414.
[00164] In step 412, proceeding to the next (i+1)-th wastewater asset and proceeding to step 402 for receiving a plurality of trained ML models of an ML model ensemble for the (i+1)-th wastewater asset.
[00165] In step 414, sending an indication or alert of any detected water ingress and types of water ingress and/or location of the water ingress for those wastewater assets affected by water ingress to wastewater management system 100 for arranging maintenance, updating and/or repair of the wastewater network, corresponding pipes/equipment and/or wastewater assets.
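The selection logic of steps 402 to 414 might be sketched as below, assuming each trained model of an ensemble is summarised by the dataset names in its feature set and an already computed RMSE score. The dictionary layout and the names `detect_ingress` and `scan_network` are hypothetical.

```python
BASE_DATASETS = ("wastewater", "rainfall")


def detect_ingress(ensemble):
    """ensemble: list of dicts with 'features' (dataset names) and 'rmse'."""
    ranked = sorted(ensemble, key=lambda m: m["rmse"])  # steps 404-406
    best = ranked[0]                                    # step 408
    # Any dataset beyond wastewater and rainfall in the best model's
    # feature set is read as a type of water ingress affecting the asset.
    types = [f for f in best["features"] if f not in BASE_DATASETS]
    return {"ingress": bool(types), "types": types, "best_rmse": best["rmse"]}


def scan_network(ensembles_by_asset):
    """Loop over assets (steps 410-412) and collect alerts (step 414)."""
    alerts = {}
    for asset_id, ensemble in ensembles_by_asset.items():
        result = detect_ingress(ensemble)
        if result["ingress"]:
            alerts[asset_id] = result["types"]
    return alerts
```

If the rainfall-only model scores best for an asset, no ingress is reported for it; if a model trained with, say, river level data scores best, river water ingress is flagged.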
[00166] Referring to figures 1a to 1i, 2, 3 and 4, an example embodiment of the ML water management system 100, ML water ingress detection apparatus 110, ML model ensembles 120a-120m and data processing pipeline 200, ML generation process 300 and water ingress detection process 400 is now described with respect to the sensors 108a-108i being configured to measure wastewater levels (e.g. sewer level measurements). Although this example embodiment describes the specifics of using sensors 108a-108i for sensing wastewater levels, this is by way of example only and the invention is not so limited. It is to be appreciated by the skilled person that the following example embodiment may be applied to any other type of sensor that may be used in one or more other wastewater assets 104a-104m such as, without limitation, for example flow meters, temperature sensors, pressure sensors, current sensors or power sensors related to pumps within wastewater network 102, and/or other monitors/sensors that output analogue data and/or any other sensor and the like as the application demands.
[00167] Operation 202 of pipeline process 200 may be performed, where wastewater measurement data 114 that is ingested by data ingestion unit 110b is a time series sensor data stream that is received from sensors 108a-108i in wastewater assets 104a-104m of wastewater/storm water network 102. The wastewater measurement data may be a historical time series dataset and/or a current real-time wastewater measurement dataset, where the following is applicable to both. In this example, the sensors 108a-108m are level or flow sensors. Each time series data stream comprises wastewater level or flow measurement data and time stamps for each measurement. The sensors 108a-108m may be measuring the wastewater levels or flows at a first time resolution such as, without limitation, for example once every 1, 2, 5, or 10 minutes and/or any other appropriate time period of N units of time (e.g. minutes). The wastewater measurement data 114 may first be normalised to represent a capacity of the wastewater asset 104i as a percentage or a fractional number in the range [0, 1], and/or any other value. This may be performed for each wastewater measurement data 114i from each sensor 108i by determining the minimum measurement and maximum measurement the sensor 108i may be calibrated to perform and then normalising each wastewater measurement data 114i based on the minimum/maximum measurements of the corresponding sensor 108i. For example, the wastewater sensor measurement data may be provided in units of centimetres (cm) or other units as the application demands, where metadata in the sensor reading of sensor 108i may provide the invert bottom as 0 and the top or maximum sensor reading as 5 metres, or another value such as X metres, X>0, which may be used to translate the measured levels into a percentage. If the sensor pre-processes or normalises the sensor data, then normalisation is not necessary, but in other situations, when the sensor data is simply provided as an analogue reading, the normalisation processing is performed on the wastewater measurement data 114i at data ingestion unit 110b prior to processing, training, and/or detection of water ingress anomalies.
[00168] For example, training of each of the ML models 120i-a to 120i-s for an ML model ensemble 120i may be performed using the capacity percentages, where the data that is ingested is normalised and converted to a percentage based on, for example, the empty level of the wastewater asset 104i and the full level of the wastewater asset 104i. These may be used to convert the ingested wastewater measurements 114i into a capacity percentage (%). Although capacity percentage is used herein, this is by way of example only and the invention is not so limited. It is to be appreciated by the skilled person that the wastewater measurement data 114i from each wastewater asset 104i may be normalised in any other manner and the like as the application demands.
[00169] Once the wastewater measurement data 114i from sensor 108i of wastewater asset 104i is ingested and normalised, it may be processed and segmented or synchronised into a second time resolution such as, for example, 15, 20, 30 minute, hourly, daily periods and/or any other appropriate time period of M units of time (e.g. minutes), where the time period N may be smaller than the time period M. Time period M may coincide with the time period resolution of the time series environmental data associated with wastewater asset 104i. Thus, the time series normalised wastewater measurement data 114i may be synchronised in time with the time series environmental data. In this example, the environmental data 112 includes rainfall data 112a and the time period M is set to the time period resolution at which rainfall data 112a is received and ingested by data ingestion unit 110b. This may be dictated by an external operator such as the rainfall measurement services of a weather office/organisation. For example, rainfall data may be provided at 15 minute periods, so time period M may be set to 15 minute periods. Given this, the normalised wastewater measurement data 114i from sensor 108i is segmented into M time periods (e.g. 15 minute periods) by calculating from the ingested normalised wastewater measurement data 114i the maximum, mean (e.g. average) and minimum level or flow reading over each time period M (e.g. each 15 minute period). Thus a synchronised normalised wastewater measurement dataset is formed for wastewater asset 104i that includes three different time series wastewater measurement datasets comprising maximum, minimum and mean wastewater measurement levels or flows over each time period M.
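The normalisation and synchronisation of operation 202 may be sketched as follows, assuming readings arrive as (minute offset, raw value) pairs and that each period M is identified by its start time. Those representations, and the names `normalise` and `synchronise`, are choices made for this sketch rather than details taken from the specification.

```python
from collections import defaultdict
from statistics import mean


def normalise(value, sensor_min, sensor_max):
    """Scale a raw reading to a fraction of the sensor's calibrated range."""
    return (value - sensor_min) / (sensor_max - sensor_min)


def synchronise(readings, period_minutes, sensor_min, sensor_max):
    """Aggregate readings taken every N minutes into periods of M minutes.

    readings: iterable of (minute_offset, raw_value) pairs.
    Returns one dict per period with the min, mean and max normalised value.
    """
    buckets = defaultdict(list)
    for minute, raw in readings:
        start = (minute // period_minutes) * period_minutes
        buckets[start].append(normalise(raw, sensor_min, sensor_max))
    return [
        {"start": start, "min": min(vals), "mean": mean(vals), "max": max(vals)}
        for start, vals in sorted(buckets.items())
    ]
```

For a level sensor calibrated from 0 to 500 cm and sampled every 5 minutes, calling `synchronise(readings, 15, 0, 500)` yields one minimum, mean and maximum capacity fraction per 15 minute rainfall interval.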
[00170] Operation 204 of pipeline process 200 may be performed, where the synchronised normalised wastewater measurement dataset is processed and cleaned up to ensure the best correlation and learning may be achieved by each of the ML models 120i-a to 120i-s of ML model ensemble 120i of the wastewater asset 104i, which further improves the predictions and the like. Essentially, this processing and clean-up eliminates incorrect data and/or impossible data from the synchronised normalised wastewater measurement dataset. The clean-up processes may be performed on the synchronised normalised wastewater measurement dataset for wastewater asset 104i based on the following sequence of events: Item 1) identify and remove outlier blocks of data from the synchronised normalised wastewater measurement dataset formed for wastewater asset 104i using a dispersion graph (e.g. Figures 5 and 6); Item 2) filter the remaining data of the synchronised normalised wastewater measurement dataset formed for wastewater asset 104i; Item 3) identify exclusion events that affect the accuracy or continuity of the measurements from sensor 108i, such as blockages and the like; and Item 4) form a dry weather dataset based on those portions of the cleaned synchronised normalised wastewater measurement dataset corresponding to dry weather events (or excluding rainfall events). The cleaned synchronised normalised wastewater measurement dataset and corresponding environmental data and/or dry weather dataset may be used to form a plurality of individualised or unique training environmental and wastewater measurement feature sets for use in performing ML to generate and build the corresponding plurality of trained ML models 120i-a to 120i-s of ML model ensemble 120i for predicting data representative of mean or minimum and maximum wastewater thresholds for wastewater asset 104i when given the corresponding environmental data instances (e.g. rainfall data, river level data, ground water data, flood water data, tidal data in relation to wastewater asset 104i) as input.
[00171] For example, Item 1) of the clean-up process may be configured to plot the mean values of the synchronised normalised wastewater measurement dataset on a dispersion graph for determining the normal dispersion of the wastewater asset 104i, where any blocks or portions of data that are widely outside (or are outliers) of the normal dispersion are removed from the synchronised normalised wastewater measurement dataset. In this example, as the dispersion graph is used to identify blocks of the mean values of synchronised normalised wastewater measurement dataset that are outliers, the corresponding blocks of maximum and minimum values of synchronised normalised wastewater measurement dataset are also removed. An example dispersion graph is illustrated in Figure 5.
[00172] Figure 5 illustrates a histogram dispersion graph 500 for an example synchronised normalised wastewater measurement dataset 502 (e.g. original data) for 6 months of sensor level measurements in which the mean values of the synchronised normalised wastewater measurement dataset are shown above rainfall data 504 (rainfall in mm). The synchronised normalised wastewater measurement dataset 502 is illustrated on a capacity plot 501 with the y-axis being a capacity percentage of the wastewater asset 104i and is plotted at 15 minute intervals along the x-axis over a 6 month period from August 2021 to January 2022. The rainfall data 504 is illustrated below on a rainfall plot 503 with the y-axis in mm of rainfall for every 15 minute interval, and rainfall is plotted at 15 minute intervals along the x-axis over the 6 month period from August 2021 to January 2022. The histogram dispersion graph 506 plots bins representing the number of occurrences of the synchronised normalised wastewater measurement dataset 502 on the y-axis and the capacity percentage (Capacity (%)) of each occurrence bin along the x-axis. As can be seen, the histogram dispersion graph indicates that the majority of data sits at around 16-18% capacity and falls away cleanly either side of the 16-18% capacity with no outlier blocks or bins of data.
[00173] Figure 6 illustrates a histogram dispersion graph 600 for another example of synchronised normalised wastewater measurement dataset that has outlier blocks of measurements. The histogram dispersion graph 602 plots bins representing the number of occurrences of the example synchronised normalised wastewater measurement dataset on the y-axis and the capacity percentage (Capacity (%)) of each occurrence bin along the x-axis. As can be seen, the histogram dispersion graph indicates that the majority of data 602 sits at around 6-8% capacity and falls away cleanly either side of the 6-8% capacity, but has a second spike of bins representing one or more outlier blocks or bins of data 604 around the 20% capacity. These identified outlier blocks 604 indicate that the sensor 108i is picking up incorrect sensor readings within the chamber of the wastewater asset 104i for the periods of time of the time series data associated with the data within these bins.
[00174] Outlier blocks in the synchronised normalised wastewater measurement dataset may occur based on phenomena within the location of the wastewater asset 104i of the wastewater network 102 (also referred to as storm water or sewer network). For example, the sensor 108i may measure something in the wastewater asset 104i other than data representative of the wastewater levels or flow of wastewater passing through wastewater asset 104i. Phenomena that the sensor 108i should not be measuring, but may measure due to misalignment or debris in the wastewater asset 104i, include, without limitation, for example: iron or metal steps within the wastewater asset 104i that allow maintenance crew ingress to and egress from the asset; side walls or other structural elements of the asset 104i picked up by the sensor beam (e.g. a concrete obstacle inside the chamber where measurement is taking place); or debris stuck in the wastewater asset that lies in the path of the sensor beam, and the like. These phenomena typically show up on the histogram dispersion graph as outlier blocks of data above and below the tails of the normal histogram dispersion graph shape. Thus, if there are data blocks or histogram bins outside the normal distribution range of the dispersion graph for the wastewater asset 104i, then these blocks of data or histogram bins are identified as outlier blocks, where the corresponding mean, maximum and minimum value blocks of the synchronised normalised wastewater measurement dataset are removed from the time series dataset.
[00175] Thus, identified outlier blocks are removed from the synchronised normalised wastewater measurement dataset to form a first cleaned synchronised normalised wastewater measurement dataset, where the corresponding mean, maximum and minimum value blocks of the synchronised normalised wastewater measurement dataset have been removed.
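The description does not specify how outlier bins are separated from the normal dispersion. Purely as an illustrative assumption, the sketch below treats any occupied histogram bin that is detached from the cluster around the most populated bin as an outlier bin, and removes the mean, maximum and minimum rows whose mean value falls in such a bin. The function names, the 2% bin width and the sample data are hypothetical.

```python
from collections import Counter


def outlier_bins(values, bin_width=2.0):
    """Return the histogram bin indices that are detached from the main
    cluster around the most populated bin."""
    counts = Counter(int(v // bin_width) for v in values)
    peak = max(counts, key=counts.get)
    main = {peak}
    # Grow the main cluster outwards while neighbouring bins are occupied.
    for step in (-1, 1):
        b = peak + step
        while counts.get(b, 0) > 0:
            main.add(b)
            b += step
    return set(counts) - main


def remove_outlier_blocks(rows, bin_width=2.0):
    """rows are (mean, maximum, minimum) tuples in capacity percent.
    Rows whose mean falls in an outlier bin are dropped entirely."""
    bad = outlier_bins([r[0] for r in rows], bin_width)
    return [r for r in rows if int(r[0] // bin_width) not in bad]


# Hypothetical data resembling Figure 6: a main cluster at 6-10% and a
# detached spike around 20%.
rows = ([(6.5, 7.0, 6.0)] * 50 + [(7.5, 8.0, 7.0)] * 40
        + [(9.0, 9.5, 8.5)] * 5 + [(20.5, 21.0, 20.0)] * 8)
cleaned = remove_outlier_blocks(rows)
```

A production rule would likely also need a minimum bin count so that a single empty bin within a genuine tail does not split the main cluster.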
[00176] An example of performing Item 2) comprises filtering the first cleaned wastewater measurement dataset output from Item 1) based on performing statistical analysis and filtering the dataset to remove periods of null data, periods of impossible data and the like. This is because each sensor 108i may provide sensor readings that are not plausible, but which may not have been identified as outliers in Item 1). Sets of rules may be used to determine which pieces of data are plausible and which are not, and which of the determined pieces of data may be modified and/or removed. For example, in some cases implausible readings between plausible readings (e.g. impossibly high levels or null data, e.g. 0) may be modified by imputing or interpolating the implausible data based on data values around the implausible data value in the time series dataset. For example, interpolation between a previous and a next measurement data value in the time series dataset may be applied if the implausible data value is an isolated incident, but not if there is a prolonged period of implausible data values within the time series dataset, which may instead be removed. Although capacity is given as a percentage, which is greater than or equal to 0, negative values may occur due to the sensor not being calibrated; thus negative data may be removed, and/or the entire dataset may be shifted and renormalised to remove negative data. Various rules may be defined for removing such sensor data. Thus, the first cleaned synchronised normalised wastewater measurement dataset may be analysed and filtered to remove implausible values to form a second cleaned synchronised normalised wastewater measurement dataset.
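A minimal sketch of one such rule set follows, assuming that null readings are represented as `None`, that plausible capacity values lie between 0 and 100, and that a run of at most one implausible reading counts as an isolated incident. These thresholds and the function name are assumptions for illustration only.

```python
def clean_series(values, low=0.0, high=100.0, max_gap=1):
    """Return a copy of values in which short implausible runs are linearly
    interpolated and longer runs are set to None to mark them for removal."""
    def plausible(v):
        return v is not None and low <= v <= high

    out = list(values)
    i = 0
    while i < len(out):
        if plausible(out[i]):
            i += 1
            continue
        # Find the end of this run of implausible readings.
        j = i
        while j < len(out) and not plausible(out[j]):
            j += 1
        run = j - i
        if run <= max_gap and i > 0 and j < len(out):
            # Isolated incident: interpolate between the neighbours.
            left, right = out[i - 1], out[j]
            for k in range(i, j):
                out[k] = left + (right - left) * (k - i + 1) / (run + 1)
        else:
            # Prolonged period (or no neighbour on one side): remove.
            for k in range(i, j):
                out[k] = None
        i = j
    return out


series = [10.0, 12.0, None, 16.0, 250.0, -3.0, None, 18.0]
cleaned = clean_series(series)
```

Here the single null between 12.0 and 16.0 is interpolated, while the run of three implausible readings is marked for removal.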
[00177] An example of performing Item 3) comprises identifying within the second cleaned synchronised normalised wastewater measurement dataset exclusion events that affect the accuracy or continuity of the measurements from sensor 108i, e.g. blockages, sensor changes, cleaning of wastewater asset 104i and the like. Various phenomena may occur within the wastewater assets 104a-104m of wastewater network 102; for example, they are occasionally cleaned out (e.g. by jetting) and dirt is cleaned from the sewer and/or wastewater asset 104i. This may mean that the chord level starts from a different point, lowering the wastewater level in one or more wastewater assets 104a-104m. Other phenomena include, without limitation, for example broken sensors, sensors being replaced with sensors having a different calibration or even a different type of sensor within the wastewater measurement dataset, changes to a pump set in which a portion of the wastewater network 102 (e.g. sewer) operates differently and/or any other phenomena that cause periods of the measurement data to be inconsistent with the remaining periods of the measurement data. These phenomena are classed as exclusion events; such events or periods may be labelled and removed from the second cleaned synchronised normalised wastewater measurement dataset. The labelling may be performed to enable a second manual check to be performed, should this be necessary.
[00178] A set of rules may be defined for identifying exclusion events, which are periods of time that are inconsistent with the rest of the measurement data, which are then labelled as exclusion events. For example, rules may be determined based on analysing daily and weekly mean levels on a rolling basis and identifying periods where the daily and weekly mean values sit outside the normal ranges. These periods may be identified as exclusion periods and used to form the set of rules identifying exclusion events for that particular wastewater asset.
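As a simplified sketch of such a rule, the following flags days whose mean level sits outside a normal range derived from all daily means. The use of the median and the median absolute deviation (MAD) to define the normal range, the tolerance value and the day labels are assumptions for illustration; the description leaves the statistical test open and also contemplates weekly and rolling statistics.

```python
from statistics import median


def exclusion_days(daily_means, tolerance=3.0):
    """daily_means maps a day label to its mean capacity percent. A day is
    flagged when its mean differs from the median daily mean by more than
    tolerance times the median absolute deviation."""
    values = list(daily_means.values())
    centre = median(values)
    # Fall back to a tiny value so a perfectly flat series flags nothing
    # except days that differ from it at all.
    mad = median(abs(v - centre) for v in values) or 1e-9
    return [day for day, v in daily_means.items()
            if abs(v - centre) > tolerance * mad]


# Hypothetical daily means with a two day blockage on d6 and d7.
daily_means = {
    "d1": 16.0, "d2": 17.0, "d3": 16.5, "d4": 17.5, "d5": 16.0,
    "d6": 35.0, "d7": 36.0, "d8": 17.0, "d9": 16.5,
}
flagged = exclusion_days(daily_means)
```

The flagged days would then be labelled as an exclusion period so that a manual check remains possible before removal.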
[00179] The data covered by each exclusion event is removed from the dataset going forward. The exclusion event processing may include performing statistical analysis on the second cleaned synchronised normalised wastewater measurement dataset including, without limitation, for example defining various different exclusion rules that analyse statistics including averages per day, maximums and/or minimums per day, and averaging statistics across one or more weeks, one or more months or one or more years, to identify periods of time that are inconsistent with the regular hourly mean, daily mean, weekly mean, yearly mean and the like.
[00180] Exclusion rules may further include rules that are configured to identify non-consistent periods of data much longer than those of the previous stage in the sequence, e.g. months at a time where that wastewater asset 104i/sensor 108i was being serviced, or where for some reason a different sensor is being used, where earlier data before the new or serviced sensor is labelled as an exclusion event and is not generally usable due to a different calibration or a different sensor etc. Thus, the exclusion rules and statistical analysis are used to identify exclusion events and remove these from the second cleaned synchronised normalised wastewater measurement dataset to form the cleaned synchronised normalised wastewater measurement dataset. Typically, exclusion processing analyses the most recent data of the second cleaned synchronised normalised wastewater measurement dataset first and then moves backwards in the time series dataset to identify any regular patterns and whether these regular patterns are associated with exclusion events based on the statistical analysis and exclusion rules.
[00181] This exclusion event processing is performed so that each of the ML models 120i-a to 120i-s of ML model ensemble 120i is trained to track the normal wastewater level or flow behaviour through wastewater asset 104i rather than blockages or other exclusion events. Given this, wastewater measurement data typically can have some exclusion events/periods such as blockages for a few days or weeks, sensor recalibration, identified parts of data that are not consistent with how sensor 108i and wastewater asset 104i have been operating for the majority of their service time, and/or other exclusion events or periods. The exclusion event processing identifies and removes exclusion events/periods from the second cleaned synchronised normalised wastewater measurement dataset to form the cleaned synchronised normalised wastewater measurement dataset, which includes the mean, minimum and maximum values that remain after the sequence of processes of Items 1), 2) and 3).
[00182] Figure 7 illustrates plots 700 of an example second cleaned synchronised normalised wastewater measurement dataset 702 for a wastewater asset 104i and also a rainfall dataset 704 associated with the wastewater asset 104i prior to exclusion event processing. The second cleaned synchronised normalised wastewater measurement dataset 702 is illustrated on a capacity plot 701 with the y-axis being a capacity percentage of the wastewater asset 104i and is plotted at 15 minute intervals along the x-axis over a 5 year period from Jan 2017 to Jan 2022. The rainfall data 704 is illustrated below on a rainfall plot 703 with the y-axis in mm of rainfall for every 15 minute interval, and rainfall is plotted at 15 minute intervals along the x-axis over the same 5 year period from Jan 2017 to Jan 2022.
[00183] The exclusion event process is performed by analysing the example second cleaned synchronised normalised wastewater measurement dataset 702 for the wastewater asset 104i from the most recent data and going backwards in time to identify, based on the exclusion rules, which portions of the time series dataset are a regular pattern and hence an exclusion event 706a, 706b, 706c, and which portions of the time series dataset are not a regular pattern or do not fit the exclusion rules and are likely data that exhibits normal sensor behaviour 708a, 708b, 708c. In this example, exclusion events/periods 706a, 706b and 706c have been identified by the exclusion processing. The exclusion event/period 706a covers 12 Sep 2020 to 05 Jan 2021, in which the time series data 702 exhibits a blockage exclusion pattern. The exclusion event/period 706b covers 26 Oct 2018 to 15 Mar 2019, in which the time series data 702 exhibits a changed sensor for a period of time or a minor blockage exclusion pattern. The exclusion event/period 706c for 01 Jan 2017-22 March 2022 exhibits a sensor failure and new sensor being inserted type exclusion pattern. Thus, in this example, the identified exclusion events/periods 706a, 706b and 706c in the data of the example second cleaned synchronised normalised wastewater measurement dataset 702 are removed to form a cleaned synchronised normalised wastewater measurement dataset, which includes the mean, minimum and maximum values that remain after those of the exclusion periods 706a, 706b and 706c have been removed.
[00184] The synchronised environmental data instances 112a-112e (e.g. rainfall data 112a, river level data 112b, ground water level data 112c, flood level data 112d, tidal level data 112e, and/or other water ingress data sets) may also be updated based on the remaining time series data points of the cleaned synchronised normalised wastewater measurement dataset (i.e. the exclusion event periods and the like are removed from the synchronised environmental data instances 112a-112e).
[00185] As an example of performing Item 4), once the cleaned synchronised normalised wastewater measurement dataset of the sensor 108i for the wastewater asset 104i has been determined, a dry weather flow dataset for wastewater asset 104i may be determined from the corresponding cleaned synchronised normalised wastewater measurement dataset. This may also be performed by the data ingestion unit 110b. The dry weather flow dataset for wastewater asset 104i is determined by removing from the cleaned synchronised normalised wastewater measurement dataset, which is timestamped, every day's worth of data when it is raining, and also zero, one, two, three or R days after rain depending on the behaviour of the wastewater asset 104i, where R may be chosen depending on how rainfall affects the flow of wastewater through asset 104i, which may continue for 0, 1, 2 or R days after the rainfall until the wastewater flow subsides to what is considered a normal dry weather flow for that wastewater asset 104i. This takes into account delayed sensor readings, where rain has fallen but is still passing through the wastewater network 102 one, two, three or R days after the rainfall. The value R may be statistically and/or empirically determined and/or manually changed. The resulting dataset is called the dry weather dataset for the wastewater asset 104i, and contains only the mean, minimum and maximum levels or flows for days when no rain is affecting the wastewater asset 104i of the wastewater network 102. The dry weather dataset may be used to train a dry weather ML model for predicting/calculating minimum and maximum wastewater flows during periods of dry weather. The dry weather ML model may be used as the dry weather ML model 152i in the ML model system 150.
[00186] Alternatively or additionally, the dry weather dataset for wastewater asset 104i may be further processed into an average dry flow dataset for each day of the week (e.g. Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday), in which the mean values of the dry weather dataset for each of the same days of the week over the dry weather dataset are averaged at each of the M interval time units for that day, where in this example, M=15 minutes. This builds an average dry weather profile for every day of an average week. Thus, the values for all of the Mondays in the dry weather dataset are added together at every M time interval (e.g. 15 minute period) and averaged, and similarly for all of the Tuesdays, Wednesdays, Thursdays, Fridays, Saturdays, and Sundays.
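The selection of dry weather days described for Item 4) may be sketched as follows, assuming that rain days have already been identified from the rainfall data and that the recovery period R is supplied as a whole number of days. The function name, parameter names and sample dates are hypothetical.

```python
from datetime import date, timedelta


def dry_days(all_days, rain_days, recovery_days=2):
    """Return the days that are neither rain days nor within
    recovery_days after a rain day."""
    wet = set()
    for d in rain_days:
        # Mark the rain day itself plus the R following days.
        for k in range(recovery_days + 1):
            wet.add(d + timedelta(days=k))
    return [d for d in all_days if d not in wet]


start = date(2021, 8, 1)
days = [start + timedelta(days=k) for k in range(10)]
rain = [date(2021, 8, 3), date(2021, 8, 4)]
dry = dry_days(days, rain, recovery_days=2)
```

With R = 2, rain on 3 and 4 August removes 3 to 6 August inclusive, leaving the remaining days as the dry weather dataset.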
[00187] The dry weather profile for every day of an average week may be further optimised by analysing the dry weather patterns over the dry weather dataset and selecting the best weeks and/or months, and only performing the above averaging process over those weeks/months, such that the mean values of the dry weather dataset for each of the same days of the week over only the best weeks/months of the dry weather dataset are averaged at each of the M interval time units for that day, where in this example, M=15 minutes. The dry weather profile for every day of an average week may form another type of dry weather dataset that can also be used to train a dry weather ML model for predicting/calculating minimum and maximum wastewater flows during periods of dry weather. The dry weather ML model may be used as the dry weather ML model 152i in the ML model system 150 of figure ii.
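A minimal sketch of the day-of-week averaging is given below, assuming the dry weather dataset is available as timestamped mean values already aligned to the M interval. The function name and sample values are assumptions; selection of the "best" weeks or months is not shown.

```python
from collections import defaultdict
from datetime import datetime


def weekday_profile(samples):
    """samples are (timestamp, mean_value) pairs from the dry weather
    dataset. Returns {(weekday, 'HH:MM'): average}, with weekday 0 = Monday."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in samples:
        key = (ts.weekday(), ts.strftime("%H:%M"))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


samples = [
    (datetime(2021, 8, 2, 0, 0), 10.0),   # Monday
    (datetime(2021, 8, 9, 0, 0), 14.0),   # Monday one week later
    (datetime(2021, 8, 2, 0, 15), 11.0),  # Monday, next 15 minute slot
    (datetime(2021, 8, 3, 0, 0), 20.0),   # Tuesday
]
profile = weekday_profile(samples)
```

Restricting `samples` to the selected weeks or months before calling the function would give the optimised profile described above.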
[00188] Figure 8a is a schematic diagram illustrating an example rainfall grid 8130 for calculating, for example, hyper-local rainfall data estimates Ri, Rj or Rk at wastewater assets 104i, 104j or 104k when given rainfall data R1 in the rainfall grid area 802a that the wastewater asset 104i, 104j or 104k is located within and rainfall data R2-R9 in the rainfall grid areas 802b-802i that are adjacent to rainfall grid area 802a. In this example, the center of each grid area 802a-802i is illustrated as being associated with the corresponding rainfall data R1 to R9. Although rainfall data R1 in rainfall grid area 802a may be used as the rainfall data Ri-Rk for wastewater assets 104i-104k located in rainfall grid area 802a, given that rainfall in adjacent rainfall grid areas 802b-802i may contribute to the rainfall data Ri-Rk for each of wastewater assets 104i-104k depending on the closeness of these assets 104i-104k to each of the adjacent grid areas 802b-802i, it may be more accurate to estimate hyper-local rainfall estimates Ri-Rk that relate to the location of each wastewater asset 104i-104k within rainfall grid area 802a based on at least three of the rainfall data R1-R9 of the adjacent grid areas 802b-802i. This may improve the accuracy of the trained ML models 120i-a to 120i-s of ML model ensemble 120i for wastewater asset 104i in predicting mean and/or maximum and minimum wastewater thresholds and the like, where the trained ML models 120i-a to 120i-s are scored and ranked based on ML performance metrics associated with prediction accuracy for use in detecting water ingress and/or pinpointing areas of the wastewater network 102 with a high likelihood of water ingress for maintenance crews to investigate and repair. That said, should rainfall data R2-R9 for rainfall grid areas 802b-802i not be available, then the rainfall data R1 in the rainfall grid area 802a in which the wastewater assets 104i-104k are located may be used.
[00189] For example, the hyper-local rainfall data Ri at wastewater asset 104i may be determined based on the rainfall datasets R1-R9, which may be received from an external operator such as a weather service provider/weather office/organisation of the country in which the wastewater asset 104i is located. The weather service provider may have a weather service that uses measuring apparatus (e.g. rainfall meters or satellite or radar systems) configured for estimating the rainfall on a per X km² basis over regular/periodic time intervals of M time units (e.g. M=15 minutes) each. For example, the weather office (MET office) may make rainfall predictions over a predetermined grid covering a country, state, county or geographic area divided into grid square areas of a certain area such as, without limitation, for example a 1 km area, 1.5 km area, 2 km area or any X km area, X>0. Each of the sensors 108a-108m or wastewater assets 104a-104m may be located within one or more of the grid square areas. The following may be applied to any of the wastewater assets 104a-104m of wastewater network 102 to estimate/calculate a hyper-local rainfall dataset for each of the wastewater assets 104a-104m.
[00190] In figure 8a, the wastewater sensor 108i for wastewater asset 104i is located within rainfall grid area 802a. A hyper-local rainfall estimate at wastewater asset 104i may be calculated when given rainfall data R1 in the rainfall grid area 802a that the wastewater asset 104i is located within, and rainfall data R2-R9 in the rainfall grid areas 802b-802i that are adjacent to rainfall grid area 802a. For example, the calculation calculates a weighted combination of at least three of the rainfall data R1-R9 based on how close the wastewater asset 104i is to each of the adjacent corresponding rainfall grid areas 802b-802i and where the wastewater asset 104i is located within the rainfall grid area 802a. For example, the calculation of the hyper-local rainfall data Ri at wastewater asset 104i may be based on performing a multivariate interpolation (e.g. two or three dimensional, tri-cubic or tri-linear interpolation, or nearest neighbour interpolation) or any other interpolation/averaging method or process to determine the hyper-local rainfall dataset Ri at the location of the wastewater asset 104i based on the rainfall dataset R1 covering the rainfall grid area 802a the wastewater asset 104i is located in and at least three other rainfall datasets R2-R9 associated with rainfall grid areas 802b-802i adjacent to the rainfall grid area 802a the wastewater asset 104i is located within. For example, for the wastewater asset 104i that is located within the rainfall grid area 802a, it may be determined how close the wastewater asset 104i is located to the borders of each of the adjacent and/or diagonally adjacent rainfall grid areas 802b-802i, and based on these distances, or proportions thereof, a weighting of the rainfall data R1-R9 may be calculated for use in estimating the hyper-local rainfall data Ri at the location of the wastewater asset 104i.
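The paragraph above leaves the weighting scheme open. Purely as one illustrative possibility, and not necessarily the scheme intended by the description, the sketch below applies inverse-distance weighting to the rainfall values at the grid area centres. The function name, the coordinate system (km) and the sample values are assumptions.

```python
import math


def idw_estimate(asset_xy, centres, power=1.0):
    """centres is a list of ((x, y), rainfall) pairs for grid area centres.
    Returns the inverse-distance weighted rainfall at asset_xy."""
    weighted, total = 0.0, 0.0
    for (x, y), rain in centres:
        d = math.hypot(asset_xy[0] - x, asset_xy[1] - y)
        if d == 0.0:
            return rain  # the asset sits exactly on a grid centre
        w = 1.0 / d ** power
        weighted += w * rain
        total += w
    return weighted / total


# Hypothetical centres and rainfall values (mm per 15 minutes).
centres = [((0.0, 0.0), 2.0), ((1.0, 0.0), 4.0), ((0.5, 0.5), 3.0)]
estimate = idw_estimate((0.5, 0.0), centres)
```

In this example all three centres are equidistant from the asset, so the estimate reduces to the plain average of the three rainfall values.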
[00191] The hyper-local rainfall data Ri that is calculated for wastewater asset 104i may be used in place of the historical rainfall data R1 when training the ML models 120i-a to 120i-s of ML model ensemble 120i and/or as an estimate of the current rainfall at a particular time instance when input to trained ML models 120i-a to 120i-s of ML model ensemble 120i for predicting mean or minimum and maximum wastewater thresholds, which are scored, ranked and used in detecting whether water ingress is occurring and the type of water ingress occurring at wastewater asset 104i. The hyper-local rainfall data Rj and Rk may also be calculated in a similar manner for wastewater assets 104j and 104k, and/or any of wastewater assets 104a-104m with sensors 108a-108k, and applied when training corresponding ML models of the ML model ensembles 120a-120m for predicting corresponding mean or minimum and maximum wastewater thresholds and the like.
[00192] Although the rainfall data R2-R9 of all grid areas 802b-802i adjacent to grid area 802a may be used, along with rainfall data R1 of grid area 802a, in calculating the hyperlocal rainfall data Ri, Rj and/or Rk for wastewater assets 104i, 104j and/or 104k, it may be unnecessary to use all rainfall data R2-R9 of grid areas 802b-802i. Rather, a selection of the grid areas 802b-802i adjacent to grid area 802a that are closest to each of the wastewater assets 104i-104k may be applied to estimate/calculate the corresponding hyperlocal rainfall Ri, Rj and/or Rk for each of wastewater assets 104i, 104j, and/or 104k. For example, as illustrated in Figure 8a, the center of each grid area 802a-802i may be assumed to have rainfall data R1 to R9, and each grid area 802a-802i may be divided into four equal quadrants. In this example, figure 8a illustrates the grid area 802a being divided into grid quadrants 802a-1 to 802a-4. Each of the wastewater assets 104i-104k located within grid area 802a may be identified as being located within one of the grid quadrants 802a-1 to 802a-4. In this example, wastewater asset 104i is located within grid quadrant 802a-2, wastewater asset 104j is located within grid quadrant 802a-3, and wastewater asset 104k is located within grid quadrant 802a-1. Once the grid quadrant of the grid area that a wastewater asset is located within is identified, the rainfall data of those grid areas adjacent to that grid quadrant are selected for use, along with the rainfall data of the grid area the wastewater asset is located within, in the calculation/estimate of the hyperlocal rainfall data associated with that wastewater asset.
[00193] For example, for wastewater asset 104k located within grid area 802a, it can be identified that wastewater asset 104k is located within grid quadrant 802a-1 of the grid area 802a. Given this, the rainfall data R2, R3 and R4 of those grid areas 802b, 802c and 802d adjacent to grid quadrant 802a-1 are selected for use, along with rainfall data R1 of grid area 802a, in estimating/calculating the hyperlocal rainfall data estimate Rk of wastewater asset 104k within grid area 802a. Similarly, for wastewater asset 104i, which is located within grid quadrant 802a-2, the rainfall data R2, R9 and R8 of those grid areas 802b, 802i and 802h adjacent to grid quadrant 802a-2 are selected for use, along with rainfall data R1 of grid area 802a, in estimating/calculating the hyperlocal rainfall data estimate Ri of wastewater asset 104i within grid area 802a. As well, for wastewater asset 104j, which is located within grid quadrant 802a-3, the rainfall data R6, R7 and R8 of those grid areas 802f, 802g and 802h adjacent to grid quadrant 802a-3 are selected for use, along with rainfall data R1 of grid area 802a, in estimating/calculating the hyperlocal rainfall data estimate Rj of wastewater asset 104j within grid area 802a. The reduction in the amount of rainfall data selected from adjacent grid areas reduces the computational requirements of the interpolation and/or averaging process used when calculating/estimating each hyperlocal rainfall dataset for each of the wastewater assets 104a-104m of wastewater network 102.
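The quadrant-based selection can be sketched as a simple lookup. The entries for quadrants 1 to 3 follow the examples given for assets 104k, 104i and 104j; the entry for quadrant 4 is not stated in the description and is inferred here by symmetry, so it should be treated as an assumption, as should the names used.

```python
# Adjacent grid areas selected for each quadrant of grid area 802a.
ADJACENT_BY_QUADRANT = {
    1: ("802b", "802c", "802d"),
    2: ("802b", "802i", "802h"),
    3: ("802f", "802g", "802h"),
    4: ("802d", "802e", "802f"),  # assumption: inferred, not in the text
}


def select_rainfall(quadrant, rainfall):
    """rainfall maps a grid area label to its rainfall value. Returns the
    values for the home grid area 802a and the three selected neighbours."""
    areas = ("802a",) + ADJACENT_BY_QUADRANT[quadrant]
    return {area: rainfall[area] for area in areas}


# Hypothetical rainfall values: 802a -> 1.0 (R1) ... 802i -> 9.0 (R9).
rainfall = {f"802{c}": float(k) for k, c in enumerate("abcdefghi", start=1)}
selected_for_104i = select_rainfall(2, rainfall)
```

Only four rainfall values are then passed to the interpolation step instead of nine.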
[00194] Figures 8b to 8e illustrate an example hyper-rainfall calculation 810a-810e using interpolation and averaging for estimating/calculating the hyperlocal rainfall dataset Rk for wastewater asset 104k located within grid area 802a and grid quadrant 802a-1. This may be applied for each of the wastewater assets 104a-104m of wastewater network 102. As described, the grid areas 802b-802d adjacent to the grid quadrant 802a-1 that wastewater asset 104k is located within are selected for use in calculating the hyper-local rainfall estimate Rk of wastewater asset 104k. In this example, the center of each grid area 802a-802d is labelled with the corresponding rainfall data R1 to R4 that is to be used for estimating/calculating the hyper-local rainfall dataset Rk of wastewater asset 104k. Referring to figure 8b, a first part of the hyper-rainfall calculation 810a is illustrated in which the centers of the grid areas 802a-802d (e.g. solid dots labelled R1-R4) are joined with four line segments 812a-812d (e.g. illustrated as dash-dot lines in figure 8b) to form a rectangle (or square) within which wastewater asset 104k is located. For example, a first line segment 812a is formed between center R1 of grid area 802a and center R2 of grid area 802b, a second line segment 812b is formed between center R2 of grid area 802b and center R3 of grid area 802c, a third line segment 812c is formed between center R3 of grid area 802c and center R4 of grid area 802d, and a fourth line segment 812d is formed between center R4 of grid area 802d and center R1 of grid area 802a. The first, second, third and fourth line segments 812a-812d (e.g. the dash-dot lines in figure 8b) form a rectangle (or square) within which wastewater asset 104k is located.
[00195] Referring to figure 8c, a second part of hyperlocal rainfall calculation 810b is illustrated in which the location of the wastewater asset 104k within the rectangle is projected onto each of the first, second, third and fourth line segments 812a-812d, and the locations 816a-816d of the projected location of wastewater asset 104k on each of the line segments 812a-812d are used to calculate first, second, third and fourth rainfall estimates Ra, Rb, Rc and Rd. For example, a first projected line 814a orthogonal to line segments 812a and 812c is projected from the center of wastewater asset 104k until the first projected line 814a intersects at first and third intersection locations 816a and 816c on the first and third line segments 812a and 812c, respectively. The intersection of the first projected line 814a and first line segment 812a is used as the first intersection location 816a at which to calculate the first rainfall estimate Ra, and the intersection of the first projected line 814a with the third line segment 812c is used as the third intersection location 816c at which to calculate the third rainfall estimate Rc. Similarly, a second projected line 814b orthogonal to the second and fourth line segments 812b and 812d is projected from the center of wastewater asset 104k until the second projected line 814b intersects at second and fourth intersection locations 816b and 816d on the second and fourth line segments 812b and 812d, respectively. The intersection of the second projected line 814b and second line segment 812b is used as the second intersection location 816b at which to calculate the second rainfall estimate Rb, and the intersection of the second projected line 814b with fourth line segment 812d is used as the fourth intersection location 816d at which to calculate the fourth rainfall estimate Rd.
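The projection step can be sketched as finding the foot of the perpendicular from the asset location onto each of the four line segments. The coordinates below (in km) and the function names are hypothetical, and the layout of the grid centres is an assumption since the figure is not reproduced here.

```python
def foot_of_perpendicular(point, start, end):
    """Orthogonal projection of point onto the line through start and end."""
    sx, sy = start
    ex, ey = end
    dx, dy = ex - sx, ey - sy
    t = ((point[0] - sx) * dx + (point[1] - sy) * dy) / (dx * dx + dy * dy)
    return (sx + t * dx, sy + t * dy)


def intersection_locations(asset, c1, c2, c3, c4):
    """c1..c4 are the grid centres carrying R1..R4, in order around the
    rectangle. Returns the intersection locations on segments
    c1-c2, c2-c3, c3-c4 and c4-c1, keyed by their reference numerals."""
    segments = {
        "816a": (c1, c2),
        "816b": (c2, c3),
        "816c": (c3, c4),
        "816d": (c4, c1),
    }
    return {name: foot_of_perpendicular(asset, s, e)
            for name, (s, e) in segments.items()}


# Hypothetical layout: a 1 km square with the asset inside it.
c1, c2, c3, c4 = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)
locations = intersection_locations((0.25, 0.75), c1, c2, c3, c4)
```

Locations 816a and 816c lie on one straight line through the asset, as do 816b and 816d, matching the two projected lines 814a and 814b.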
[00196] Referring to figure 8d, a third part of hyperlocal rainfall calculation 810c for wastewater asset 104k is illustrated in which the distances from the pairs of grid centers forming each line segment to the intersection location on said each line segment are determined, where these distances are used to interpolate and estimate the rainfall at the intersection location using the rainfall data associated with the corresponding grid centers. For example, linear interpolation may be applied to determine a weighted average, based on the distances and the rainfall data associated with each of the centers of the grid areas, that may be used to estimate the rainfall data at the intersection location. The weighting of the rainfall data associated with each of the centers of the grid areas is inversely related to the distance from that center (e.g. an end point of the line segment) to the intersection location on said line segment at which the rainfall data is unknown, so that the rainfall data associated with the center of the grid area that is closer to the intersection location has more influence than that of the other center of the grid area that is farther away from the intersection location.
[00197] In the example illustrated in figure 8d, the distance from each of the pair of grid centers R1 and R2 forming the first line segment 812a to the first intersection location 816a is determined, with the distance from grid center R1 to the first intersection location 816a being determined as distance d1a (e.g. in km or m), and the distance from grid center R2 to the first intersection location 816a being determined as distance d2a (e.g. in km or m). Using linear interpolation based on the rainfall data R1 and R2 and the distances d1a and d2a from each grid center of the first line segment 812a to the intersection location 816a, an estimate for rainfall data Ra may be calculated based on: Ra = (R1 × d2a + R2 × d1a) / (d1a + d2a).
[00198] Similarly, the distance from each of the pair of grid centers R2 and R3 forming the second line segment 812b to the second intersection location 816b is determined, with the distance from grid center R2 to the second intersection location 816b being determined as distance d2b (e.g. in km or m), and the distance from grid center R3 to the second intersection location 816b being determined as distance d3b (e.g. in km or m). Using linear interpolation based on the rainfall data R2 and R3 and the distances d2b and d3b from each grid center of the second line segment 812b to the intersection location 816b, an estimate for rainfall data Rb may be calculated based on: Rb = (R2 × d3b + R3 × d2b) / (d2b + d3b).
[00199] Similarly, the distance from each of the pair of grid centers R3 and R4 forming the third line segment 812c to the third intersection location 816c is determined, with the distance from grid center R3 to the third intersection location 816c being determined as distance d3c (e.g. in km or m), and the distance from grid center R4 to the third intersection location 816c being determined as distance d4c (e.g. in km or m). Using linear interpolation based on the rainfall data R3 and R4 and the distances d3c and d4c from each grid center of the third line segment 812c to the intersection location 816c, an estimate for rainfall data Rc may be calculated based on: Rc = (R3 × d4c + R4 × d3c) / (d3c + d4c).
[00200] Similarly, the distance from each of the pair of grid centers R4 and R1 forming the fourth line segment 812d to the fourth intersection location 816d is determined, with the distance from grid center R4 to the fourth intersection location 816d being determined as distance d4d (e.g. in km or m), and the distance from grid center R1 to the fourth intersection location 816d being determined as distance d1d (e.g. in km or m). Using linear interpolation based on the rainfall data R4 and R1 and the distances d4d and d1d from each grid center of the fourth line segment 812d to the intersection location 816d, an estimate for rainfall data Rd may be calculated based on: Rd = (R4 × d1d + R1 × d4d) / (d1d + d4d).
[00201] Referring to figure 8e, a fourth part of hyperlocal rainfall calculation 810d for wastewater asset 104k is illustrated in which the pairs of intersection locations are used to form two line segments 814a and 814b intersecting at the wastewater asset 104k. For each of these line segments 814a and 814b, distances from the corresponding pairs of intersection locations to the wastewater asset 104k are determined. For each of the line segments 814a and 814b, the distances are used to interpolate and estimate the first and second intermediate rainfall estimates Rac and Rbd at the location of wastewater asset 104k using the first and third rainfall data estimates Ra and Rc and the second and fourth rainfall data estimates Rb and Rd, respectively, that were calculated for the corresponding intersection locations. The resulting intermediate rainfall data estimates Rac and Rbd associated with the two line segments 814a and 814b may then be averaged to form the hyperlocal rainfall dataset Rk for wastewater asset 104k.
[00202] In the example illustrated in figure 8e, the distance from each of the pair of intersection locations 816a and 816c forming the first intersection line segment 814a to the wastewater asset 104k is determined, with the distance from the first intersection location 816a to the wastewater asset 104k on the first intersection line segment 814a being determined as distance dak (e.g. in km or m), and the distance from the third intersection location 816c to the wastewater asset 104k on the first intersection line segment 814a being determined as distance dck (e.g. in km or m). Using linear interpolation based on the first and third rainfall data estimates Ra and Rc calculated in the third part of the hyperlocal rainfall calculation 810c for wastewater asset 104k and the distances dak and dck, the first intermediate rainfall data estimate Rac may be calculated based on: Rac = (Ra × dck + Rc × dak) / (dak + dck).
[00203] Similarly, the distance from each of the pair of intersection locations 816b and 816d forming the second intersection line segment 814b to the wastewater asset 104k is determined, with the distance from the second intersection location 816b to the wastewater asset 104k on the second intersection line segment 814b being determined as distance dbk (e.g. in km or m), and the distance from the fourth intersection location 816d to the wastewater asset 104k on the second intersection line segment 814b being determined as distance ddk (e.g. in km or m). Using linear interpolation based on the second and fourth rainfall data estimates Rb and Rd calculated in the third part of the hyperlocal rainfall calculation 810c for wastewater asset 104k and the distances dbk and ddk, the second intermediate rainfall data estimate Rbd may be calculated based on: Rbd = (Rb × ddk + Rd × dbk) / (dbk + ddk).
[00204] Once the first and second intermediate rainfall data estimates Rac and Rbd have been calculated, the hyperlocal rainfall data estimate Rk for wastewater asset 104k may be calculated based on: Rk = (Rac + Rbd) / 2. The hyperlocal rainfall calculation outlined in figures 8b to 8e may be performed on each corresponding rainfall measurement of rainfall datasets R1, R2, R3 and R4 to form the hyperlocal rainfall estimate dataset Rk for wastewater asset 104k of wastewater network 102.
Although figures 8b to 8e describe a hyperlocal rainfall calculation for hyperlocal rainfall dataset Rk for wastewater asset 104k, this is by way of example only and the invention is not so limited. It is to be appreciated by the skilled person that the hyperlocal rainfall calculation outlined in figures 8b to 8e may be applied to each of the wastewater assets 104a-104m of wastewater network 102.
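By way of illustration only, the calculation of figures 8b to 8e can be sketched in Python as follows. The sketch assumes that the four grid centers form an axis-aligned rectangle with R1 at the top-left corner and R2, R3 and R4 following clockwise; the function names, argument names and coordinate convention are hypothetical and are not taken from the description.

```python
def interpolate(r_p, r_q, d_p, d_q):
    """Linear interpolation between two points p and q.

    d_p and d_q are the distances from p and q to the target location,
    so the nearer point receives the larger weight.
    """
    return (r_p * d_q + r_q * d_p) / (d_p + d_q)


def hyperlocal_rainfall(r1, r2, r3, r4,
                        x_left, x_right, y_bottom, y_top, x_k, y_k):
    """Estimate rainfall Rk at an asset located at (x_k, y_k).

    Assumed layout (hypothetical): R1 top-left, R2 top-right,
    R3 bottom-right, R4 bottom-left; the asset lies inside the rectangle.
    """
    # Distances from the asset to the four sides of the rectangle.
    to_left, to_right = x_k - x_left, x_right - x_k
    to_bottom, to_top = y_k - y_bottom, y_top - y_k

    # Third part (figure 8d): estimates at the four projected locations.
    r_a = interpolate(r1, r2, to_left, to_right)   # on segment R1-R2
    r_b = interpolate(r2, r3, to_top, to_bottom)   # on segment R2-R3
    r_c = interpolate(r3, r4, to_right, to_left)   # on segment R3-R4
    r_d = interpolate(r4, r1, to_bottom, to_top)   # on segment R4-R1

    # Fourth part (figure 8e): interpolate along the two projected lines.
    r_ac = interpolate(r_a, r_c, to_top, to_bottom)
    r_bd = interpolate(r_b, r_d, to_right, to_left)

    # Average the two intermediate estimates to obtain Rk.
    return (r_ac + r_bd) / 2
```

Under the rectangle assumption used here, both intermediate estimates equal the bilinear interpolation of R1 to R4 at the asset location, so the final averaging leaves that value unchanged. The function would be applied to each time instance of the rainfall datasets in turn.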
[00205] The ingestion unit 110b of water ingress detection apparatus 110 may perform the above methodologies for generating historical wastewater and environmental data measurements. When the wastewater measurements 114 from the sensors 108a-108m of each of the wastewater assets 104a-104m of the wastewater network 102 are received by the ingestion unit 110b, the ingestion unit 110b performs the above methodologies separately for each of the wastewater assets 104a-104m. For each wastewater asset 104i of the wastewater network 102, there are two sets of data: the cleaned synchronised normalised wastewater measurement dataset of the sensor 108i for the wastewater asset 104i, and updated environmental datasets including, without limitation, for example corresponding hyperlocal rainfall data Ri (or, if unavailable, rainfall data R1) and/or one or more different types of water ingress datasets such as, for example, one or more river water levels, ground water levels, flood water levels, tidal levels and/or other types of water ingress datasets. These datasets are time series datasets that the ingestion unit 110b synchronises in time series at the hyperlocal rainfall data Ri (or rainfall data R1) time interval of M time units (e.g. 15 minutes) between adjacent data value time instances. It is noted that the cleaned synchronised normalised wastewater measurement dataset of the sensor 108i for the wastewater asset 104i is a time series dataset that may include at least three data values per time instance in the time series, namely, a mean value, maximum value and minimum value representative of the wastewater level or flow. In this example, these data values are represented as a capacity percentage, but a skilled person would understand that any other normalisation may be applied as the application demands.
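As a minimal sketch of the synchronisation described above, the following Python groups raw sensor readings into intervals of M = 15 minutes, derives the mean, maximum and minimum value per interval, and pairs each interval with the rainfall value for the same time instance. The data layout (Unix timestamps, a rainfall dictionary keyed by interval start) and the policy of skipping intervals without rainfall data are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean

INTERVAL_S = 15 * 60  # M = 15 minutes, expressed in seconds


def synchronise(readings, rainfall):
    """Align sensor readings to the rainfall time grid.

    readings: iterable of (unix_seconds, capacity_percent) pairs.
    rainfall: dict mapping interval start (unix seconds) to rainfall in mm.
    Returns a list of rows sorted by interval start.
    """
    # Group readings by the start of the 15-minute interval they fall in.
    bins = defaultdict(list)
    for ts, value in readings:
        bins[ts - ts % INTERVAL_S].append(value)

    rows = []
    for start in sorted(bins):
        if start not in rainfall:
            continue  # assumed policy: skip intervals with no rainfall value
        values = bins[start]
        rows.append({
            "t": start,
            "mean": mean(values),
            "max": max(values),
            "min": min(values),
            "rain_mm": rainfall[start],
        })
    return rows
```

Other water ingress time series (river, ground water, flood water or tidal levels) could be joined to each row in the same way, keyed by the same interval start.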
[00206] The following describes some of the operations and/or processes performed by the ML unit 110a for training one of the ML models 120i-a of an ML model ensemble 120i for wastewater asset 104i. This is by way of example only and the invention is not so limited. The skilled person would appreciate that these operations and/or processes may be used for training each of the plurality of ML models 120i-a to 120i-s for ML model ensemble 120i for wastewater asset 104i and also for each of the plurality of ML models of the ML model ensembles 120a-120m of each of the wastewater assets 104a-104m of wastewater network 102, and/or used for updating each of the ML models of the ML model ensembles 120a-120m of the wastewater assets 104a-104m of wastewater network 102. Operations 208 and/or process 300 of figures 2 and 3 may be used to train an ML model of ML model ensemble 120i for wastewater asset 104i to predict data representative of the mean wastewater levels and/or maximum and minimum wastewater levels given a rainfall data instance at time ti. The ML algorithm that is used to train the model parameters of the ML model 120i-a of ML model ensemble 120i may include one or more ML algorithms from the group of: regression algorithms or boosting algorithms (e.g. XGBoost, Adaboost, gradient boost regressor and the like), bagging algorithms, neural networks, and/or any other ML algorithm capable of tracking the behaviour of the mean, minimum and/or maximum wastewater levels or flow through wastewater asset 104i given rainfall data and/or any other type of environmental data such as, without limitation, river level data, ground water level data, tidal level data, or other water ingress dataset and the like.
[00207] In this example, the training environmental and wastewater measurement feature set used to train the ML model 120i-a may include the normalised wastewater measurements of wastewater asset 104i and an environmental dataset corresponding to rainfall data associated with the wastewater asset 104i (or hyperlocal rainfall data of wastewater asset 104i or rainfall data in the rainfall grid area the wastewater asset 104i is located within). This is used for simplicity and by way of example only and the invention is not so limited. It is to be appreciated by the skilled person that a plurality of training environmental and wastewater measurement feature sets may be created, where in addition to rainfall data (or hyperlocal rainfall data of wastewater asset 104i or rainfall data in the rainfall grid area the wastewater asset 104i is located within), one or more other types of environmental time series datasets are included in each of the plurality of training environmental and wastewater measurement feature sets. These other training environmental and wastewater measurement feature sets with different combinations of one or more types of water ingress datasets may be used for training further ML models 120i-b to 120i-s of the ML model ensemble 120i for wastewater asset 104i. For example, the one or more types of water ingress datasets may include, without limitation, for example one or more combinations of river level time series data, ground water level time series data, and tidal level time series data, and/or other types of water ingress time series data from other sources, in which each time series dataset is synchronised to the time series rainfall dataset of each training environmental and wastewater measurement feature set. It is noted that the time series rainfall dataset is included in all training environmental and wastewater measurement feature sets.
[00208] The training environmental and wastewater measurement feature sets may include historical environmental and wastewater measurement data that may cover a certain period of time in which water ingress may be suspected, or a time window between certain dates/days, or even a scheduled time period or time window including the most recent historical environmental and wastewater measurement time series data for wastewater asset 104i, and/or any other time period or time window as the application demands. However, the time period or time window for each training environmental and wastewater measurement feature set of the plurality of training environmental and wastewater measurement feature sets used to train the plurality of ML models 120i-a to 120i-s of ML model ensemble 120i of wastewater asset 104i.
[00209] The first ML model 120i-a of ML model ensemble 120i may be trained on a first training environmental and wastewater measurement feature set that includes rainfall data only. Thus the first training environmental and wastewater measurement feature set includes two sets of data: the cleaned synchronised normalised wastewater measurement dataset of the sensor 108i for the wastewater asset 104i, and corresponding hyperlocal rainfall data Ri calculated for the wastewater asset 104i (or simply rainfall data R1 of the rainfall grid area that the wastewater asset 104i is located in). The cleaned synchronised normalised wastewater measurement dataset of the sensor 108i for the wastewater asset 104i is a time series dataset that may include three data values per time instance in the time series, namely, a mean value, maximum value and minimum value representative of the wastewater level or flow. In this example, these data values are represented as a capacity percentage, but a skilled person would understand that any other normalisation may be applied as the application demands.
[00210] A second ML model 120i-b of ML model ensemble 120i may be trained on a second training environmental and wastewater measurement feature set that includes rainfall data and river level data. Thus the second training environmental and wastewater measurement feature set includes three sets of data: the cleaned synchronised normalised wastewater measurement dataset of the sensor 108i for the wastewater asset 104i, corresponding hyperlocal rainfall data Ri calculated for the wastewater asset 104i (or simply rainfall data R1 of the rainfall grid area that the wastewater asset 104i is located in), and a synchronised river level dataset.
[00211] A third ML model 120i-c of ML model ensemble 120i may be trained on a third training environmental and wastewater measurement feature set that includes rainfall data and ground water level data. Thus the third training environmental and wastewater measurement feature set includes three sets of data: the cleaned synchronised normalised wastewater measurement dataset of the sensor 108i for the wastewater asset 104i, corresponding hyperlocal rainfall data Ri calculated for the wastewater asset 104i (or simply rainfall data R1 of the rainfall grid area that the wastewater asset 104i is located in), and a synchronised ground water level dataset.
[00212] A fourth ML model 120i-d of ML model ensemble 120i may be trained on a fourth training environmental and wastewater measurement feature set that includes rainfall data and flood water level data. Thus the fourth training environmental and wastewater measurement feature set includes three sets of data: the cleaned synchronised normalised wastewater measurement dataset of the sensor 108i for the wastewater asset 104i, corresponding hyperlocal rainfall data Ri calculated for the wastewater asset 104i (or simply rainfall data R1 of the rainfall grid area that the wastewater asset 104i is located in), and a synchronised flood water level dataset.
[00213] A fifth ML model 120i-e of ML model ensemble 120i may be trained on a fifth training environmental and wastewater measurement feature set that includes rainfall data and tidal level data. Thus the fifth training environmental and wastewater measurement feature set includes three sets of data: the cleaned synchronised normalised wastewater measurement dataset of the sensor 108i for the wastewater asset 104i, corresponding hyperlocal rainfall data Ri calculated for the wastewater asset 104i (or simply rainfall data R1 of the rainfall grid area that the wastewater asset 104i is located in), and a synchronised tidal level dataset.
[00214] One or more further ML models 120i-f to 120i-s of ML model ensemble 120i may be trained on one or more further training environmental and wastewater measurement feature sets, each of which may include environmental data comprising rainfall data and either: a) one of a plurality of other types of water ingress datasets; or b) one or more combinations of river level data, ground water data, flood water data, tidal level data, and other types of water ingress datasets, and also the corresponding clean synchronised normalised wastewater measurement dataset of sensor 108i for wastewater asset 104i. Thus a plurality of training environmental and wastewater measurement feature sets may be generated for training the plurality of ML models 120i-a to 120i-s of ML model ensemble 120i, where each of the plurality of training environmental and wastewater measurement feature sets is unique. That is, there are no duplicate combinations of rainfall data, wastewater measurement data, and types of water ingress datasets. All of the plurality of training environmental and wastewater measurement feature sets for wastewater asset 104i include rainfall data and clean synchronised normalised wastewater measurements for wastewater asset 104i.
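The enumeration of unique feature sets described above can be sketched as follows. This is an illustrative sketch only: the ingress type names are hypothetical labels, and the sketch assumes that every combination of the listed ingress types is generated, whereas in practice only some combinations may be used. The wastewater measurement dataset is the prediction target in every case and is therefore not listed among the inputs.

```python
from itertools import combinations

# Hypothetical labels for the water ingress datasets named in the text.
INGRESS_TYPES = ("river", "groundwater", "flood", "tidal")


def feature_sets(ingress_types=INGRESS_TYPES):
    """Return every unique feature set; each one always contains rainfall."""
    sets = []
    for size in range(len(ingress_types) + 1):
        for combo in combinations(ingress_types, size):
            sets.append(("rainfall",) + combo)
    return sets
```

With four ingress types this yields 16 feature sets: the rainfall-only set, four sets with one ingress type each, and eleven sets combining two or more ingress types.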
[00215] Each of the ML models 120i-a to 120i-s of ML model ensemble 120i may be trained on the corresponding training environmental and wastewater measurement feature set as described above. Firstly, an ML algorithm needs to be selected for use in training the ML models 120i-a to 120i-s. In this example, the ML algorithm that was found to be very successful in tracking the behaviour of wastewater levels or flow through wastewater asset 104i for water ingress detection may be chosen from the regression or boosting type family of ML algorithms including, but not limited to, XGBoost, Adaboost, and gradient boost regressor ML algorithms. However, any other type of ML algorithm may be used that is capable of or suitable for tracking the behaviour of wastewater levels or flow through wastewater asset 104i for water ingress detection (or other applications) using time series training environmental and wastewater measurement feature sets as described above.
[00216] Prior to training the model parameters for each of ML models 120i-a to 120i-s of ML model ensemble 120i of wastewater asset 104i using the chosen ML algorithm (e.g. XGBoost, Adaboost, gradient boost regressor), a set of hyperparameter grid ranges is required to be selected for use in performing the hyperparameter grid search for finding the optimal hyperparameters for use in training each of the ML models 120i-a to 120i-s of ML model ensemble 120i. Hyperparameters are settings or model parameters whose values are set before training and that affect how an ML model is trained. Various hyperparameters for the above chosen ML algorithm may include, without limitation, for example, rainfall durations or time windows, learning rates, base estimator, number of estimators, and the like.
[00217] An example of hyperparameter ranges for the number of estimators includes, without limitation, for example [100, 200, 400, 600] or any other suitable number of estimators value. An example of hyperparameter ranges for learning rates includes, without limitation, for example [0.1, 0.3, 0.6, 0.8] and/or any other suitable learning rate value etc. The rainfall input durations (or rainfall time windows) represent the amount of previous rainfall data in the time series from the current rainfall data instance that is taken into account and input as current rainfall data during training and/or inferencing. For example, the rainfall duration may affect the wastewater level or flow for 1 day, 2 days, 3 days, 5 days, 10 days or any number of R days. For example, hyperparameter ranges for rainfall duration values (in days) may include, without limitation, for example [0, 1, 2, 3, 5, 10], where this means that for each 15 minute rainfall data instance that is input for training and/or inference then either: only the 15 minute rainfall data instance is used as the current rainfall data input for training and/or inference; the previous 1 day of rainfall from the 15 minute rainfall data instance is used as the current rainfall data input for training and/or inference; the previous 2 days of rainfall from the 15 minute rainfall data instance is used as the current rainfall data input for training and/or inference; the previous 3 days of rainfall from the 15 minute rainfall data instance is used as the current rainfall data input for training and/or inference; the previous 5 days of rainfall from the 15 minute rainfall data instance is used as the current rainfall data input for training and/or inference; the previous 10 days of rainfall from the 15 minute rainfall data instance is used as the current rainfall data input for training and/or inference, and so on.
The rainfall duration or time window is used to determine how rainfall affects the wastewater level or flow at wastewater asset 104i. Thus, a hyperparameter grid search is performed for each of the ML models 120i-a to 120i-s of the ML model ensemble 120i for wastewater asset 104i. When performing the hyperparameter grid search, the best performing ML model for each of the ML models 120i-a to 120i-s of the ML model ensemble 120i will have the optimal settings for rainfall duration, which will also be applied when inputting the corresponding environmental datasets to each of the ML models 120i-a to 120i-s of the ML model ensemble 120i.
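The hyperparameter grid and the rainfall duration window can be sketched as below, using the example ranges given above. The dictionary keys, function names and the choice to return the window as a plain list of rainfall instances are assumptions for illustration; the description does not state how the windowed rainfall is presented to the ML algorithm (for example, as individual lagged values or as an aggregate).

```python
from itertools import product

# Example ranges from the text; the key names are hypothetical.
GRID = {
    "n_estimators": [100, 200, 400, 600],
    "learning_rate": [0.1, 0.3, 0.6, 0.8],
    "rain_days": [0, 1, 2, 3, 5, 10],
}
STEPS_PER_DAY = 24 * 4  # number of 15-minute instances per day


def grid_points(grid=GRID):
    """Yield one dict per combination of hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def rainfall_input(series, index, rain_days):
    """Return the rainfall instances used as input at position `index`.

    rain_days = 0 means only the current 15-minute instance is used;
    otherwise the preceding rain_days days are included as well. The
    window is truncated at the start of the series.
    """
    start = max(0, index - rain_days * STEPS_PER_DAY)
    return series[start:index + 1]
```

With these example ranges the grid contains 4 × 4 × 6 = 96 combinations, each of which would be used to train one candidate set of models.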
[00218] As described with reference to operation 208 and process 300 of figures 2 and 3, a hyperparameter grid search is performed on each of the ML models 120i-a to 120i-s of ML model ensemble 120i using a chosen ML algorithm (e.g. XGBoost, Adaboost, gradient boost regressor, or other suitable ML algorithm). For each ML model 120i-a, the hyperparameter grid search trains model parameters for generating a plurality of sets of ML models (each set of ML models including a mean ML model, minimum ML model and maximum ML model) using all combinations of the selected hyperparameters (e.g. rainfall input duration, learning rate, base estimator, number of estimators and other hyperparameters) along with the corresponding training environmental and wastewater measurement feature set (i.e. training dataset) for that ML model 120i-a. Each set of ML models generated for ML model 120i-a may include a mean ML model, minimum ML model and a maximum ML model, each of which has been trained using the mean, minimum and maximum time series datasets of the cleaned synchronised normalised wastewater measurement dataset, respectively, and the corresponding environmental datasets included in the corresponding training environmental and wastewater measurement feature set for ML model 120i-a. The hyperparameter grid search is performed for each of the ML models 120i-a to 120i-s based on their corresponding training environmental and wastewater measurement feature sets. Thus a plurality of ML models for each of ML models 120i-a to 120i-s is generated.
[00219] For each of the ML models 120i-a to 120i-s of the ML model ensemble 120i, all of the generated plurality of ML models are scored and ranked against the validation data based on model performance statistics such as, without limitation, for example RMSE, MSE and/or MAE or other model performance metric. For example, there may be a training/validation data split for each of the corresponding training environmental and wastewater measurement feature sets, with a first part for training and a second part for validation / scoring (e.g., train on one part, score on another part, or if the dataset is small train on all and score on all etc.) or by any ML training/validation technique as is well known by the skilled person. In this example, the RMSE and MSE are used to score the plurality of sets of ML models for each of the ML models 120i-a to 120i-s of ML model ensemble 120i. The best performing ML model having the best RMSE and MSE from the ranked plurality of sets of ML models for each of the ML models 120i-a to 120i-s (e.g., it could be one of a mean ML model, maximum ML model or even minimum ML model) is selected.
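The scoring and ranking step can be illustrated with the standard definitions of MSE, RMSE and MAE. The following is a sketch only; the function names and the representation of each candidate model by its list of validation predictions are assumptions made for illustration.

```python
from math import sqrt


def mse(actual, predicted):
    """Mean squared error over paired validation values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rank_candidates(actual, candidates):
    """Sort candidate names by RMSE on the validation data, best first.

    candidates: dict mapping a candidate name to its validation predictions.
    """
    return sorted(candidates, key=lambda name: rmse(actual, candidates[name]))
```

Because the square root is monotonic, ranking by RMSE gives the same order as ranking by MSE, so the two metrics always agree on the best performing candidate.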
[00220] Thus, the ML model ensemble 120i for wastewater asset 104i has a plurality of trained ML models 120i-a to 120i-s, in which each trained ML model 120i-a of the plurality of trained ML models 120i-a to 120i-s has been selected as the ML model with the best RMSE and MSE scoring of the plurality of sets of ML models generated for said each trained ML model 120i-a in the above hyperparameter search. Once again, the plurality of trained ML models 120i-a to 120i-s of the ML model ensemble 120i may be scored and ranked based on, without limitation, the RMSE and MSE, which provide a measure of an ML model's prediction accuracy. The topmost ranked or best performing ML model of the plurality of trained ML models 120i-a to 120i-s of the ML model ensemble 120i may be selected as the trained ML model that best predicts the wastewater flow through the wastewater asset 104i.
[00221] The water ingress detection unit 110c may use the selected trained ML model and the corresponding training environmental and wastewater measurement data feature set used to train the selected trained ML model for identifying whether wastewater asset 104i suffers from water ingress, and if so, the one or more types of water ingress that may be associated with wastewater asset 104i. For example, if the training environmental and wastewater measurement data feature set used to train the selected trained ML model is identified to be the first training environmental and wastewater measurement data feature set, then it is determined that wastewater asset 104i is not experiencing water ingress as the first training environmental and wastewater measurement data feature set only includes rainfall data. If the training environmental and wastewater measurement data feature set used to train the selected trained ML model is identified to be the second training environmental and wastewater measurement data feature set, then it is determined that wastewater asset 104i is experiencing water ingress from a river associated with the river level dataset because the second training environmental and wastewater measurement data feature set only includes rainfall data and river level data associated with the river. If the training environmental and wastewater measurement data feature set used to train the selected trained ML model is identified to be the third training environmental and wastewater measurement data feature set, then it is determined that wastewater asset 104i is experiencing water ingress from ground water seepage associated with the ground water level dataset because the third training environmental and wastewater measurement data feature set only includes rainfall data and ground water level data.
If the training environmental and wastewater measurement data feature set used to train the selected trained ML model is identified to be the fourth training environmental and wastewater measurement data feature set, then it is determined that wastewater asset 104i is experiencing water ingress from flood water of a flood water source associated with the flood water level dataset because the fourth training environmental and wastewater measurement data feature set only includes rainfall data and flood water level data associated with the flood water source. If the training environmental and wastewater measurement data feature set used to train the selected trained ML model is identified to be the fifth training environmental and wastewater measurement data feature set, then it is determined that wastewater asset 104i is experiencing water ingress from tidal sea water associated with the tidal level dataset because the fifth training environmental and wastewater measurement data feature set only includes rainfall data and tidal level data associated with the tidal sea water source.
If the training environmental and wastewater measurement data feature set used to train the selected trained ML model is identified to be a training environmental and wastewater measurement data feature set from the one or more further training environmental and wastewater measurement feature sets, each of which may include environmental data comprising rainfall data and either: a) one of a plurality of other types of water ingress datasets; or b) one or more combinations of river level data, ground water data, flood water data, tidal level data, and other types of water ingress datasets, and also the corresponding clean synchronised normalised wastewater measurement dataset of sensor 108i for wastewater asset 104i, then it is determined that wastewater asset 104i is experiencing water ingress associated with either: a) one of a plurality of other types of water ingress datasets; or b) one or more combinations of river level data, ground water data, flood water data, tidal level data, and other types of water ingress datasets. The identified one or more types of water ingress are detected by examining the types of water ingress datasets within the training environmental and wastewater measurement feature set used to train the selected ML model associated with the wastewater asset 104i.
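The identification logic above amounts to selecting the best scoring model across the ensemble and reading the ingress types off the feature set it was trained on. A minimal sketch follows, assuming each feature set is represented as a tuple of dataset labels beginning with "rainfall" and scored by validation RMSE; these representations and names are hypothetical.

```python
def identify_ingress(selected_feature_set):
    """Return the ingress types implied by the selected model's feature set.

    An empty list means the rainfall-only model was selected, i.e. no
    water ingress is indicated for the wastewater asset.
    """
    return [name for name in selected_feature_set if name != "rainfall"]


def select_and_identify(scores):
    """Pick the best feature set and report the associated ingress types.

    scores: dict mapping a feature-set tuple to its validation RMSE
    (lower is better).
    """
    best = min(scores, key=scores.get)
    return best, identify_ingress(best)
```

For example, if the model trained on rainfall and river level data scores best, the asset would be reported as experiencing river water ingress; if the rainfall-only model scores best, no ingress would be reported.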
[00222] As an option, the selected trained ML model may be used to build an ML model for wastewater asset 104i based on identifying the corresponding mean, minimum and maximum ML models from the set of ML models that the selected trained ML model is associated with. This may be used to build a final ML model for wastewater asset 104i, which may be based on any of the ML models for predicting wastewater flow through wastewater asset 104i as described with reference to figures 1f to 8e and the like. For example, the hyperparameters used for training the selected ML model may be used to train the corresponding mean, minimum and maximum ML models, and so the rainfall duration hyperparameter from these hyperparameters is used to form the input to the final ML model. That is, the current rainfall data that is input to the final ML model 120i includes the current rainfall data instance at time ti (e.g., M=15 minutes) and also the previous rainfall data instances within the rainfall duration period (or rainfall time window). Thus, the final ML model for wastewater asset 104i may take as input the environmental data associated with the training environmental and wastewater measurement data feature set used to train the selected ML model, which includes current rainfall data and, if any water ingress datasets are included, the corresponding current environmental data associated with said water ingress datasets (e.g. river level, ground water level, flood water level, tidal level, and/or any other type of water ingress level etc.).
The above processes and procedures may be performed at each of the wastewater assets 104a-104m to determine the types of water ingress, if any, at each of the wastewater assets 104a-104m of wastewater network 102 and/or to build an ML model for predicting mean wastewater flow or predicting minimum and maximum wastewater thresholds for each of those wastewater assets 104a-104m.
[00223] Figure 9 illustrates an example plot 900 illustrating an example trained ML model output for representing normal wastewater flow or level 902 for an example wastewater asset of a wastewater network according to some embodiments of the invention. In this example, the trained ML model is configured to predict minimum and maximum wastewater thresholds 908a and 908b when rainfall measurement data 904 is input to the trained ML model. On this plot 900, the x-axis represents time duration over a time period of 2 months from 00:00 29 June 2020 to 12:00 03 July 2020, and the left y-axis represents normalised wastewater level or flow measurements 902 (e.g., Sewer Level Measurements (SLM)), normalised as a capacity percentage in relation to the capacity of the example wastewater asset to which the sensor is calibrated. On this plot 900, rainfall measurement data 904 in relation to the example wastewater asset is also plotted, where the right y-axis of plot 900 represents rainfall in millimetres (mm). The normalised wastewater measurements 902 may be normalised as described with reference to operation 202 of figure 2 and/or as described herein. The rainfall measurement data 904 may comprise hyper-local rainfall data estimated from rainfall data periodically received from a weather service every M time units (e.g., every 15 minutes) for the example wastewater asset as described with reference to figures 8a-8e and/or as herein described.
The plot 900 also illustrates that a capacity of 100% represents an overflow level 906, whereby when the wastewater asset reaches 100% capacity wastewater may be redirected via an overflow mechanism or pipe away from the wastewater asset site to prevent flooding and the like.
[00224] In operation, the sensor of the example wastewater asset sends raw sensor readings representing wastewater level measurements measured in the chamber of the example wastewater asset to ML water ingress detection apparatus 110, in which the ingestion unit 110b normalises each received sensor reading into a normalised wastewater measurement 902 (e.g., normalised as a capacity percentage). The ingestion unit 110b may also generate, at each time instance of M time units (e.g., 15 minutes), rainfall data based on either: a) rainfall data periodically received from the weather service every M time units (e.g. every 15 minutes) for the example wastewater asset; b) a hyper-local rainfall estimate calculated from rainfall data periodically received from the weather service every M time units (e.g. every 15 minutes) for the example wastewater asset as described with reference to figures 8a-8e and/or as herein described; or c) any other rainfall estimate received in relation to the example wastewater asset. The normalised wastewater measurements 902 may be stored as historical wastewater measurements for use in training one or more ML models for predicting wastewater flow through the corresponding wastewater asset. Likewise, the rainfall data or hyper-local rainfall estimates may be stored with the historical environmental data as historical rainfall data or historical hyper-local rainfall data for use in training one or more ML models for predicting, for example, wastewater flow through the wastewater asset. At each time instance, the corresponding rainfall data for that time instance can be input by the ML unit 110a to a trained ML model (e.g., trained using an ML algorithm (e.g., regressor, XGBoost, AdaBoost, or Gradient Boost) as described with reference to figures 1a to 8e).
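As a non-limiting illustration, normalising a raw level reading as a capacity percentage may be sketched as below. The linear scaling between an empty level and an overflow level, and the names `normalise_level`, `empty_level_mm` and `overflow_level_mm`, are assumptions for this sketch; the normalisation actually used is that described with reference to operation 202 of figure 2.

```python
def normalise_level(raw_level_mm: float, empty_level_mm: float, overflow_level_mm: float) -> float:
    """Map a raw level reading onto a capacity percentage.

    Assumption: the sensor is calibrated so that `empty_level_mm` corresponds
    to 0% capacity and `overflow_level_mm` to 100% capacity, with linear
    scaling in between. Values above 100.0 indicate a level above overflow.
    """
    span = overflow_level_mm - empty_level_mm
    if span <= 0:
        raise ValueError("overflow level must be above the empty level")
    return 100.0 * (raw_level_mm - empty_level_mm) / span
```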
[00225] In this example, the training environmental and wastewater flow feature set used for training the ML model includes historical wastewater measurement data of the wastewater asset and historical rainfall data associated with the wastewater asset. The ML model is trained to predict data representative of minimum and maximum wastewater thresholds 908a and 908b of the wastewater flow through the wastewater asset as described with reference to figures 1a to 8e. For example, one or more of the ML models described with reference to figures 1f to 11 may be trained based on the above training environmental and wastewater flow feature set. In this example, the ML model is trained to predict the minimum and maximum wastewater thresholds 908a and 908b in relation to rainfall. However, it is to be appreciated that one or more other types of corresponding water ingress datasets such as, for example, river levels, groundwater levels, flood water levels, tidal levels and/or other types of water ingress datasets may be added to the training environmental and wastewater flow feature set of the wastewater asset for detecting water ingress and the like. Once the ML model is trained, the trained ML model is configured to process the rainfall data as input and, from processing this, predict data representative of minimum and maximum wastewater thresholds 908a and 908b for the next time instance as described herein.
[00226] The plot 900 shows the predicted minimum and maximum wastewater thresholds 908a and 908b, respectively, represented as dashed lines dynamically changing over the 2-month time period as a function of the rainfall 904 affecting the example wastewater asset. It can be seen that in normal conditions, the predicted minimum and maximum wastewater thresholds 908a and 908b track the normalised wastewater measurements 902 based on rainfall 904. The plot 900 represents a normal operating condition of the example wastewater asset when the normalised wastewater measurements 902 remain inside the predicted minimum and maximum wastewater thresholds 908a and 908b. A storm event 910 is illustrated in which a storm caused excess rainfall 904 resulting in an overflow condition 906 of the example wastewater asset. However, even though there was an overflow condition 906, this still represents normal behaviour of the example wastewater asset because the normalised wastewater measurements 902 remain inside the predicted minimum and maximum wastewater thresholds 908a and 908b. The plot 900 shows that the ML model that was trained, as described with reference to figures 1a-8e, for the example wastewater asset is capable of predicting minimum and maximum wastewater thresholds 908a and 908b for normal conditions and behaviour of the example wastewater asset in relation to rainfall affecting the example wastewater asset.
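The distinction drawn above, in which an overflow can still be normal behaviour provided the measurement stays inside the predicted thresholds, may be sketched as follows. The function name `classify` and the dictionary it returns are assumptions made purely for illustration.

```python
from typing import Dict


def classify(
    measurement: float,
    predicted_min: float,
    predicted_max: float,
    overflow_level: float = 100.0,
) -> Dict[str, bool]:
    """Report separately whether a reading is normal and whether it is an overflow.

    "normal" means the normalised measurement lies inside the predicted
    minimum and maximum thresholds; "overflow" means it has reached the
    overflow level. The two flags are independent, so a storm-driven overflow
    that the model anticipated is still reported as normal.
    """
    return {
        "normal": predicted_min <= measurement <= predicted_max,
        "overflow": measurement >= overflow_level,
    }
```

Under this sketch a reading of 100% with predicted thresholds of 80% and 105% is an overflow that is nonetheless normal, whereas a reading of 95% with predicted thresholds of 20% and 60% is abnormal without being an overflow.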
[00227] Figure 10a illustrates an example of the ML wastewater management system of figure 1a for water ingress detection according to some embodiments of the invention. In this example a portion of the wastewater network 102 of ML wastewater management system 100 is shown and a small river 1002 with a river level sensor 1009a with designation E8730 is illustrated. The wastewater asset 104m may be a wastewater pumping station or treatment works with sensor 108m. In this example, the operators of the wastewater network 102 suspect that water ingress may be affecting wastewater asset 104m; however, it is difficult to determine where the water ingress is coming from. From experience, the operators have deduced that the water ingress may be coming from the river 116 or sea 118 and were planning some large remedial works to eliminate this. The small river 1002 had been largely built over and so was not an apparent water ingress source to the operator. In order to confirm whether the water ingress is coming from the river 116 via overflow mechanism/pipe 107a or from the sea 118 via overflow mechanism/pipe 107b, the ML water ingress detection apparatus 110 of wastewater management system 100 was applied in relation to wastewater asset 104m using various combinations of environmental datasets including rainfall data associated with wastewater asset 104m, river level datasets from first and second river sensors 109a and 109b of river 116, where second river sensor 109b is designated E8790, tidal level datasets from sensors 109f and/or 109g, and river level datasets from sensor 1009a designated E8730 of small river 1002. It is noted that the second river sensor 109b with designation E8790 may be affected by both river level and tidal flow in and out of river 116. In this example, the different types of environmental datasets associated with water ingress were collected from the level sensors 109a-109g and 1009a for analysis by ML water ingress detection apparatus 110 in relation to wastewater asset 104m.
[00228] For example, a plurality of training environmental and wastewater measurement feature sets were created for training a plurality of ML models 120m-a to 120m-s of ML model ensemble 120m in relation to wastewater asset 104m as described herein. For example, a first training environmental and wastewater measurement feature set included clean synchronised normalised wastewater measurement data measured by sensor 108m of wastewater asset 104m and corresponding hyper-local rainfall data associated with wastewater asset 104m. A second training environmental and wastewater measurement feature set included clean synchronised normalised wastewater measurement data measured by sensor 108m of wastewater asset 104m, corresponding hyper-local rainfall data associated with wastewater asset 104m, and a corresponding river level dataset from second sensor 109b designated E8790 of river 116. A third training environmental and wastewater measurement feature set included clean synchronised normalised wastewater measurement data measured by sensor 108m of wastewater asset 104m, corresponding hyper-local rainfall data associated with wastewater asset 104m, and a corresponding river level dataset from sensor 1009a designated E8730 of small river 1002. Further training environmental and wastewater measurement feature sets including clean synchronised normalised wastewater measurement data measured by sensor 108m of wastewater asset 104m, corresponding hyper-local rainfall data associated with wastewater asset 104m, and one or more other environmental datasets and/or combinations thereof may also be created. However, for simplicity, the focus of this example is on the first, second and third training environmental and wastewater measurement feature sets that include environmental datasets corresponding to: a) rainfall only; b) rainfall and the river level dataset from second sensor 109b designated E8790 of river 116; and c) rainfall and the river level dataset from sensor 1009a designated E8730 of small river 1002; respectively.
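Purely as an illustrative sketch, the enumeration of candidate feature sets, each containing rainfall plus zero or more water ingress datasets, might look like the following. The function name `candidate_feature_sets`, the `max_extra` parameter and the dataset labels are hypothetical; the wastewater measurement data, which is the prediction target in every set, is left implicit.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def candidate_feature_sets(
    ingress_datasets: Sequence[str], max_extra: int = 1
) -> List[Tuple[str, ...]]:
    """List the environmental inputs of each candidate feature set.

    Every set contains "rainfall"; the first set is rainfall only, and the
    remaining sets add up to `max_extra` water ingress datasets in every
    combination, in the order the datasets were supplied.
    """
    feature_sets: List[Tuple[str, ...]] = [("rainfall",)]
    for size in range(1, max_extra + 1):
        for combo in combinations(ingress_datasets, size):
            feature_sets.append(("rainfall",) + combo)
    return feature_sets
```

Called with the two river level datasets of this example and the default `max_extra=1`, the sketch yields three sets corresponding to a), b) and c) above.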
[00229] Figure 10b illustrates an example plot 1010 representing wastewater flow through a wastewater asset 104m, which is a wastewater treatment works pumping station, in the ML wastewater management system of figure 10a according to some embodiments of the invention. On this plot 1010, the x-axis represents time duration over a time period of 2 years from January 2020 to January 2022, and the first left y-axis 1010a represents normalised wastewater level or flow measurements 1012 (e.g., Sewer Level Measurements (SLM)), normalised as a capacity percentage in relation to the capacity of the example wastewater asset to which the sensor is calibrated. On this plot 1010, rainfall measurement data 1014 in relation to the example wastewater asset 104m is also plotted and illustrated below the plot of normalised wastewater level or flow measurements 1012. The second left y-axis 1010b of plot 1010 for the rainfall measurement data 1014 represents rainfall in millimetres (mm). The normalised wastewater measurements 1012 may be normalised as described with reference to operation 202 of figure 2 and/or as described herein. The rainfall measurement data 1014 may comprise hyper-local rainfall data estimated from rainfall data periodically received from a weather service every M time units (e.g., every 15 minutes) for the example wastewater asset 104m as described with reference to figures 8a-8e and/or as herein described. The plot 1010 also illustrates that a capacity of 100% represents an overflow level 1016, whereby when the wastewater asset reaches 100% capacity wastewater may be redirected via an overflow mechanism or pipe away from the wastewater asset site 104m to prevent flooding and the like.
[00230] The example wastewater asset 104m has first and second periods 1017a and 1017b of normal operation in which the normalised wastewater level or flow measurements 1012 follow or react normally in relation to the rainfall measurement data 1014. However, the example wastewater asset 104m has first, second and third periods 1018a, 1018b and 1018c of prolonged high levels during winter months in which the normalised wastewater level or flow measurements 1012 reach the overflow level 1016 over each of the first, second and third periods 1018a, 1018b and 1018c. This is abnormal wastewater level or flow because it is contrary to the rainfall measurement data 1014 over these first, second and third periods 1018a, 1018b and 1018c. It seems that water ingress to wastewater asset 104m is occurring over the winter months from another source, causing the wastewater asset 104m to be in an almost permanent overflow condition over the first, second and third periods 1018a, 1018b and 1018c.
[00231] Figure 10c illustrates an example scoring and ranking plot 1020 of multiple scored trained ML models 120m-a to 120m-s of ML model ensemble 120m for wastewater asset 104m of wastewater treatment works according to some embodiments of the invention. The ML unit 110a of the ML water ingress detection apparatus 110 trained each of the ML models 120m-a to 120m-s of the ML model ensemble 120m based on at least the first, second, and third training environmental and wastewater measurement feature sets. In this example, the XGBoost regression ML algorithm was selected and various ML models for predicting mean wastewater flow, minimum threshold and maximum threshold of wastewater flow through wastewater asset 104m were trained over different XGBoost hyperparameters including: XGBoost hyperparameter ranges for the number of estimators such as, without limitation, for example [100, 200, 400, 600] or any other suitable number of estimators value; XGBoost hyperparameter ranges for learning rates such as, without limitation, for example [0.1, 0.3, 0.6, 0.8] and/or any other suitable learning rate value etc.; XGBoost hyperparameters associated with the type of learner such as, for example, a tree or a linear function; and hyperparameter ranges for rainfall duration values (in days), which may include, without limitation, for example [0, 1, 2, 3, 5, 10], or any other suitable rainfall duration value and the like. The ML water ingress detection unit 110c calculates ML model prediction accuracy scores for each of the trained ML models 120m-a to 120m-s of the ML model ensemble 120m using the ML performance metrics mean absolute error (MAE) and root mean squared error (RMSE), where a low score indicates improved ML prediction accuracy. Thus, the lower the RMSE and MAE, the better the scores and the better the prediction accuracy of the corresponding trained ML model.
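For illustration, the example hyperparameter ranges above can be expanded into individual training configurations, and the two accuracy metrics computed, along the following lines. This is a minimal sketch using only the Python standard library; the dictionary keys, the helper names `expand_grid`, `mae` and `rmse`, and the mapping of "tree or linear function" onto the XGBoost booster names `gbtree` and `gblinear` are assumptions, and no model is actually trained here.

```python
from itertools import product
from math import sqrt
from typing import Dict, List, Sequence

# Example ranges quoted in the description; the key names are assumptions.
GRID = {
    "n_estimators": [100, 200, 400, 600],
    "learning_rate": [0.1, 0.3, 0.6, 0.8],
    "booster": ["gbtree", "gblinear"],  # tree learner or linear learner
    "rainfall_duration_days": [0, 1, 2, 3, 5, 10],
}


def expand_grid(grid: Dict[str, list]) -> List[dict]:
    """Return one configuration dict per combination of hyperparameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error; lower is better."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error; lower is better and large errors weigh more."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Each configuration would be trained once per feature set and per target (mean, minimum, maximum), which is why the ensemble for a single asset can contain many models.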
The best ML model of the ML model ensemble 120m of wastewater asset 104m is the ML model 120m-a, which was trained using the third training environmental and wastewater feature set, which included the rainfall for wastewater asset 104m and the river level dataset from sensor 1009a designated E8730 of small river 1002. The ML model 120m-a was trained to predict the minimum wastewater threshold for wastewater asset 104m. The ML model 120m-a produced an RMSE score of 18.8 and an MAE of 10.09, which indicates a strong correlation between the wastewater asset 104m and the river level dataset from sensor 1009a designated E8730 of small river 1002. The ML water ingress detection unit 110c selects the topmost performing ML model 120m-a, i.e. the ML model having the best ML prediction accuracy, and analyses the third training environmental and wastewater feature set used to train the ML model 120m-a, which reveals that the water ingress dataset is the river level dataset from sensor 1009a designated E8730 of small river 1002. The ML water ingress detection unit 110c may then output data representative of an indication that wastewater asset 104m is affected by water ingress from small river 1002 and not from river 116 or sea 118 and the like.
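The selection step described above, in which the best-scoring model is picked and its feature set examined to name the water ingress source, may be sketched as follows. The record layout, the function names `select_best` and `ingress_sources`, and the tie-breaking of RMSE by MAE are assumptions for illustration. Only the scores of model 120m-a are taken from the description; the scores of the other two records are placeholders invented for the sketch.

```python
from typing import List


def select_best(scored_models: List[dict]) -> dict:
    """Return the model record with the lowest RMSE, using MAE to break ties."""
    return min(scored_models, key=lambda m: (m["rmse"], m["mae"]))


def ingress_sources(model: dict) -> List[str]:
    """Name the non-rainfall datasets in the feature set the model was trained on.

    An empty list means the best model needed rainfall only, i.e. no water
    ingress source is indicated.
    """
    return [name for name in model["features"] if name != "rainfall"]


scored = [
    # Scores for 120m-a are those quoted in the description.
    {"id": "120m-a", "features": ("rainfall", "river_E8730"), "rmse": 18.8, "mae": 10.09},
    # Placeholder scores, NOT taken from figure 10c.
    {"id": "120m-k", "features": ("rainfall", "river_E8790"), "rmse": 30.0, "mae": 20.0},
    {"id": "rain-only", "features": ("rainfall",), "rmse": 35.0, "mae": 25.0},
]

best = select_best(scored)
detected = ingress_sources(best)  # ["river_E8730"] for the records above
```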
[00232] Figure 10d illustrates example river level flow plots 1030 and 1040 for river 116 and small river 1002, respectively, of figure 10a confirming the water ingress detection of ML water ingress detection unit 110c in relation to water ingress to the wastewater asset 104m at wastewater treatment works of ML wastewater management system 100 according to some embodiments of the invention. On the plot 1030, the x-axis represents time duration over a time period of 1.5 years from January 2020 to July 2021, and the left y-axis represents river level measurements 1032 (e.g., Gauge) of the river level sensor for river 116 designated E8790. On this plot 1030, the river level measurements 1032 do not seem to correlate fully with at least the first and second periods 1018a and 1018b of prolonged high levels during winter months illustrated in figure 10b. On the plot 1040, the x-axis represents time duration over a time period of 1.5 years from January 2020 to July 2021, and the left y-axis represents river level measurements 1042 (e.g., Gauge) of the river level sensor for small river 1002 designated E8730. On this plot 1040, the river level measurements 1042 do seem to correlate with at least the first and second periods 1018a and 1018b of prolonged high levels during winter months illustrated in figure 10b. As can be seen, the small river 1002 seems to flow only during the winter, which provides further evidence that the water ingress to wastewater asset 104m is from the small river 1002, not river 116. Thus, river water from the small river 1002 is somehow ending up in wastewater asset 104m, which may indicate an incorrect connection of one or more wastewater pipes, a hole in one or more wastewater pipes in the area of wastewater asset 104m and/or any other cause of water ingress from small river 1002 to wastewater asset 104m. In this way, the small river 1002 was targeted as the source of the water ingress issue with the pumping station.
The water ingress detection unit 110c may notify the operator of the wastewater network 102 or ML wastewater management system 100 that the source of the water ingress affecting wastewater asset 104m is the small river 1002, so that maintenance personnel may be directed to attend the site of the wastewater asset 104m and/or wastewater assets nearby to investigate where that portion of the wastewater network 102 comes close to the small river 1002 and eliminate this water ingress.
[00233] Figure 10e illustrates an example plot 1050 of the output of a first ML model derived from ML models 120m-k and 120m-l trained against river source 116 in relation to wastewater flow through wastewater asset 104m of wastewater treatment works of ML wastewater management system of figure 10a. On this plot 1050, the x-axis represents time duration over a time period of several months, and the first left y-axis 1050a represents normalised wastewater level or flow measurements 1052 (e.g., Sewer Level Measurements (SLM)), normalised as a capacity percentage in relation to the capacity of the example wastewater asset to which the sensor is calibrated. On this plot 1050, rainfall measurement data 1054 in relation to the example wastewater asset 104m is also plotted and illustrated below the plot of normalised wastewater level or flow measurements 1052. The second left y-axis 1050b of plot 1050 for the rainfall measurement data 1054 represents rainfall in millimetres (mm). The normalised wastewater measurements 1052 may be normalised as described with reference to operation 202 of figure 2 and/or as described herein. The rainfall measurement data 1054 may comprise hyper-local rainfall data estimated from rainfall data periodically received from a weather service every M time units (e.g., every 15 minutes) for the example wastewater asset 104m as described with reference to figures 8a-8e and/or as herein described. The plot 1050 also illustrates that a capacity of 100% represents an overflow level 1056, whereby when the wastewater asset reaches 100% capacity wastewater may be redirected via an overflow mechanism or pipe away from the wastewater asset site 104m to prevent flooding and the like.
[00234] In this example, as seen from figure 10c, the first ML model used is based on trained ML model 120m-k configured for predicting maximum wastewater threshold 1058b of wastewater flow through wastewater asset 104m and trained ML model 120m-l configured for predicting minimum wastewater threshold 1058a of wastewater flow through wastewater asset 104m. Both ML models 120m-k and 120m-l were trained on the second training environmental and wastewater measurement feature set, which includes clean synchronised normalised wastewater measurement data 1052 measured by sensor 108m of wastewater asset 104m, corresponding hyper-local rainfall data 1054 associated with wastewater asset 104m, and the corresponding river level dataset from second sensor 109b designated E8790 of river 116. As can be seen in the plot 1050, the first ML model fails to correlate with the maximum or minimum normalised wastewater level or flow measurements 1052. This further confirms that a first ML model based on ML models 120m-k and 120m-l, which were trained on the second training environmental and wastewater measurement feature set including the river level dataset from second sensor 109b designated E8790 of river 116, has poor prediction accuracy in relation to wastewater flow through wastewater asset 104m.
[00235] Figure 10f illustrates an example plot 1060 of the output of a second ML model derived from ML models 120m-a and 120m-b trained against small river source 1002 in relation to wastewater flow through wastewater asset 104m of the wastewater treatment works of ML wastewater management system 100 of figure 10a. On this plot 1060, the x-axis represents time duration over a time period of several months, and the first left y-axis 1060a represents normalised wastewater level or flow measurements 1062 (e.g., Sewer Level Measurements (SLM)), normalised as a capacity percentage in relation to the capacity of the example wastewater asset to which the sensor is calibrated. On this plot 1060, rainfall measurement data 1064 in relation to the example wastewater asset 104m is also plotted and illustrated below the plot of normalised wastewater level or flow measurements 1062. The second left y-axis 1060b of plot 1060 for the rainfall measurement data 1064 represents rainfall in millimetres (mm). The normalised wastewater measurements 1062 may be normalised as described with reference to operation 202 of figure 2 and/or as described herein. The rainfall measurement data 1064 may comprise hyper-local rainfall data estimated from rainfall data periodically received from a weather service every M time units (e.g., every 15 minutes) for the example wastewater asset 104m as described with reference to figures 8a-8e and/or as herein described. The plot 1060 also illustrates that a capacity of 100% represents an overflow level 1066, whereby when the wastewater asset reaches 100% capacity wastewater may be redirected via an overflow mechanism or pipe away from the wastewater asset site 104m to prevent flooding and the like.
[00236] In this example, as seen from figure 10c, the second ML model used is based on trained ML model 120m-b configured for predicting maximum wastewater threshold 1068b of wastewater flow through wastewater asset 104m and trained ML model 120m-a configured for predicting minimum wastewater threshold 1068a of wastewater flow through wastewater asset 104m. Both ML models 120m-a and 120m-b were trained on the third training environmental and wastewater measurement feature set, which includes clean synchronised normalised wastewater measurement data 1062 measured by sensor 108m of wastewater asset 104m, corresponding hyper-local rainfall data 1064 associated with wastewater asset 104m, and the corresponding river level dataset from river level sensor 1009a designated E8730 of small river 1002. As can be seen in the plot 1060, the second ML model has a good correlation with the maximum or minimum normalised wastewater level or flow measurements 1062. This shows that an ML model based on ML models 120m-a and 120m-b, which were trained on the third training environmental and wastewater measurement feature set including the river level dataset from river sensor 1009a designated E8730 of small river 1002, has good prediction accuracy in relation to wastewater flow through wastewater asset 104m. This further confirms that the water ingress to the wastewater asset 104m is from the small river 1002.
[00237] Figure 11a illustrates another example scoring and ranking 1100 of another example ML model ensemble 120m for wastewater asset 104m of wastewater treatment works of ML wastewater management system 100 of figure 10a when trained, at another period of time, against different types of possible water ingress datasets according to some embodiments of the invention. In this example, the operators of the wastewater network 102 suspect that another type of water ingress may be affecting wastewater asset 104m. In order to confirm where the water ingress is coming from, the ML water ingress detection apparatus 110 of wastewater management system 100 was applied in relation to wastewater asset 104m using various combinations of environmental datasets including rainfall data associated with wastewater asset 104m, river level datasets from first and second river sensors 109a and 109b of river 116, where second river sensor 109b is designated E8790, tidal level datasets from sensors 109f and/or 109g, and river level datasets from sensor 1009a of small river 1002. In this example, the different types of environmental datasets associated with water ingress were collected from the level sensors 109a-109g and 1009a for analysis by ML water ingress detection apparatus 110 in relation to wastewater asset 104m.
[00238] For example, a plurality of training environmental and wastewater measurement feature sets were created for training a plurality of ML models 120m-a to 120m-s of ML model ensemble 120m in relation to wastewater asset 104m as described herein. For simplicity, a first training environmental and wastewater measurement feature set included clean synchronised normalised wastewater measurement data measured by sensor 108m of wastewater asset 104m and corresponding hyper-local rainfall data associated with wastewater asset 104m. A second training environmental and wastewater measurement feature set included clean synchronised normalised wastewater measurement data measured by sensor 108m of wastewater asset 104m, corresponding hyper-local rainfall data associated with wastewater asset 104m, and the corresponding tidal level dataset from tidal sensor 109f of sea 118. Further training environmental and wastewater measurement feature sets including clean synchronised normalised wastewater measurement data measured by sensor 108m of wastewater asset 104m, corresponding hyper-local rainfall data associated with wastewater asset 104m, and one or more other environmental datasets and/or combinations thereof may also be created. However, for simplicity, the focus of this example is on the first and second training environmental and wastewater measurement feature sets that include environmental datasets corresponding to: a) rainfall only; and b) rainfall and the tidal level dataset from tidal level sensor 109f of sea 118; respectively.
[00239] The ML unit 110a of the ML water ingress detection apparatus 110 trained each of the ML models 120m-a to 120m-s of the ML model ensemble 120m based on at least the first and second training environmental and wastewater measurement feature sets. In this example, the XGBoost regression ML algorithm was selected and various ML models for predicting mean wastewater flow, minimum threshold and maximum threshold of wastewater flow through wastewater asset 104m were trained over different XGBoost hyperparameters in a similar manner as herein described with reference to figure 10c and the like. The ML water ingress detection unit 110c calculates ML model prediction accuracy scores for each of the trained ML models 120m-a to 120m-s of the ML model ensemble 120m using the ML performance metrics mean absolute error (MAE) 1102a and root mean squared error (RMSE) 1102b, where a low score indicates improved ML prediction accuracy. Thus, the lower the RMSE and MAE, the better the scores and the better the prediction accuracy of the corresponding trained ML model. The best ML model of the ML model ensemble 120m of wastewater asset 104m is the ML model 120m-a having the lowest RMSE and MAE scores 1102b and 1102a, respectively, which was trained using the second training environmental and wastewater feature set, which included the rainfall for wastewater asset 104m and the tidal level dataset from tidal level sensor 109f of sea 118. The ML model 120m-a was trained to predict a mean wastewater threshold for wastewater asset 104m. The ML model 120m-a produced the lowest RMSE score 1102b and MAE score 1102a of the ML model ensemble 120m, which illustrates a correlation between the wastewater asset 104m and the tidal level dataset from sensor 109f, indicating tidal water may be entering overflow mechanism 107b and flowing to wastewater asset 104m. The ML water ingress detection unit 110c selects the topmost performing ML model 120m-a, i.e. the ML model having the best ML prediction accuracy, and analyses the second training environmental and wastewater feature set used to train the ML model 120m-a, which reveals that the water ingress dataset is the tidal level dataset from tidal sensor 109f of sea 118. The ML water ingress detection unit 110c may then output data representative of an indication that wastewater asset 104m is affected by water ingress from tidal water of sea 118 near overflow mechanism 107b.
[00240] Figure 11b illustrates an example plot 1110 of the output of a first ML model derived from ML model 120m-a of figure 11a trained against rainfall data (e.g. before the tidal sensor station 109f was added to the training environmental and wastewater feature set) in relation to wastewater flow 1112 through wastewater asset 104m of wastewater treatment works of ML wastewater management system of figure 10a. On this plot 1110, the x-axis represents time duration over a time period of 1 week, and the first left y-axis 1110a represents normalised wastewater level or flow measurements 1112 (e.g., Sewer Level Measurements (SLM)), normalised as a capacity percentage in relation to the capacity of the example wastewater asset 104m to which the sensor is calibrated. On this plot 1110, rainfall measurement data 1114 in relation to the example wastewater asset 104m is also plotted and illustrated below the plot of normalised wastewater level or flow measurements 1112. The second left y-axis 1110b of plot 1110 for the rainfall measurement data 1114 represents rainfall in millimetres (mm). The normalised wastewater measurements 1112 may be normalised as described with reference to operation 202 of figure 2 and/or as described herein. The rainfall measurement data 1114 may comprise hyper-local rainfall data estimated from rainfall data periodically received from a weather service every M time units (e.g., every 15 minutes) for the example wastewater asset 104m as described with reference to figures 8a-8e and/or as herein described. The plot 1110 also illustrates that a capacity of 100% represents an overflow level 1116, whereby when the wastewater asset reaches 100% capacity wastewater may be redirected via an overflow mechanism 107b or pipe away from the wastewater asset site 104m to prevent flooding and the like.
[00241] In this example, as seen from figure 11b, the first ML model used is based on trained ML model 120m-a configured for predicting minimum and maximum wastewater thresholds 1118a and 1118b of wastewater flow through wastewater asset 104m, which may be based on any suitable ML structure as described with reference to figures 1a to 10f. For example, the hyperparameters used to build the trained ML model 120m-a, which was configured to predict mean wastewater flow through wastewater asset 104a, may also be used to build the first ML model for predicting minimum and maximum wastewater thresholds 1118a and 1118b of wastewater flow through wastewater asset 104m. The first ML model was trained on the first training environmental and wastewater measurement feature set, which includes clean synchronised normalised wastewater measurement data 1112 measured by sensor 108m of wastewater asset 104m and corresponding hyper-local rainfall data 1114 associated with wastewater asset 104m. As can be seen in the plot 1110, the first ML model fails to correlate with the maximum normalised wastewater level or flow measurements 1112; for example, the normalised wastewater level or flow measurement peaks 1112a, 1112b, 1112c, 1112d, 1112e and 1112f overshoot the predicted maximum wastewater threshold 1118b. This further confirms that a first ML model based on ML models 120m-a that was trained on the first training environmental and wastewater measurement feature set including only rainfall datasets associated with wastewater asset 104m has a poor prediction accuracy in relation to wastewater flow through wastewater asset 104m.
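As a purely illustrative sketch, the overshoot of measured peaks above a predicted maximum threshold may be quantified by listing the samples at which the measurement exceeds the prediction. The function name and the sample series below are hypothetical; they are not data from the plot and are chosen only to show the calculation.

```python
def overshoot_report(measured, predicted_max):
    """Return (indices, fraction) of samples exceeding the predicted maximum."""
    if len(measured) != len(predicted_max) or not measured:
        raise ValueError("series must be non-empty and of equal length")
    indices = [
        i for i, (m, p) in enumerate(zip(measured, predicted_max)) if m > p
    ]
    return indices, len(indices) / len(measured)


# Hypothetical normalised measurements (capacity %) with three large peaks.
measured_pct = [20.0, 35.0, 80.0, 30.0, 90.0, 25.0, 85.0, 22.0]
# Hypothetical maximum threshold predicted by a model trained on rainfall only.
rain_only_max_pct = [30.0, 40.0, 45.0, 40.0, 50.0, 35.0, 45.0, 30.0]

indices, fraction = overshoot_report(measured_pct, rain_only_max_pct)
print(indices)   # [2, 4, 6]
print(fraction)  # 0.375
```

A high overshoot fraction under such a measure would be one simple indication that the feature set used for training does not explain the observed peaks.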
[00242] Figure 11c illustrates an example plot 1120 of the output of a second ML model derived from ML model 120m-a of figure 11a trained against rainfall data and tidal data after tidal sensor station 109f was added to the training environmental and wastewater feature set in relation to wastewater flow 1122 through wastewater asset 104m of wastewater treatment works of ML wastewater management system of figure 10a. On this plot 1120, the x-axis represents time duration over a time period over 1 week, and the first left y-axis 1120a represents normalised wastewater level or flow measurements 1122 (e.g., Sewer Level Measurements (SLM)), normalised as a capacity percentage in relation to the capacity of the example wastewater asset 104m to which the sensor is calibrated. On this plot 1120, rainfall measurement data 1124 in relation to the example wastewater asset 104m is also plotted and illustrated below the plot of normalised wastewater level or flow measurements 1122. The second left y-axis 1120b of plot 1120 for the rainfall measurement data 1124 represents rainfall in millimetres (mm). The normalised wastewater measurements 1122 may be normalised as described with reference to operation 202 of figure 2 and/or as described herein. The rainfall measurement data 1124 may comprise hyper-local rainfall data estimated from rainfall data periodically received from a weather service every M time units (e.g., every 15 minutes) for the example wastewater asset 104m as described with reference to figures 8a-8e and/or as herein described. The plot 1120 also illustrates that a capacity of 100% represents an overflow level 1126, whereby when the wastewater asset reaches 100% capacity wastewater may be redirected via an overflow mechanism 107b or pipe away from the wastewater asset site 104m to prevent flooding and the like.
[00243] In this example, as seen from figure 11c, the second ML model used is based on trained ML model 120m-a configured for predicting minimum and maximum wastewater thresholds 1128a and 1128b of wastewater flow through wastewater asset 104m, and may be based on the same ML structure and hyperparameters used for the first ML model of figure 11b but which was trained using the second training environmental and wastewater measurement feature set, which includes clean synchronised normalised wastewater measurement data 1122 measured by sensor 108m of wastewater asset 104m, corresponding hyper-local rainfall data 1124 associated with wastewater asset 104m, and a tidal dataset from tidal sensor/station 109f. As can be seen in the plot 1120, the second ML model correlates with the normalised wastewater measurements 1122: the predicted maximum wastewater threshold 1128b tracks the maximum normalised wastewater level or flow measurements 1122, for example the normalised wastewater level or flow measurement peaks 1122a, 1122b, 1122c, 1122d, 1122e and 1122f. This further confirms that the second ML model based on ML models 120m-a that was trained on the second training environmental and wastewater measurement feature set including the tidal dataset of tidal sensor/station 109f has the best prediction accuracy in relation to wastewater flow through wastewater asset 104m. This further confirms that the water ingress to the wastewater asset 104m is from a tidal source, e.g. sea 118, near tidal station/sensor 109f.
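The scoring and selection described above may, as a minimal sketch only, be expressed as follows. The sketch assumes that each candidate ML model has already been trained on its own feature set and has produced predictions for the same held-out period; it covers only the scoring, selection and feature set identification steps. The use of root mean squared error as the sole metric, the dictionary layout, the feature set names and all numeric values are assumptions made for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length series."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def detect_ingress(observed, candidates):
    """Score each candidate and return the best one with its ingress types.

    candidates maps a feature set name to a tuple of
    (ingress types associated with that feature set, model predictions).
    """
    scores = {
        name: rmse(observed, predictions)
        for name, (_, predictions) in candidates.items()
    }
    best = min(scores, key=scores.get)  # lowest error ranks highest
    return best, candidates[best][0], scores


# Hypothetical held-out measurements (capacity %) and per-model predictions.
observed = [20.0, 35.0, 80.0, 30.0, 90.0, 25.0]
candidates = {
    "rainfall_only": ((), [22.0, 33.0, 45.0, 31.0, 50.0, 26.0]),
    "rainfall_plus_tidal": (("tidal",), [21.0, 34.0, 78.0, 30.0, 88.0, 25.0]),
    "rainfall_plus_river": (("river",), [22.0, 33.0, 50.0, 31.0, 55.0, 26.0]),
}

best, ingress, scores = detect_ingress(observed, candidates)
print(best, ingress)  # rainfall_plus_tidal ('tidal',)
```

An empty tuple of ingress types for the best-scoring feature set would correspond, in this sketch, to no water ingress being detected for the asset.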
[00244] Figure 12 illustrates a schematic example of a computing system/apparatus 1200 for performing any of the methods, operations or processes described herein and/or for implementing any of the systems, units and/or apparatus as described herein. The computing system/apparatus 1200 shown is an example of a computing device or platform. It will be appreciated by the skilled person that other types of computing devices/systems/platforms may alternatively be used to implement the methods described herein, such as a distributed computing system.
[00245] The apparatus (or system) 1200 comprises one or more processors 1202 (e.g., CPUs). The one or more processors 1202 control operation of other components of the system/apparatus 1200. The system/apparatus 1200 may be part of a computing device, computing system, distributed computing system, cloud computing platform and the like for implementing the functionality of the systems/apparatus and/or one or more methods/operations/processes as described herein. The one or more processors 1202 may, for example, comprise a general-purpose processor. The one or more processors 1202 may be a single core device or a multiple core device. The one or more processors 1202 may comprise a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU). Alternatively, the one or more processors 1202 may comprise specialized processing hardware, for instance a RISC processor or programmable hardware with embedded firmware. Multiple processors may be included. In some embodiments, the one or more processors 1202 may be part of a distributed computing system such as a cloud computing system and/or cloud computing platform.
[00246] The system/apparatus comprises memory system or memory 1204 including a working or volatile memory 1206. The one or more processors may access the volatile memory 1206 in order to process data and may control the storage of data 1207 in memory. The volatile memory 1206 may comprise RAM of any type, for example, Static RAM (SRAM), Dynamic RAM (DRAM), or it may comprise Flash memory, such as an SD-Card. In some embodiments, the memory 1204 and/or one or more volatile memories 1206 may comprise a multiple of a plurality of memory 1204 forming part of the distributed computing system such as the cloud computing system and/or cloud computing platform and the like.
[00247] The system/apparatus comprises a non-volatile memory 1208. The non-volatile memory 1208 may store a set of operation or operating system instructions 1209a for controlling the operation of the processors 1202 in the form of computer readable instructions and/or software instructions 1209b in the form of computer readable instructions, which when executed on the one or more processors 1202 cause the processors to implement the methods, processes, operations and/or functionality of the ML water ingress detection apparatus, ML models and/or water ingress described herein. The non-volatile memory 1208 may be a memory of any kind such as a Read Only Memory (ROM), a Flash memory, SD drive, a magnetic drive memory or magnetic disc drive memory and the like as the application demands. In some embodiments, the non-volatile memory 1208 may comprise a multiple of a plurality of non-volatile memory 1208 forming part of the distributed computing system such as the cloud computing system and/or cloud computing platform and the like.
[00248] The one or more processors 1202 are configured to execute operating instructions 1209a and/or software instructions 1209b to cause the system/apparatus to perform any of the methods or processes described herein. The operating instructions 1209a may comprise code (i.e., drivers) relating to the hardware components of the system/apparatus 1200, as well as code relating to the basic operation of the system/apparatus 1200. Generally speaking, the one or more processors 1202 execute one or more instructions of the operating instructions 1209a and/or software instructions 1209b, which are stored permanently or semi-permanently in the non-volatile memory 1208, using the volatile memory 1206 to temporarily store data generated during execution of said operating instructions 1209a and/or software instructions 1209b.
[00249] The one or more processors 1202 may be connected to a network interface 1208 including a transmitter (TX) and a receiver (RX) for communicating over a network with other apparatus and systems such as wastewater assets, wastewater network, wastewater network management systems, environmental data measurement services and/or operators and/or any other apparatus, service, system and/or device as the application demands. The one or more processors 1202 may, optionally, be connected with a user interface (UI) 1210 for user or operator input for instructing or using the computing system and/or for outputting data therefrom. The one or more processors 1202 may, optionally, be connected with a display 1212 for displaying output to a user or operator. The at least one processor 1202, with the at least one memory 1204 and the computer program code 1209a, 1209b, are arranged to cause the computing system 1200 to perform at least the operations, methods, and/or processes, for example as disclosed in relation to the schematic diagrams, flow diagrams or operations as described with any of figures 1a to 11c and related features thereof.
[00250] FIG. 13 shows a non-transitory media 1300 according to some embodiments. The non-transitory media 1300 may include a computer readable storage medium 1302 and/or input/output mechanism 1304 for enabling a computing system 1200 to access said computer-readable medium 1302. Although in this example the non-transitory media is a USB stick, this is by way of example only and the invention is not so limited; the skilled person would appreciate that the non-transitory media 1300 may be any other type of computer readable media or medium such as, for example, a CD, a DVD, a USB stick, a Blu-ray disc, flash drive etc. and/or any other computer readable media as the application demands. The non-transitory media 1300 stores computer program code, causing an apparatus to perform one or more of the methods, operations, processes of any preceding process, for example as disclosed in relation to the flow diagrams and schematic diagrams of figures 1a to 12 and related features thereof.
[00251] Implementations of the methods or processes described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These may include computer program products (such as software stored on e.g., magnetic discs, optical disks, memory, Programmable Logic Devices) comprising computer readable instructions that, when executed by a computer, such as that described in relation to Figure 12, cause the computer to perform one or more of the methods described herein.
[00252] Any system feature as described herein may also be provided as a method or process feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure. In particular, method aspects may be applied to system aspects, and vice versa.
[00253] Furthermore, any, some and/or all features in one aspect can be applied to any, some and/or all features in any other aspect, in any appropriate combination. It should also be appreciated that particular combinations of the various features described and defined in any aspects of the invention can be implemented and/or supplied and/or used independently.
[00254] Although several embodiments have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles of this disclosure, the scope of which is defined in the claims and their equivalents.
Claims (40)
- 1. A computer-implemented method for detecting water ingress at a wastewater asset of a wastewater network, the method comprising: training an ensemble of ML models for the wastewater asset, each ML model configured for predicting wastewater flow through the wastewater asset, wherein each ML model is trained based on a different training environmental and wastewater flow feature set from a plurality of training environmental and wastewater flow feature sets, wherein at least one of the plurality of training environmental and wastewater flow feature sets is associated with one or more types of water ingress; scoring each trained ML model of the ensemble of ML models based on one or more ML model performance metrics; selecting a trained ML model from the ensemble of trained ML models with the best score; identifying the training environmental and wastewater flow feature set used to train the selected trained ML model; detecting one or more types of water ingress for the wastewater asset based on whether the identified training environmental and wastewater flow feature set is associated with one or more types of water ingress.
- 2. The computer-implemented method as claimed in claim 1, wherein each of the plurality of training environmental and wastewater flow feature sets includes wastewater flow data associated with the wastewater asset, rainfall data associated with the wastewater asset and a unique set of one or more water ingress datasets from a plurality of water ingress datasets.
- 3. The computer-implemented method as claimed in claim 2, wherein each set of water ingress datasets comprises at least one or more from the group of: river level data; tidal data; ground water level data; flood water level data; any other type of environmental data affecting wastewater flow through a wastewater asset; any other type of environmental data associated with water ingress affecting wastewater flow through a wastewater asset; and any other environmental data external to the wastewater asset affecting the flow through the wastewater asset.
- 4. The computer-implemented method as claimed in any preceding claim, wherein the one or more types of water ingress comprising at least one from the group of: river water ingress; tidal water ingress; ground water ingress; flood water ingress; and any other type of water ingress affecting wastewater flow through a wastewater asset.
- 5. The computer-implemented method as claimed in any preceding claim, wherein scoring each trained ML model and selecting a trained ML model further comprising: measuring prediction accuracy for each trained ML model of the ensemble of trained ML models; scoring each trained ML model of the ensemble of ML models based on one or more ML model performance metrics associated with measurement prediction accuracy; ranking all the trained ML models based on the scoring; and selecting the highest or topmost ranked trained ML model.
- 6. The computer-implemented method as claimed in any preceding claim, wherein scoring each trained ML model and selecting a trained ML model further comprising: scoring and ranking each of the trained ML models of the ensemble of trained ML models based on root mean squared error and mean squared error; selecting the highest or topmost ranked trained ML model.
- 7. The computer-implemented method as claimed in any preceding claim, wherein the training environmental and wastewater flow feature set comprises rainfall data associated with the wastewater asset, wherein the rainfall data associated with the wastewater asset is calculated based on a combination of first rainfall data corresponding to a first rainfall area the wastewater asset is located within, and one or more other rainfall data corresponding to rainfall areas adjacent to the first rainfall area.
- 8. The computer-implemented method as claimed in claim 6, wherein calculating the rainfall data associated with the wastewater asset further comprising: calculating a hyper-local rainfall data at the location of the wastewater asset based on a weighted combination of the first rainfall estimate and the one or more other rainfall data in relation to the location of the wastewater asset within the first rainfall area and the relative location of the wastewater asset to each of the one or more other rainfall areas.
- 9. The computer-implemented method as claimed in claim 7, further comprising calculating the hyper-local rainfall data associated with the wastewater asset further comprising performing at least one of: a multivariate interpolation in relation to at least the first rainfall data and the one or more other rainfall data and location of the wastewater asset in relation to the first and other rainfall areas; a three-dimensional interpolation in relation to at least the first rainfall data and the one or more other rainfall data and location of the wastewater asset in relation to the first and other rainfall areas; a tri-linear interpolation in relation to at least the first rainfall data and the one or more other rainfall data and location of the wastewater asset in relation to the first and other rainfall areas; a tri-cubic interpolation in relation to at least the first rainfall data and the one or more other rainfall data and location of the wastewater asset in relation to the first and other rainfall areas; or any other numerical, estimation or interpolation method for estimating the hyper-local rainfall data at the location of the wastewater asset based on at least the first rainfall data and the one or more other rainfall data and location of the wastewater asset in relation to the first and other rainfall areas.
- 10. The computer-implemented method as claimed in claims 8 or 9, wherein calculating the hyper-local rainfall estimate associated with the wastewater asset based on an interpolation method further comprising: dividing the rainfall grid area within which the wastewater asset is located into quadrants; identifying the grid area quadrant of the rainfall grid area that the wastewater asset is located within; selecting at least three rainfall grid areas adjacent to the identified grid area quadrant the wastewater asset is located within; calculating a rectangle formed from the centers of the at least three rainfall grid areas and the rainfall grid area the wastewater asset is located within, wherein the wastewater asset is located within said rectangle; projecting the location of the wastewater asset onto each of the line segments or edges of the rectangle based on orthogonally projecting lines from the wastewater asset to each line segment or edge to form intersection locations on each line segment or edge for estimating intersection rainfall dataset estimates for said line segments or edges; calculating, for each line segment or edge of the rectangle, an intersection rainfall estimate dataset based on a linear interpolation using distances between each center of the grid areas corresponding to said each line segment or edge and the intersection location for said each line segment and the corresponding rainfall datasets associated with said centers of the grid areas; calculating, for each projection line, an intermediate rainfall estimate dataset based on a linear interpolation using distances between each pair of intersection locations on said each projection line and said wastewater asset and the corresponding intersection estimate rainfall datasets associated with said intersection locations on said each projection line; and calculating a hyperlocal rainfall dataset for said wastewater asset based on averaging the intermediate rainfall estimate datasets.
- 11. The computer-implemented method as claimed in any preceding claim, wherein training each ML model of the ensemble of ML models further comprising performing training of said each ML model based on using an ML algorithm to train model parameters defining said each ML model for predicting mean wastewater flow through the wastewater asset based on the corresponding training environmental and wastewater flow feature set, wherein the corresponding training environmental and wastewater flow feature set comprises data representative of corresponding historical wastewater measurement data for the wastewater asset and historical environmental data comprising either: a) a historical rainfall data associated with the wastewater asset; or b) both historical rainfall data associated with the wastewater asset and one or more water ingress datasets.
- 12. The computer-implemented method as claimed in claim 11, further comprising normalising the historical wastewater measurement data based on the maximum and minimum capacity of the wastewater asset.
- 13. The computer-implemented method as claimed in claim 11 or 12, further comprising processing the timeseries normalised historical wastewater measurement data for the wastewater asset to be synchronised with either: a) the timeseries rainfall data of the historical environmental data associated with the wastewater asset; or b) the timeseries rainfall data of the historical environmental data associated with the wastewater asset and one or more timeseries water ingress datasets of the historical environmental data associated with the wastewater asset.
- 14. The computer-implemented method as claimed in any of claims 11 to 13, wherein training further comprising: performing hyperparameter tuning using the ML algorithm based on training a plurality of sets of ML models using different combinations of hyperparameters, each set of ML models comprising a mean ML model, wherein: the mean ML model is trained and configured for predicting the time series mean values in the normalised historical wastewater measurement data based on at least rainfall data and/or water ingress data as input; scoring and ranking each of the trained ML models of the plurality of sets of ML models based on root mean squared error and mean squared error; selecting the best ranked trained ML model; and generating the final trained ML model for predicting mean wastewater flow using the selected mean trained ML model.
- 15. The computer-implemented method as claimed in any of claims 1 to 9, wherein training each ML model of the ensemble of ML models further comprising performing training of said each ML model based on using an ML algorithm to train model parameters defining said each ML model for predicting minimum and maximum thresholds associated with wastewater flow through the wastewater asset based on the corresponding training environmental and wastewater flow feature set comprising data representative of historical wastewater measurement data for the wastewater asset and historical environmental data comprising either: a) a historical rainfall data associated with the wastewater asset; or b) both historical rainfall data associated with the wastewater asset and one or more water ingress datasets.
- 16. The computer-implemented method as claimed in claim 15, further comprising normalising the historical wastewater measurement data based on the maximum and minimum capacity of the wastewater asset.
- 17. The computer-implemented method as claimed in claim 15 or 16, further comprising processing the timeseries normalised historical wastewater measurement data for the wastewater asset to be synchronised with either: a) the timeseries rainfall data of the historical environmental data associated with the wastewater asset; or b) the timeseries rainfall data of the historical environmental data associated with the wastewater asset and one or more timeseries water ingress datasets of the historical environmental data associated with the wastewater asset.
- 18. The computer-implemented method as claimed in any of claims 15 to 17, wherein training further comprising: performing hyperparameter tuning using the ML algorithm based on training a plurality of sets of ML models using different combinations of hyperparameters, each set of ML models comprising a mean ML model, a minimum ML model and a maximum ML model, wherein: the mean ML model is trained and configured for predicting the time series mean values in the normalised historical wastewater measurement data based on at least rainfall data as input; the minimum ML model is trained and configured for predicting the time series minimum values in the normalised historical wastewater measurement data based on at least rainfall data as input; and the maximum ML model is trained and configured for predicting the time series maximum values in the normalised historical wastewater measurement data based on at least rainfall data as input; scoring and ranking each of the trained ML models of the plurality of sets of ML models based on root mean squared error and mean squared error; selecting the best ranked trained ML model; selecting the corresponding minimum and maximum trained ML models from the set of ML models to which the selected best ranked trained ML model belongs; generating the final trained ML model for predicting minimum and maximum wastewater thresholds based on using the selected minimum and maximum trained ML models.
- 19. The computer-implemented method as claimed in any of claims 15 to 17, wherein training further comprising: performing hyperparameter tuning of the ML algorithm based on training a plurality of ML models using different combinations of hyperparameters associated with the ML algorithm and training dataset, wherein each comprises a mean ML model trained and configured for predicting the time series mean values in the normalised historical wastewater measurement data based on at least rainfall data as input; scoring and ranking each of the trained mean ML models of the plurality of ML models based on root mean squared error and mean squared error model performance metrics; selecting the best ranked trained mean ML model; using the hyperparameters of the selected best ranked trained mean ML model to generate corresponding minimum and maximum trained ML models, wherein: the minimum ML model is trained and configured for predicting the time series minimum values in the normalised historical wastewater measurement data based on at least rainfall data as input; and the maximum ML model is trained and configured for predicting the time series maximum values in the normalised historical wastewater measurement data based on at least rainfall data as input; generating the final trained ML model for predicting minimum and maximum wastewater thresholds based on using the corresponding minimum and maximum trained ML models.
- 20. The computer-implemented method as claimed in any of claims 15 to 17, wherein the hyperparameters associated with the training dataset include a set of rainfall data time windows, wherein each rainfall data time window corresponds to, for each current rainfall data instance, inputting during training or inference the current rainfall data instance and a plurality of preceding rainfall data instances within said each rainfall data time window.
- 21. The computer-implemented method as claimed in any of claims 16 to 20, wherein the historical rainfall data is a timeseries dataset with a time interval M between datapoints, and the historical wastewater measurement data is a timeseries dataset with a time interval N between datapoints, where M >= N, further comprising generating a synchronised historical wastewater measurement dataset that forms a timeseries dataset with a time interval M between datapoints based on calculating the mean, minimum and maximum for each i-th datapoint from those datapoints of the historical wastewater measurement data falling between the (i-1)-th datapoint and the i-th datapoint within said each time interval M, wherein the training dataset comprises the mean, minimum and maximum values of the synchronised historical wastewater measurement dataset.
- 22. The computer-implemented method as claimed in claim 21, further comprising: performing a first data clean-up of the normalised synchronised historical wastewater measurement dataset based on: performing statistical analysis of the normalised synchronised historical wastewater measurement dataset for identifying blocks of outlier datapoints; generating a first clean wastewater measurement dataset based on removing the identified outlier datapoints from the normalised synchronised historical wastewater measurement dataset; and generating a first rainfall dataset based on removing the corresponding rainfall datapoints associated with the identified outlier datapoints from the historical rainfall data; performing a second data clean-up of the first clean wastewater measurement dataset based on: performing further statistical analysis to analyse long and short-term average behaviour of the first clean wastewater measurement dataset for identifying, based on a ruleset, inaccurate or discontinuous measurement data for interpolation or removal; generating a second clean wastewater measurement dataset based on filtering the identified measurement data using interpolation or removal; and generating a second rainfall dataset based on removing the corresponding rainfall datapoints associated with the removed datapoints from the first clean wastewater measurement dataset from the historical rainfall dataset; performing a third data clean-up of the second clean wastewater measurement dataset based on: identifying from the second clean wastewater measurement dataset exclusion events comprising one or more of: a) blockage and sensor fault events; b) rainfall events; c) dry weather events; and/or d) other feature events causing noisy or spurious data; generating a clean wastewater measurement dataset based on removing the blockage and sensor fault events and other feature events causing noise or spurious data from the second clean wastewater measurement dataset; and generating a clean rainfall dataset based on removing the corresponding rainfall datapoints associated with the removed identified outlier datapoints from the historical rainfall data; and generating the training dataset based on the clean wastewater measurement dataset and the clean third rainfall dataset.
- 23. The computer-implemented method as claimed in claim 22, further comprising generating a dry weather dataset for the wastewater asset based on removing identified rainfall events from the clean wastewater measurement dataset.
- 24. The computer-implemented method as claimed in claim 22, further comprising: training a dry weather ML model based on using an ML algorithm to train model parameters defining the dry weather ML model for predicting minimum and maximum dry weather thresholds associated with wastewater flow through the wastewater asset for use in water ingress detection based on a training dry weather dataset comprising data representative of the generated dry weather dataset; and training a wet weather ML model based on using the ML algorithm to train model parameters defining the wet weather ML model for predicting minimum and maximum wet weather thresholds associated with wastewater flow through the wastewater asset for use in water ingress detection based on a training dataset comprising data representative of the clean wastewater measurement dataset and the clean third rainfall dataset associated with the wastewater asset; forming a trained ML model based on the trained dry weather ML model and trained wet weather ML model, wherein the trained ML model is configured to predict minimum and maximum wastewater thresholds, where the predicted minimum wastewater threshold comprises a combination of the predicted minimum dry weather threshold and the predicted minimum wet weather threshold, and the predicted maximum wastewater threshold comprises a combination of the predicted maximum dry weather threshold and the predicted maximum wet weather threshold.
- 25. The computer-implemented method as claimed in claims 22 or 23, wherein performing statistical analysis of the normalised synchronised historical wastewater measurement dataset for identifying blocks of outlier datapoints further comprises: generating a histogram dispersion graph for the normalised synchronised historical wastewater measurement dataset; identifying the outlier blocks, if any, in the histogram dispersion graph based on comparing the histogram dispersion graph with an ideal histogram data pattern associated with the wastewater asset; and generating the first clean wastewater dataset based on removing any identified outlier blocks from the normalised synchronised historical wastewater measurement dataset.
- 26. The computer-implemented method as claimed in any of claims 15 to 18, wherein the ML algorithm comprises at least one from the group of: regression learning algorithm; neural network; extreme gradient boost regressor algorithm; Adaptive Boosting algorithm; Gradient boosting algorithm; any other statistical classification meta-algorithm; any other ML algorithm suitable for training model parameters of an ML model for tracking the behaviour of wastewater flow through a wastewater asset and for predicting data representative of a minimum wastewater threshold and maximum wastewater threshold for said wastewater asset.
- 27. The computer-implemented method as claimed in any preceding claim, wherein the ML algorithm comprises a regression learning algorithm based on one or more of: extreme gradient boost regressor algorithm; Adaptive Boosting algorithm; Gradient boosting algorithm; any other statistical classification meta-algorithm, boosting algorithm or regression algorithm suitable for training model parameters of an ML model for tracking the behaviour of wastewater flow through a wastewater asset and for predicting data representative of a minimum wastewater threshold and maximum wastewater threshold for said wastewater asset.
- 28. The computer-implemented method according to any preceding claim, further comprising: performing water ingress detection at each of a plurality of wastewater assets of the wastewater network according to the computer-implemented method of any preceding claim; determining a set of wastewater assets connected together in a daisychain having a correlation above a threshold correlation with said each type of water ingress; ranking the correlation of the type of water ingress at each of the wastewater assets in the set of wastewater assets in the daisychain; and selecting the wastewater asset having the highest correlation with the type of water ingress within the set of wastewater assets in the daisychain.
- 29. The computer-implemented method as claimed in claim 33, wherein pinpointing one or more areas associated with one or more wastewater assets having the highest correlation for said each type of water ingress further comprises: determining a set of wastewater assets connected together in a daisychain; ranking the correlation of the type of water ingress at each of the wastewater assets in the set of wastewater assets in the daisychain; and selecting the wastewater asset having the highest correlation with the type of water ingress within the set of wastewater assets in the daisychain.
- 30. The computer-implemented method as claimed in claim 33, wherein the set of wastewater assets in the daisychain comprise a first wastewater asset upstream of the other wastewater assets in the daisychain and a last wastewater asset in the daisychain downstream of all the other wastewater assets in the daisychain.
- 31. A water ingress detection apparatus for detecting water ingress at one or more of a plurality of wastewater assets of a wastewater network, the water ingress detection apparatus comprising: a water ingress machine learning, ML, unit configured for: training an ensemble of ML models for each wastewater asset, each ML model configured for predicting wastewater flow through said each wastewater asset, wherein each ML model is trained based on a different training environmental and wastewater flow feature set from a plurality of training environmental and wastewater flow feature sets associated with said each wastewater asset, wherein at least one of the plurality of training environmental and wastewater flow feature sets is associated with one or more types of water ingress; scoring each trained ML model of the ensemble of ML models based on one or more ML model performance metrics; selecting a trained ML model from the ensemble of trained ML models having the best score; and identifying the training environmental and wastewater flow feature set used to train the selected trained ML model for said each wastewater asset; a water ingress detection unit configured for: detecting one or more types of water ingress for said each wastewater asset based on whether the identified training environmental and wastewater flow feature set for said each wastewater asset is associated with one or more types of water ingress.
- 32. The water ingress detection apparatus as claimed in claim 31, wherein the ML unit and detection unit are configured for implementing the corresponding steps of the computer-implemented method as claimed in any of claims 1 to 30.
- 33. The water ingress detection apparatus as claimed in claims 31 or 32, wherein the water detection apparatus is further configured for: performing water ingress detection at each of a plurality of wastewater assets of the wastewater network by applying the water ingress ML detection unit to each wastewater asset; determining for each wastewater asset of the plurality of wastewater assets the correlation of the type of water ingress affecting said each wastewater asset; and for each type of water ingress detected in the plurality of wastewater assets, pinpointing one or more areas associated with one or more wastewater assets having the highest correlation for said each type of water ingress.
- 34. The water ingress detection apparatus as claimed in claim 33, the water detection apparatus further comprising a pinpointing analysis unit configured for: pinpointing one or more areas associated with one or more wastewater assets having the highest correlation for said each type of water ingress by: determining a set of wastewater assets connected together in a daisychain having a correlation above a threshold correlation with said each type of water ingress; ranking the correlation of the type of water ingress at each of the wastewater assets in the set of wastewater assets in the daisychain; and selecting the wastewater asset having the highest correlation with the type of water ingress within the set of wastewater assets in the daisychain.
- 35. The water ingress detection apparatus as claimed in claim 34, wherein the set of wastewater assets in the daisychain comprise a first wastewater asset upstream of the other wastewater assets in the daisychain and a last wastewater asset in the daisychain downstream of all the other wastewater assets in the daisychain.
- 36. An apparatus comprising one or more processors, a memory and a communication interface, the one or more processors connected to the memory and communication interface, wherein the apparatus is configured to implement the computer-implemented method of any of claims 1 to 30.
- 37. A wastewater management system comprising: a wastewater network comprising a plurality of wastewater assets, wherein each wastewater asset comprises a sensor for measuring data representative of wastewater passing through said each wastewater asset; and a water ingress detection apparatus according to any of claims 31 to 35; wherein: the water ingress detection apparatus is configured for detecting water ingress at one or more wastewater assets of the plurality of wastewater assets of said wastewater network.
- 38. A computer-readable medium comprising data or instruction code, which when executed on a processor, causes the processor to implement the computer-implemented method of any of claims 1 to 30.
- 39. A machine learning model configured for predicting wastewater flow for a wastewater asset of a wastewater network given rainfall data and/or one or more water ingress data as input and obtained according to the computer-implemented method of any of claims 1 to 30.
- 40. A non-transitory tangible computer-readable medium comprising data or instruction code, which when executed on one or more processor(s), causes at least one of the one or more processor(s) to perform the steps of the method of: detecting water ingress at a wastewater asset of a wastewater network, the method further comprising: training an ensemble of ML models for the wastewater asset, each ML model configured for predicting wastewater flow through the wastewater asset, wherein each ML model is trained based on a different training environmental and wastewater flow feature set from a plurality of training environmental and wastewater flow feature sets, wherein at least one of the plurality of training environmental and wastewater flow feature sets is associated with one or more types of water ingress; measuring prediction accuracy for each trained ML model of the ensemble of trained ML models; selecting a trained ML model from the ensemble of trained ML models having the best prediction accuracy; identifying the training environmental and wastewater flow feature set used to train the selected trained ML model; detecting one or more types of water ingress for the wastewater asset based on whether the identified training environmental and wastewater flow feature set is associated with one or more types of water ingress.
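The ensemble selection recited in claims 31 and 40, and the daisychain selection of claims 28 and 34, can be illustrated with a minimal Python sketch. It is not the claimed implementation: it assumes each feature set holds a single environmental feature, substitutes a one-variable least-squares line for the boosting regressors named in claims 26 and 27, and uses mean absolute error on held-out samples as the prediction accuracy measure. The feature names, ingress labels, sample values, and the functions `fit_line`, `detect_ingress`, and `pinpoint` are all hypothetical.

```python
from statistics import fmean

# Hypothetical feature sets: name -> associated ingress type (None = no ingress).
FEATURE_SETS = {
    "rainfall": None,
    "river_level": "river ingress",
    "groundwater_level": "groundwater infiltration",
}


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept (stand-in for an ML model)."""
    mx, my = fmean(x), fmean(y)
    var = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / var
    return slope, my - slope * mx


def mean_abs_error(model, x, y):
    """Prediction error of a fitted line on the given samples (lower is better)."""
    slope, intercept = model
    return fmean(abs(slope * xi + intercept - yi) for xi, yi in zip(x, y))


def detect_ingress(features, flow, split):
    """Train one model per feature set, select the most accurate one, and
    return its feature set, the ingress type tied to that set, and all scores."""
    scores = {}
    for name in FEATURE_SETS:
        x = features[name]
        model = fit_line(x[:split], flow[:split])  # train on the first samples
        scores[name] = mean_abs_error(model, x[split:], flow[split:])  # score on the rest
    best = min(scores, key=scores.get)
    return best, FEATURE_SETS[best], scores


def pinpoint(correlations, threshold):
    """From the assets of one daisychain, keep those above the threshold
    and return the asset with the highest correlation (or None)."""
    candidates = {a: c for a, c in correlations.items() if c > threshold}
    return max(candidates, key=candidates.get) if candidates else None


# Synthetic example data (invented): flow follows the river level exactly.
features = {
    "rainfall": [0, 3, 0, 0, 5, 0, 1, 0, 0, 2, 0, 0],
    "river_level": [1.0, 1.2, 1.5, 1.1, 0.9, 1.4, 1.8, 1.6, 1.3, 1.0, 1.7, 1.2],
    "groundwater_level": [2.0, 2.0, 2.1, 2.1, 2.2, 2.2, 2.3, 2.3, 2.4, 2.4, 2.5, 2.5],
}
flow = [2.0 * r + 1.0 for r in features["river_level"]]

best_set, ingress_type, scores = detect_ingress(features, flow, split=8)
print(best_set, ingress_type)
```

In this sketch the detection result comes from which feature set yields the most accurate model, not from the model's predictions themselves, which mirrors the final step of claim 40. The `pinpoint` helper applies the threshold before ranking, as in claims 28 and 34.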
Priority Applications (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2301386.5A GB2628750A (en) | 2023-01-31 | 2023-01-31 | Water ingress detection in wastewater networks |
| GBGB2304280.7A GB202304280D0 (en) | 2023-01-31 | 2023-03-23 | Water ingress detection in wastewater networks |
| EP24702996.0A EP4659174A1 (en) | 2023-01-31 | 2024-01-31 | Water ingress detection in wastewater networks |
| PCT/EP2024/052411 WO2024160916A1 (en) | 2023-01-31 | 2024-01-31 | Water ingress detection in wastewater networks |
| AU2024216142A AU2024216142A1 (en) | 2023-01-31 | 2024-01-31 | Water ingress detection in wastewater networks |
| GB2401284.1A GB2628881A (en) | 2023-01-31 | 2024-01-31 | Water ingress detection in wastewater networks |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2301386.5A GB2628750A (en) | 2023-01-31 | 2023-01-31 | Water ingress detection in wastewater networks |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| GB202301386D0 (en) | 2023-03-15 |
| GB2628750A (en) | 2024-10-09 |
Family
ID=85476638
Family Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2301386.5A Pending GB2628750A (en) | 2023-01-31 | 2023-01-31 | Water ingress detection in wastewater networks |
| GBGB2304280.7A Ceased GB202304280D0 (en) | 2023-01-31 | 2023-03-23 | Water ingress detection in wastewater networks |
| GB2401284.1A Pending GB2628881A (en) | 2023-01-31 | 2024-01-31 | Water ingress detection in wastewater networks |
Family Applications After (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GBGB2304280.7A Ceased GB202304280D0 (en) | 2023-01-31 | 2023-03-23 | Water ingress detection in wastewater networks |
| GB2401284.1A Pending GB2628881A (en) | 2023-01-31 | 2024-01-31 | Water ingress detection in wastewater networks |
Country Status (1)
| Country | Link |
|---|---|
| GB (3) | GB2628750A (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150112647A1 (en) * | 2013-03-14 | 2015-04-23 | Trifecta Global Infrastructure Solutions Ltd. | Systems and methods for advanced sanitary sewer infrastructure management |
| GB2618171A (en) * | 2022-09-20 | 2023-11-01 | Stormharvester Ipr Ltd | Anomaly detection in wastewater networks |
- 2023
  - 2023-01-31: GB2301386.5A, published as GB2628750A (en), active, Pending
  - 2023-03-23: GBGB2304280.7A, published as GB202304280D0 (en), not active, Ceased
- 2024
  - 2024-01-31: GB2401284.1A, published as GB2628881A (en), active, Pending
Also Published As
| Publication number | Publication date |
|---|---|
| GB202301386D0 (en) | 2023-03-15 |
| GB2628881A (en) | 2024-10-09 |
| GB202401284D0 (en) | 2024-03-13 |
| GB202304280D0 (en) | 2023-05-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Dong et al. | Bayesian modeling of flood control networks for failure cascade characterization and vulnerability assessment | |
| GB2618171A (en) | Anomaly detection in wastewater networks | |
| Hill et al. | Real-time Bayesian anomaly detection for environmental sensor data | |
| CN102884407B (en) | System and method for monitoring resources in a water utility network | |
| Laucelli et al. | Detecting anomalies in water distribution networks using EPR modelling paradigm | |
| US20210088369A1 (en) | Blockage detection using machine learning | |
| Agonafir et al. | Understanding New York City street flooding through 311 complaints | |
| Hiroi et al. | FloodEye: Real-time flash flood prediction system for urban complex water flow | |
| EP4590903A1 (en) | Anomaly detection for wastewater assets with pumps in wastewater networks | |
| CN118351659A (en) | Pipeline blockage early warning method and system based on GPS positioning and wireless sensor network | |
| JP7795450B2 (en) | Inflow prediction system | |
| Goodarzi et al. | A machine learning approach for predicting and localizing the failure and damage point in sewer networks due to pipe properties | |
| Aziz et al. | Wastewater flooding risk assessment for coastal communities: Compound impacts of climate change and population growth | |
| GB2507184A (en) | Anomaly event classification in a network of pipes for resource distribution | |
| Savic et al. | Intelligent urban water infrastructure management | |
| EP4659174A1 (en) | Water ingress detection in wastewater networks | |
| Strauss et al. | Predictive maintenance of stormwater infrastructure using internet-of-things technology | |
| GB2628750A (en) | Water ingress detection in wastewater networks | |
| Li et al. | Exploring Cost-effective Implementation of Real-time Control to Enhance Flooding Resilience against Future Rainfall and Land Cover Changes | |
| Ebtehaj et al. | Early detection of river flooding using machine learning for the Sain-Charles river, Quebec, Canada | |
| Post et al. | Quantifying the effect of proactive management strategies on the serviceability of gully pots and lateral sewer connections | |
| Armon et al. | Algorithmic network monitoring for a modern water utility: a case study in Jerusalem | |
| Hutton et al. | Apparent seasonal bias in delta outflow estimates as revealed in the historical salinity record of the San Francisco Estuary: Implications for delta net channel depletion estimates | |
| Rosin | Data Analytics for Automated Near Real Time Detection of Blockages in Smart Wastewater Systems | |
| Wu | Whole Life Cost Modelling For Railway Drainage Systems Including Uncertainty |