Data reduction system and method based on dynamic correlation analysis of data of Internet of things
Technical Field
The invention belongs to the technical field of the Internet of things, and particularly relates to a data reduction system and method based on dynamic correlation analysis of data of the Internet of things.
Background
In Internet of Things environments, applications such as environmental monitoring, event monitoring and intelligent healthcare are commonly carried out by multi-source heterogeneous sensors working cooperatively. The multi-source time series data collected by these sensors are usually correlated, and this correlation can be used to complete missing data and to provide disaster tolerance and fault tolerance for services. Two sensor devices placed at the same location may carry highly correlated information. For example, in an intelligent desk lamp application, the infrared sensor and the acoustic sensor generate correlated information, so the reading of the acoustic sensor can to some extent be predicted from the infrared sensor, which allows the sampling of the acoustic sensor to be reduced.
Existing studies mostly address the completion of missing sensor data and the reduction of sensor data on the basis of the spatio-temporal correlation of sensors. For example, the data of an environmental sensor network are modeled and predicted through spatial correlation, and data from multiple neighboring nodes are combined to estimate the lost data of a node. Alternatively, the temporal correlation of a sensor is used to achieve data reduction, with a prediction model established through kernel linear regression; in the communication between the sensor node and the receiver, only the regression coefficients are transmitted and no actual measurement values are transmitted.
At present, data reduction methods based on correlation analysis of multiple sensor data focus mainly on the spatio-temporal correlation among the sensor data and cannot adapt to dynamic changes in the correlation rules among the sensor data. In some Internet of Things environment monitoring scenarios, the data acquired by one sensor may change in the same direction as, or in the opposite direction to, other data, and the changes may also show a certain periodicity. Ignoring such periodic and dynamic correlation changes leaves the data reduction algorithm unable to adapt to a dynamic Internet of Things environment, which lowers both the reduction rate and the accuracy.
Disclosure of Invention
In view of the above problems, the invention provides a data reduction system and method based on dynamic correlation analysis of data of the Internet of things. The invention performs dynamic correlation analysis on multidimensional data values and dynamically constructs the correlation relation among the data of a plurality of sensors according to real-time changes in the data correlation, so that the values of certain sensors are predicted from the relation among the correlated sensors, and reduction is achieved by lowering the transmission rate of the predicted sensors.
The invention provides a data reduction system based on dynamic correlation analysis of data of the Internet of things, which comprises an environment change detection module, a correlation set construction module, a substitution relation generation module, a sensor substitution model solving module and a substitution relation dynamic adjustment module;
The environment change detection module is used for receiving sensor data perceived by the Internet of things and detecting concept drift;
The correlation set construction module is used for dynamically constructing a multi-sensor correlation set subSi according to the change of the correlation of the sensor data so as to maximize the data reduction rate;
the substitution relation generation module is used for determining substitution relation among the sensors according to the correlation degree of the single sensor and the rest sensors and the rest energy, constructing a sensor substitution evaluation function and determining an optimal substitution sensor;
The sensor substitution model solving module is used for judging whether to construct a reconstruction function of the sensor data according to the number of the sensors in the set subSi and is used for optimizing a substitution model of the sensor;
the substitution relation dynamic adjustment module is used for adjusting substitution relation among a plurality of sensors according to the data correlation change so as to adapt to fluctuation of the sensor relation.
The invention provides a data reduction method based on the dynamic correlation analysis of data of the Internet of things, which is realized based on a data reduction system based on the dynamic correlation analysis of the data of the Internet of things, and comprises the following steps:
Step 1, uniformly loading sensing data of the Internet of things into an environment change detection module;
step 2, constructing a correlation set according to the received sensing data of the Internet of things, and grouping the sensors to form a multi-sensor correlation set subSi;
Step 3, constructing a sensor substitution evaluation function according to the multi-sensor related relation set subSi and facing the energy constraint to select an optimal substitution sensor;
Step 4, solving a sensor substitution model for the data in the multi-sensor correlation set subSi to obtain a data reconstruction function of the optimal substitution sensor;
Step 5, when a sensor attribute k in a correlation set subSi no longer has a correlation with the other attributes in subSi, updating the correlation and regenerating the substitution relation among the multiple sensors, that is, returning to step 1 and performing concept drift detection; if multiple attributes in the correlation set subSi exhibit concept drift, the edge end detects whether the error of the current multiple linear regression prediction model exceeds a predefined error; if so, returning to step 4 to update the data reconstruction function; otherwise, continuing to operate with the current data reconstruction function.
The step 2 comprises the following steps:
Step 2.1, constructing a correlation matrix A from the table of Pearson correlation coefficients among the plurality of sensors;
where a_ij represents the Pearson correlation coefficient between the i-th sensor and the j-th sensor, i = 1, 2, ..., n, j = 1, 2, ..., n;
Step 2.2, setting the lower-triangular elements of the matrix A to 0, and setting every Pearson correlation value smaller than the set threshold to 0;
step 2.3, forming a multisensor correlation set subSi according to the correlation among the sensors, wherein the steps are as follows:
Step 2.3.1, repeatedly selecting the maximum value max(a_ij) from the matrix A, and judging whether the row i and the column j corresponding to max(a_ij) are to be added to the preliminary set D, specifically:
① If neither i nor j is in the preliminary set D, adding the doublet < i, j > to the preliminary set D;
② If only one of i and j is in the preliminary set D, determining the doublet < j, x > or < i, x > that sensor i or j forms with another sensor x, adding the triplet < i, j, x > to the correlation subset V, and setting row i and column j of the matrix A to 0;
③ If i and j are both in the preliminary set D, setting a_ij in the matrix A to 0;
step 2.3.2, traversing all elements in the matrix A until all elements in the matrix A are 0;
Step 2.3.3, adding all tuples in the preliminary set D to the multisensor related set subSi.
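Steps 2.1 and 2.2 above can be illustrated with a short sketch. It is not the claimed implementation: the function name `build_correlation_matrix`, the array layout (one column per sensor) and the use of NumPy are assumptions made only for this example. The sketch also zeroes the diagonal, which the text does not state explicitly, and compares the signed coefficient with the threshold, as the wording of step 2.2 suggests.

```python
import numpy as np

def build_correlation_matrix(readings, threshold):
    """Hypothetical helper for steps 2.1 and 2.2.

    readings: array of shape (n_samples, n_sensors), one column per sensor.
    threshold: correlation values below this are set to 0.
    """
    # Step 2.1: n x n matrix of Pearson correlation coefficients a_ij.
    a = np.corrcoef(readings, rowvar=False)
    # Step 2.2: zero the lower triangle. Zeroing the diagonal too (k=1) is an
    # assumption, so that a sensor is never paired with itself.
    a = np.triu(a, k=1)
    # Step 2.2: suppress coefficients smaller than the threshold.
    a[a < threshold] = 0.0
    return a
```

With this layout, each remaining non-zero entry a_ij (i < j) marks one candidate sensor pair for the grouping of step 2.3.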
The step 3 comprises the following steps:
Step 3.1, normalizing the matrix A of Pearson correlation coefficients among the sensors in the correlation set subSi by formula (3), and normalizing the matrix E formed by the remaining power of the sensors by formula (4);
wherein e_i represents the remaining power of sensor i;
Step 3.3, constructing a substitution evaluation function WAA_w for sensor i:
wherein Data_i = (a_i1, a_i2, ..., a_ij, ..., a_in, e_i) is the set of Internet of Things sensing data about sensor i, and w = (w_1, w_2, ..., w_n, w_{n+1}) is the weight vector of the data set (a_i1, a_i2, ..., a_in, e_i);
Step 3.4, according to the sensor substitution evaluation function, taking the sensor corresponding to the maximum function value as the optimal substitution sensor.
The step 4 comprises the following steps:
Step 4.1, if the number of sensors in the correlation set subSi is 1, the sensor substitution model cannot be solved, and step 4 is exited;
Step 4.2, if the number of sensors in the correlation set subSi is 2, taking out the other sensor x in subSi, obtaining the data of sensor x in the current window, solving the sensor substitution model through linear regression, and exiting step 4;
Step 4.3, if the number of sensors in the correlation set subSi is greater than 2, taking out the remaining sensors x_1, x_2, ... in subSi, acquiring the data of the remaining sensors in the current window, and solving the sensor substitution model through multiple linear regression.
The step 5 comprises the following steps:
Step 5.1, if no concept drift is detected in the current sensor data sent from the sensor end to the edge end, continuing data reduction without adjusting the substitution relation, and exiting step 5;
Step 5.2, if concept drift occurs in the data of s sensors, finding the correlation sets subSi to which the s sensors respectively belong;
Step 5.3, detecting, by the correlation solving method of step 2, whether the correlation in the correlation set subSi has changed; if the correlation has changed, returning to step 2; otherwise, detecting whether the difference between the current data reconstruction value and the actual data value of the sensor is larger than the threshold error, and if so, returning to step 3.
The beneficial effects of the invention are as follows:
The invention provides a data reduction system and a data reduction method based on dynamic correlation analysis of data of the Internet of things, which dynamically adjust the reduction rate according to the rules by which the data change. In an Internet of Things environment where substitution relations exist among sensors, the invention can improve the data reduction rate while ensuring the accuracy of data reconstruction, thereby reducing bandwidth and energy consumption. In a dynamic Internet of Things environment, the invention adapts to semantic changes in the sensor substitution relation by dynamically adjusting the substitution relation and the substitution model of the sensors, so as to maintain high data reconstruction accuracy while improving the data reduction rate as much as possible.
Drawings
Fig. 1 is a schematic diagram of a data reduction system based on dynamic correlation analysis of internet of things data in the present invention.
Fig. 2 is a flow chart of a data reduction method based on the dynamic correlation analysis of the data of the internet of things in the invention.
FIG. 3 is a schematic diagram of a correlation analysis process between sensors according to the present invention.
Fig. 4 is a schematic diagram of the construction of the preliminary set D according to the present invention, wherein (a) is a schematic diagram of both row i and column j being added to the preliminary set D, and (b) is a schematic diagram of row i or column j being added to the preliminary set D.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples of specific embodiments.
A data reduction system based on the dynamic correlation analysis of the data of the Internet of things comprises an environment change detection module, a correlation set construction module, a substitution relation generation module, a sensor substitution model solving module and a substitution relation dynamic adjustment module;
The environment change detection module is used for receiving sensor data perceived by the Internet of things and detecting concept drift;
The correlation set construction module is used for dynamically constructing an optimal sensor correlation set subSi according to changes in the correlation of the sensor data, so as to maximize the data reduction rate. Because the environment is unpredictable, the correlation among the data of multiple sensors changes easily. When fewer sensors are placed in one group, the correlation is relatively stable, and the substitution relation among the sensor data and the sensor grouping based on sensor correlation are easier to update; therefore, sensors having a high correlation are added to the same correlation set subSi.
The substitution relation generation module is used for constructing a sensor substitution evaluation function to determine the optimal substitution sensor; the module takes both the degree of correlation between a single sensor and the remaining sensors and the remaining energy into account when determining the substitution relation among the sensors. Within a correlation set subSi, the correlation of each sensor with the other sensors differs, and the energy consumed by each sensor also differs; therefore, once the weight of the sensor power is determined, the optimal substitution sensor is determined by the sensor substitution evaluation function.
The sensor substitution model solving module is used for judging, according to the number of sensors in the set subSi, whether to construct a reconstruction function of the sensor data, and for constructing the substitution model of the optimal sensor. For each correlation set subSi, the optimal substitution sensor y in subSi is first calculated and the data about y are acquired. Then the number of elements (i.e., sensors) in subSi is determined: if subSi contains only one element, the data reconstruction function cannot be constructed; if subSi contains two elements, a simple linear regression data reconstruction function is constructed; and if subSi contains more than two elements, a multiple linear regression data reconstruction function is constructed.
The substitution relation dynamic adjustment module is used for adjusting substitution relation among a plurality of sensors according to the data correlation change so as to adapt to fluctuation of the sensor relation. Considering the unpredictability of the environment of the internet of things, whether the correlation exists among the plurality of sensor data and the change of the correlation intensity of the plurality of sensor data in the current state need to be judged in time. If the correlation between the multi-sensor data changes, the update of the sensor substitution relation is triggered and the data reconstruction function between the multi-sensors is regenerated. If only the strength of the correlation relationship of the plurality of sensors changes, only the regression prediction model is updated.
As shown in fig. 1, in the reduction process, the sensors of a plurality of different attributes are divided into a plurality of correlation sets subSi (where i = 1, 2, ..., n). When the sensor attribute k in one correlation set subSi no longer has a correlation with the other attributes in subSi, the correlation needs to be updated and the substitution relation among the multiple sensors needs to be regenerated. If multiple attributes in the correlation set subSi exhibit concept drift at this time, the edge end needs to detect whether the error of the current multiple linear regression prediction model is too large; if the error exceeds a predefined error, the multiple linear regression data reconstruction function needs to be regenerated.
As shown in fig. 2, a data reduction method based on dynamic correlation analysis of data of the internet of things is implemented based on the data reduction system based on dynamic correlation analysis of data of the internet of things, and the method comprises the following steps:
Step 1, uniformly loading sensing data of the Internet of things into an environment change detection module;
Step 2, constructing a correlation set according to the received sensing data of the Internet of things, and grouping the sensors to form a multi-sensor correlation set subSi, as shown in fig. 3, specifically:
Step 2.1, constructing a correlation matrix A from the table of Pearson correlation coefficients among the plurality of sensors;
where a_ij represents the Pearson correlation coefficient between the i-th sensor and the j-th sensor, i = 1, 2, ..., n, j = 1, 2, ..., n;
Step 2.2, setting the lower-triangular elements of the matrix A to 0, and setting every Pearson correlation value smaller than the set threshold to 0;
step 2.3, forming a multisensor correlation set subSi according to the correlation among the sensors, wherein the steps are as follows:
Step 2.3.1, repeatedly selecting the maximum value max(a_ij) from the matrix A, and judging whether the row i and the column j corresponding to max(a_ij) are to be added to the preliminary set D, specifically:
① If neither i nor j is in the preliminary set D, the doublet < i, j > is added to the preliminary set D; as shown in FIG. 4 (a), when neither sensor i nor sensor j is in the preliminary set D, the two sensors are added to the preliminary set D as a doublet.
② If only one of i and j is in the preliminary set D, the doublet < j, x > or < i, x > that sensor i or j forms with another sensor x is determined, the triplet < i, j, x > is added to the correlation subset V, and row i and column j of the matrix A are set to 0; as shown in FIG. 4 (b), if sensor j is not in the preliminary set D while a doublet of sensors i and h is already in the preliminary set D, sensors i, j and h are added to the preliminary set D as a triplet.
③ If i and j are both in the preliminary set D, a_ij in the matrix A is set to 0;
step 2.3.2, traversing all elements in the matrix A until all elements in the matrix A are 0;
Step 2.3.3, adding all tuples in the preliminary set D to the multisensor related set subSi.
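The grouping of step 2.3 can be sketched as a greedy loop over the thresholded matrix. The sketch below is one possible reading and not a definitive implementation. The function name `group_sensors` is hypothetical. Doublets and triplets are kept in a single list rather than in separate sets D and V. The entry a_ij is zeroed after a doublet is formed so that the loop terminates. A sensor is only merged into an existing doublet, so that groups never exceed three sensors. The last three choices are assumptions. Sensors that end up in no group are simply not returned.

```python
import numpy as np

def group_sensors(a):
    """Hypothetical sketch of step 2.3 on an upper-triangular, thresholded matrix."""
    a = np.array(a, dtype=float)  # work on a copy
    groups = []                   # each group is a set of sensor indices

    def find(sensor):
        # Return the group that already contains this sensor, if any.
        for g in groups:
            if sensor in g:
                return g
        return None

    while a.max() > 0:
        i, j = (int(v) for v in np.unravel_index(np.argmax(a), a.shape))
        gi, gj = find(i), find(j)
        if gi is None and gj is None:
            groups.append({i, j})   # case 1: new doublet <i, j>
            a[i, j] = 0.0           # assumption: consume the entry
        elif gi is not None and gj is not None:
            a[i, j] = 0.0           # case 3: both already grouped
        else:
            g = gi if gi is not None else gj
            if len(g) == 2:         # case 2: doublet <i, x> becomes <i, j, x>
                g.update((i, j))
            a[i, :] = 0.0           # zero row i and column j
            a[:, j] = 0.0
    return [tuple(sorted(g)) for g in groups]
```

For example, with five sensors and non-zero entries a_01 = 0.95, a_23 = 0.90, a_14 = 0.85 and a_02 = 0.82, this sketch yields the groups (0, 1, 4) and (2, 3).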
Step 3, constructing a sensor substitution evaluation function to select an optimal substitution sensor according to the multi-sensor related relation set subSi, wherein the method specifically comprises the following steps:
Step 3.1, normalizing the matrix A of Pearson correlation coefficients among the sensors in the correlation set subSi by formula (3), and normalizing the matrix E formed by the remaining power of the sensors by formula (4);
wherein e_i represents the remaining power of sensor i;
Step 3.3, constructing a substitution evaluation function WAA_w for sensor i:
wherein Data_i = (a_i1, a_i2, ..., a_ij, ..., a_in, e_i) is the set of Internet of Things sensing data about sensor i, and w = (w_1, w_2, ..., w_n, w_{n+1}) is the weight vector of the data set (a_i1, a_i2, ..., a_in, e_i);
Step 3.4, according to the sensor substitution evaluation function, taking the sensor corresponding to the maximum function value as the optimal substitution sensor.
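As an illustration of steps 3.3 and 3.4, the following sketch assumes that WAA_w denotes the standard weighted arithmetic averaging operator, i.e. the weighted sum of the entries of Data_i with the weight vector w. The text does not reproduce the formula, so this reading is an assumption, as are the function names `waa_score` and `best_substitute`. The inputs are taken to be already normalized as in step 3.1; the normalization formulas (3) and (4) are not reproduced here.

```python
import numpy as np

def waa_score(data_i, w):
    """Weighted arithmetic average of Data_i = (a_i1, ..., a_in, e_i) (assumed form)."""
    data_i = np.asarray(data_i, dtype=float)
    w = np.asarray(w, dtype=float)
    if data_i.shape != w.shape:
        raise ValueError("Data_i and w must have the same length (n + 1)")
    return float(np.dot(w, data_i))

def best_substitute(corr_norm, energy_norm, w):
    """Return the index of the sensor with the largest score, plus all scores.

    corr_norm: normalized n x n correlation matrix (row i holds a_i1..a_in).
    energy_norm: normalized remaining power e_i of each sensor.
    """
    scores = [
        waa_score(np.append(corr_norm[i], energy_norm[i]), w)
        for i in range(len(energy_norm))
    ]
    return int(np.argmax(scores)), scores
```

With equal weights, a sensor that is both strongly correlated with its group and has a large remaining power receives the highest score; shifting weight toward w_{n+1} makes the choice more energy-driven.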
Step 4, solving a sensor substitution model for the data in the multi-sensor correlation set subSi to obtain the data reconstruction function of the optimal substitution sensor, specifically:
Step 4.1, if the number of sensors in the correlation set subSi is 1, the sensor substitution model cannot be solved, and step 4 is exited;
Step 4.2, if the number of sensors in the correlation set subSi is 2, taking out the other sensor x in subSi, obtaining the data of sensor x in the current window, solving the sensor substitution model through linear regression, and exiting step 4;
Step 4.3, if the number of sensors in the correlation set subSi is greater than 2, taking out the remaining sensors x_1, x_2, ... in subSi, acquiring the data of the remaining sensors in the current window, and solving the sensor substitution model through multiple linear regression.
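The three cases of step 4 can be sketched with an ordinary least-squares fit. This is a minimal sketch under stated assumptions: the function name `fit_reconstruction` and the window layout (one column per sensor) are hypothetical, and treating the optimal substitution sensor y as the regression target, reconstructed from the remaining sensors of subSi, is one reading of the text. Simple and multiple linear regression are handled by the same call, since the former is the one-predictor case of the latter.

```python
import numpy as np

def fit_reconstruction(window, group, target):
    """Hypothetical sketch of step 4.

    window: array (n_samples, n_sensors) with the readings of the current window.
    group: sensor indices in the correlation set subSi.
    target: index of the optimal substitution sensor y.
    Returns a reconstruction function, or None when subSi holds one sensor.
    """
    predictors = [s for s in group if s != target]
    if not predictors:
        return None  # step 4.1: the model cannot be solved
    # Steps 4.2 and 4.3: least-squares fit with an intercept column.
    design = np.column_stack([window[:, predictors], np.ones(len(window))])
    coef, *_ = np.linalg.lstsq(design, window[:, target], rcond=None)

    def reconstruct(values):
        # values: readings of the predictor sensors, one row per sample.
        values = np.asarray(values, dtype=float).reshape(-1, len(predictors))
        return values @ coef[:-1] + coef[-1]

    return reconstruct
```

Once the function is available at the edge end, the target sensor can lower its transmission rate, and its readings are estimated from the predictor sensors instead.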
Step 5, when a sensor attribute k in a correlation set subSi no longer has a correlation with the other attributes in subSi, updating the correlation and regenerating the substitution relation among the multiple sensors, that is, returning to step 1 and performing concept drift detection; if multiple attributes in the correlation set subSi exhibit concept drift, the edge end detects whether the error of the current multiple linear regression prediction model exceeds a predefined error; if so, returning to step 4 to update the data reconstruction function; otherwise, continuing to operate with the current data reconstruction function, specifically:
Step 5.1, if no concept drift is detected in the current sensor data sent from the sensor end to the edge end, continuing data reduction without adjusting the substitution relation, and exiting step 5;
Step 5.2, if concept drift occurs in the data of s sensors, finding the correlation sets subSi to which the s sensors respectively belong;
Step 5.3, detecting, by the correlation solving method of step 2, whether the correlation in the correlation set subSi has changed; if the correlation has changed, returning to step 2; otherwise, detecting whether the difference between the current data reconstruction value and the actual data value of the sensor is larger than the threshold error, and if so, returning to step 3.
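The branching of step 5 can be summarized in a small decision function. The sketch only encodes the order of the checks; the function name `adjustment_action` and the action labels are hypothetical, and how concept drift and a changed correlation are detected is left to the caller. The label "refit" stands for the branch taken when the reconstruction error exceeds the threshold, without fixing which earlier step is re-entered.

```python
def adjustment_action(drift_detected, correlation_changed,
                      reconstruction_error, error_threshold):
    """Hypothetical sketch of the decision order in steps 5.1 to 5.3."""
    if not drift_detected:
        return "continue"   # step 5.1: keep the current substitution relation
    if correlation_changed:
        return "regroup"    # step 5.3: rebuild the correlation sets (step 2)
    if reconstruction_error > error_threshold:
        return "refit"      # step 5.3: regenerate the reconstruction function
    return "continue"       # drift present, but the model is still accurate
```

For instance, a drift that leaves the grouping intact but raises the reconstruction error above the threshold results in "refit", whereas a drift that breaks the correlation inside subSi results in "regroup".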