Clustering method for ionizing radiation time series
Technical Field
The invention belongs to the field of data mining and big data processing, and relates to a clustering method for ionizing radiation time series.
Background
With the rise of Internet of Things applications, ever more sensor data are being accumulated, and how to analyze and process these data to mine valuable information has become an important problem. Such data are essentially time series, i.e. they follow a strict chronological order, but the data at each sampling point can differ in dimensionality: for example, each frame of a video is multi-dimensional matrix data, whereas each sampling point of a temperature control sensor is a continuous one-dimensional value (i.e. a scalar). An ionizing radiation time series consists of the ionizing radiation values of the local surrounding area, acquired by ionizing radiation sensors deployed in real environments (water, air and land); these sensors mainly measure the intensity of gamma rays, neutron rays and the like. Ionizing radiation and electromagnetic radiation are different concepts: ionizing radiation damages cells by ionizing (charging) them, and thereby disturbs the physiological and metabolic functions of the human body, whereas electromagnetic radiation is harmless and causes at most thermal effects, which are very weak and thus negligible in many cases.
An ionizing radiation reading is a continuous one-dimensional time series whose value fluctuates under the influence of events such as rainfall, snowfall, radiographic inspection work (i.e. the use of radiographic equipment such as X-ray machines to inspect industrial pipeline facilities), hospital radiotherapy (including cardiac surgery and the like), and even leakage from nuclear facilities. How to automatically cluster and identify the events behind these fluctuations, so as to distinguish abnormal radiation fluctuation events, is a very important application topic. Traditional time series clustering requires the sequence segments being clustered to have the same length, which limits flexibility, and forcing equal-length preprocessing can distort the dynamic change mechanism within a sequence.
The peaks in an ionizing radiation time series vary in duration; for example, a rise in radiation caused by precipitation necessarily varies with the duration of the precipitation. Therefore, classical equal-length clustering techniques, such as k-means clustering and Gaussian mixture models applied in the time domain, cannot be applied directly to the clustering of radiation peaks.
Disclosure of Invention
To overcome the poor flexibility and poor accuracy of existing clustering approaches for ionizing radiation time series, the invention provides a clustering method for ionizing radiation time series with good flexibility and high accuracy. The method requires no equal-length preprocessing and clusters automatically by applying a Gaussian mixture model to the time series segments in the frequency domain.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A clustering method for ionizing radiation time series comprises the following steps:
1) segmenting the input ionizing radiation sequence data and extracting the peak segments;
2) applying a Fourier transform to each detected peak segment, padding the segment with zeros if it is shorter than the input length required by the Fourier transform;
3) calculating the frequency spectrum from the frequency domain coefficients, i.e. calculating a non-negative value for each bin index, namely the square root of the sum of the squares of the real part and the imaginary part of the frequency domain coefficient of that bin;
4) the spectrum segments corresponding to the peak segments all have the same length, i.e. contain the same number of bins; performing k-means clustering on the spectrum segments, iterating until convergence;
5) using the center vector and the variance vector of each class obtained by k-means clustering as the initial mean and variance of the corresponding one of the k components of the Gaussian mixture model;
6) iterating the Gaussian mixture model with the expectation-maximization (EM) algorithm until it converges;
7) calculating, with the converged Gaussian mixture model, the conditional probability that each spectrum segment belongs to class i, where class i is the i-th component of the Gaussian mixture model; the index of the component with the maximum probability is the class number of the spectrum segment, which determines the class of the corresponding peak segment of the input radiation time series.
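For reference, the quantities in steps 3) and 7) can be written out as below. The symbols are illustrative assumptions rather than notation from the original text: N is the zero-padded segment length, b the bin index, w_i the mixture weights, and the diagonal covariance is inferred from the use of a variance vector in step 5).

```latex
% Step 3: magnitude spectrum of a zero-padded peak segment x_0, ..., x_{N-1}
X_b = \sum_{n=0}^{N-1} x_n \, e^{-2\pi \mathrm{i}\, b n / N}, \qquad
s_b = \sqrt{\operatorname{Re}(X_b)^2 + \operatorname{Im}(X_b)^2}

% Step 7: posterior probability of class i for a spectrum segment s,
% and the class assigned to that segment
P(i \mid \mathbf{s}) =
  \frac{w_i \, \mathcal{N}\!\left(\mathbf{s} \mid \boldsymbol{\mu}_i,
        \operatorname{diag}(\boldsymbol{\sigma}_i^2)\right)}
       {\sum_{j=1}^{k} w_j \, \mathcal{N}\!\left(\mathbf{s} \mid \boldsymbol{\mu}_j,
        \operatorname{diag}(\boldsymbol{\sigma}_j^2)\right)},
\qquad
\hat{c}(\mathbf{s}) = \arg\max_{i} P(i \mid \mathbf{s})
```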
Further, in step 1), the peak segments are detected by an adaptive threshold method.
The invention has the following beneficial effects: no equal-length preprocessing is required, and automatic clustering is performed by applying a Gaussian mixture model to the time series segments in the frequency domain; flexibility is better and accuracy is higher.
Drawings
Fig. 1 is a flowchart of the clustering method for ionizing radiation time series.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1, a clustering method for ionizing radiation time series includes the following steps:
1) segmenting the input ionizing radiation sequence data and extracting the peak segments;
2) applying a Fourier transform to each detected peak segment, padding the segment with zeros if it is shorter than the input length required by the Fourier transform;
3) calculating the frequency spectrum from the frequency domain coefficients, i.e. calculating a non-negative value for each bin index, namely the square root of the sum of the squares of the real part and the imaginary part of the frequency domain coefficient of that bin;
4) the spectrum segments corresponding to the peak segments all have the same length, i.e. contain the same number of bins; performing k-means clustering on the spectrum segments, iterating until convergence;
5) using the center vector and the variance vector of each class obtained by k-means clustering as the initial mean and variance of the corresponding one of the k components of the Gaussian mixture model;
6) iterating the Gaussian mixture model with the expectation-maximization (EM) algorithm until it converges;
7) calculating, with the converged Gaussian mixture model, the conditional probability that each spectrum segment belongs to class i, where class i is the i-th component of the Gaussian mixture model; the index of the component with the maximum probability is the class number of the spectrum segment, which determines the class of the corresponding peak segment of the input radiation time series.
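Steps 2) to 7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it uses NumPy, keeps only the non-negative-frequency bins via `np.fft.rfft`, seeds k-means with a farthest-point rule, and uses a diagonal-covariance mixture with a small variance floor. All function names, parameters and the synthetic half-sine peaks are hypothetical.

```python
import numpy as np

def magnitude_spectra(segments, n_fft):
    """Steps 2-3: zero-pad each peak segment to n_fft samples, apply the FFT,
    and take sqrt(re^2 + im^2) per bin, so every row has the same bin count."""
    if max(len(s) for s in segments) > n_fft:
        raise ValueError("n_fft must be at least the longest segment length")
    # np.fft.rfft pads inputs shorter than n with trailing zeros
    return np.vstack([np.abs(np.fft.rfft(np.asarray(s, dtype=float), n=n_fft))
                      for s in segments])

def kmeans(X, k, n_iter=100):
    """Step 4: plain k-means. Farthest-point seeding is an assumption."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, labels

def posteriors(X, weights, means, variances):
    """Step 7: P(class i | spectrum) for a diagonal-covariance mixture,
    computed in the log domain; also returns the total log-likelihood."""
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff2 / variances[None, :, :]).sum(axis=2)
    log_joint = np.log(weights)[None, :] + log_pdf
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    return np.exp(log_joint - log_norm), float(log_norm.sum())

def fit_gmm(X, centers, labels, n_iter=200, tol=1e-8, floor=1e-6):
    """Steps 5-6: initialise from the k-means result, then iterate EM."""
    k = len(centers)
    means = centers.astype(float).copy()
    variances = np.array([X[labels == j].var(axis=0) if np.any(labels == j)
                          else X.var(axis=0) for j in range(k)]) + floor
    weights = np.maximum([np.mean(labels == j) for j in range(k)], floor)
    weights = weights / weights.sum()
    previous = -np.inf
    for _ in range(n_iter):
        resp, loglik = posteriors(X, weights, means, variances)  # E-step
        nk = np.maximum(resp.sum(axis=0), floor)                 # M-step
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ X ** 2) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 0.0) + floor
        if abs(loglik - previous) <= tol * max(1.0, abs(loglik)):
            break
        previous = loglik
    return weights, means, variances

# Synthetic example: short and long half-sine peaks of unequal lengths
short = [np.sin(np.pi * np.arange(n) / n) for n in (8, 9, 10, 11, 12)]
long_ = [np.sin(np.pi * np.arange(n) / n) for n in (40, 44, 48, 52, 56)]
spectra = magnitude_spectra(short + long_, n_fft=64)
centers, km_labels = kmeans(spectra, k=2)
weights, means, variances = fit_gmm(spectra, centers, km_labels)
probs, _ = posteriors(spectra, weights, means, variances)
classes = probs.argmax(axis=1)  # class = component with the largest posterior
```

Because every segment is padded to the same `n_fft`, each spectrum row has `n_fft // 2 + 1` bins regardless of the original peak duration, which is what allows the equal-length clustering steps to run without equal-length preprocessing in the time domain.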
Further, in step 1), the peak segments are detected by an adaptive threshold method.
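The text does not specify which adaptive threshold is used, so the following is only one possible sketch of step 1). It assumes a threshold equal to a rolling median plus a multiple of the rolling median absolute deviation (MAD), with a window much longer than any expected peak. The function name, the window length, the multiplier and the synthetic series are all hypothetical.

```python
import numpy as np

def extract_peak_segments(values, window=201, n_sigmas=4.0, min_len=3):
    """Return (start, end) index pairs, end exclusive, of runs of samples that
    exceed a locally adaptive threshold (assumed rule: median + n_sigmas * MAD).
    The window should be more than twice the longest expected peak."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(values, dtype=float)
    half = window // 2
    # Mirror the series at both ends so every sample has a full window
    padded = np.pad(x, half, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    baseline = np.median(windows, axis=1)
    mad = np.median(np.abs(windows - baseline[:, None]), axis=1)
    # 1.4826 scales the MAD to a standard deviation for Gaussian noise
    threshold = baseline + n_sigmas * 1.4826 * np.maximum(mad, 1e-9)
    above = x > threshold
    # Rising and falling edges of the boolean mask delimit each segment
    edges = np.diff(np.concatenate(([0], above.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_len]

# Synthetic series: a rippling background with two peaks of different duration
n = np.arange(600)
series = 100.0 + 0.5 * np.sin(0.3 * n)
series[100:120] += 20.0   # short event, 20 samples
series[300:360] += 20.0   # long event, 60 samples
segments = extract_peak_segments(series)
peak_segments = [series[s:e] for s, e in segments]  # variable-length inputs for step 2)
```

The extracted segments keep their natural, unequal lengths; no resampling or truncation is applied before the Fourier transform of step 2).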