Disclosure of Invention
The invention aims to provide an infant abnormal behavior detection method based on a conditional generative adversarial network (CGAN), combined with a supervised SVM classification method, to improve the accuracy of infant abnormal behavior detection.
An infant abnormal behavior detection method based on a conditional generative adversarial network and an SVM, characterized in that a training sample library required for target tracking is constructed in advance, the training sample library comprising the labeled limbs and whole body of the infant; the conditional generative adversarial network is used to track the four limbs and the whole body of the infant; motion trajectory information is extracted using wavelet approximation waveform and wavelet power spectrum analysis; and the features of the motion trajectory information are classified by a support vector machine (SVM); the method comprises the following steps:
1.1 acquiring an infant video and performing unified preprocessing;
1.2 extracting a 15 s clip from the infant video obtained in step 1.1, naming the clips uniformly, and converting them into frame images that are also named uniformly;
1.3 tracking the infant's motion trajectory: for the frame images obtained in step 1.2, the conditional generative adversarial network (CGAN) is used to track the motion trajectories of the four limbs and of the whole body of the infant separately, specifically comprising the following steps:
1.3.1 constructing the training sample library required for target tracking: labeling the left hand, right hand, left leg, right leg and whole body of the infant in the frame images obtained in step 1.2; the labeled limbs and whole body form the training sample library, which is input to the CGAN as the target data set, with the corresponding labels used as the condition Y;
1.3.2 generative model design: randomly segmenting each frame image containing the infant to form a pseudo target data set, and passing the pseudo target data set together with the condition Y through the convolution layers into the discriminative model;
1.3.3 discriminative model design: feeding the target data set and the condition Y into the discriminative model so that it can identify the limbs and the whole body, then feeding the pseudo target data set into the discriminator, which judges whether each pseudo sample is a target;
1.3.4 after judging whether a sample is a target, calculating the error so that it satisfies the following formulas:
optimizing D:
$$\max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
optimizing G:
$$\min_G V(D,G) = \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
wherein: V(D, G) is the loss function; p_data(x) is the real sample distribution; p_z(z) is the pseudo-sample distribution; D(x) is the discriminator output for a real sample x; D(G(z)) is the discriminator output for a pseudo sample G(z); E denotes expectation;
adjusting the model parameters according to the above optimization conditions, wherein the parameters of the generative model G and the discriminative model D are shared;
1.3.5 if the error is too large, feeding the error back to the input of the generative model, reconstructing the pseudo target data set and judging again, until the positions of the infant's four limbs and whole body are found in the pseudo target data set; then recording the positions and motion trajectories of the left hand, right hand, left leg, right leg and whole body of the infant in each frame;
1.4 analyzing the motion trajectory information: storing the continuously changing y-axis coordinate positions of the infant's four limbs and whole body tracked in step 1.3 during movement, and calculating the wavelet approximation waveform and the wavelet power spectrum of the waveform formed by the y-axis coordinate positions, specifically comprising the following steps:
1.4.1 because the x-axis coordinate changes little, only the y-axis coordinate waveform is selected for analysis; first, the tracked waveform is analyzed with the Haar wavelet to obtain the wavelet approximation waveform;
1.4.2 for the y-axis coordinate waveforms of the four limbs and the whole body, obtaining power spectrum information using a wavelet-based power spectrum;
1.5 extracting feature vectors from the obtained wavelet approximation waveforms and wavelet power spectra, and training a support vector machine (SVM) on them, specifically comprising the following steps:
1.5.1 dividing the samples into normal and abnormal samples and labeling them, with normal samples labeled 1 and abnormal samples labeled -1;
1.5.2 dividing the samples into a training set and a test set, normalizing the data, and adjusting the SVM parameters c and g until the highest accuracy is reached, thereby obtaining the optimal trained model;
1.6 comprehensive judgment of infant abnormal behavior: based on the optimal trained models obtained in step 1.5.2, setting different weights according to the different accuracies and performing a weighted judgment, specifically comprising the following steps:
1.6.1 for the SVM models trained on the wavelet approximation waveforms obtained in step 1.4.1, setting different weight coefficients according to the different accuracies for the limbs and the whole body, specifically: left upper limb a1: 0.35; right upper limb a2: 0.01; left lower limb a3: 0.2; right lower limb a4: 0.35; whole body a5: 0.09; the judgment result vectors of the four limbs and the whole body are denoted Y1 to Y5 respectively, and are calculated as follows:
Y1=(test label+predict label)/2
wherein: test label is the actual label of the test sample; predict label is the label predicted for the test sample; Y2 through Y5 are calculated in the same way;
the five result vectors are weighted as follows:
Y=0.35*Y1+0.01*Y2+0.2*Y3+0.35*Y4+0.09*Y5
wherein: * denotes multiplication and Y is the judgment value predicted from the wavelet approximation waveform; the judgment standard is defined as follows: if -1 < Y < -0.3, the infant's behavior is judged to be abnormal; if 0.3 < Y < 1, the infant's behavior is judged to be normal; all other values are regarded as a judgment error;
1.6.2 for the SVM models trained on the wavelet power spectra obtained in step 1.4.2, setting different weight coefficients according to the different accuracies for the limbs and the whole body, specifically: left upper limb P1: 0.35; right upper limb P2: 0.01; left lower limb P3: 0.35; right lower limb P4: 0.2; whole body P5: 0.09; the judgment result vectors of the four limbs and the whole body are denoted X1 to X5 respectively, and are calculated as follows:
X1=(test label+predict label)/2
wherein: test label is the actual label of the test sample; predict label is the label predicted for the test sample; X2 through X5 are calculated in the same way;
the five result vectors are weighted as follows:
X=0.35*X1+0.01*X2+0.35*X3+0.2*X4+0.09*X5
wherein: * denotes multiplication and X is the judgment value predicted from the wavelet power spectrum; the judgment standard is defined as follows: if -1 < X < -0.3, the infant's behavior is judged to be abnormal; if 0.3 < X < 1, the infant's behavior is judged to be normal; all other values are regarded as a judgment error;
X and Y are then judged jointly: if a test sample satisfies at least one of the X and Y conditions, the judgment result is considered correct, and whether the infant's behavior is normal can thereby be distinguished.
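The weighted judgment of steps 1.6.1 and 1.6.2 can be illustrated with a short Python sketch. The weights, the formula (test label + predict label)/2 and the 0.3 thresholds are taken from the text; the function names and the sample predictions are hypothetical. Treating the endpoints -1 and 1 as inclusive is an assumption of this sketch, made because a sample whose five predictions are all correct yields a weighted value of exactly -1 or 1.

```python
# Weights from the text, ordered: left upper limb, right upper limb,
# left lower limb, right lower limb, whole body.
APPROX_WEIGHTS = (0.35, 0.01, 0.2, 0.35, 0.09)   # a1..a5, step 1.6.1
POWER_WEIGHTS = (0.35, 0.01, 0.35, 0.2, 0.09)    # P1..P5, step 1.6.2


def result_value(test_label, predict_label):
    """(test label + predict label) / 2: +1 or -1 if the labels agree, 0 if not."""
    return (test_label + predict_label) / 2


def weighted_value(test_label, predict_labels, weights):
    """Weighted sum of the five per-region result values (Y or X in the text)."""
    return sum(w * result_value(test_label, p)
               for w, p in zip(weights, predict_labels))


def judge(value, eps=1e-9):
    """Map a weighted value to 'abnormal', 'normal' or 'error'.

    Assumption: the endpoints -1 and +1 are accepted (with a small
    floating-point tolerance), although the text writes strict bounds.
    """
    if -1 - eps <= value < -0.3:
        return "abnormal"
    if 0.3 < value <= 1 + eps:
        return "normal"
    return "error"


def combined_is_correct(x_value, y_value):
    """The result counts as correct if at least one of X and Y is not 'error'."""
    return judge(x_value) != "error" or judge(y_value) != "error"


# Hypothetical normal test sample (label 1); only the right upper limb is
# misclassified by the approximation-waveform models.
y_value = weighted_value(1, [1, -1, 1, 1, 1], APPROX_WEIGHTS)
# The power-spectrum models misclassify both upper limbs and the left lower limb.
x_value = weighted_value(1, [-1, -1, -1, 1, 1], POWER_WEIGHTS)
print(judge(y_value), judge(x_value), combined_is_correct(x_value, y_value))
```

Because each result value takes the sign of the true test label whenever the prediction is correct, X and Y can never point in opposite directions; they can only differ in whether they reach the 0.3 threshold.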
The generative model design and discriminative model design of steps 1.3.2 and 1.3.3 specifically comprise the following steps:
2.1 generative model design: 6 convolution layers with a stride of 1 and 6 pooling layers with a 2 x 2 pooling window are provided; the network uses the rectified linear unit (ReLU) activation function, which gives good results and faster convergence; the specific operation is:
F(Z)=σ(W*Z+b)
wherein: W is the convolution kernel; * denotes the convolution operation; Z is the feature vector; b is the bias; σ is the ReLU activation function;
2.2 discriminative model design: 5 convolution layers with a stride of 1 and 5 pooling layers with a 2 x 2 pooling window are provided; the network uses the rectified linear unit (ReLU) activation function.
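The layer operation F(Z)=σ(W*Z+b) followed by 2 x 2 pooling can be sketched in plain Python for a single kernel. This is only an illustration: the kernel size, the absence of padding, the use of max pooling with stride 2, and all numeric values are assumptions, since the text fixes only the convolution stride (1), the pooling window (2 x 2) and the ReLU activation.

```python
def conv2d(z, w, b):
    """Stride-1 convolution without padding, computed as cross-correlation
    (the usual CNN convention); returns W*Z + b for one kernel."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(z) - kh + 1, len(z[0]) - kw + 1
    return [[sum(w[i][j] * z[r + i][c + j]
                 for i in range(kh) for j in range(kw)) + b
             for c in range(out_w)]
            for r in range(out_h)]


def relu(z):
    """sigma in the text: element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in z]


def max_pool_2x2(z):
    """2 x 2 pooling window; max pooling with stride 2 is an assumption."""
    return [[max(z[r][c], z[r][c + 1], z[r + 1][c], z[r + 1][c + 1])
             for c in range(0, len(z[0]) - 1, 2)]
            for r in range(0, len(z) - 1, 2)]


# Hypothetical 5 x 5 input patch, 2 x 2 kernel and bias.
Z = [[1, 2, 0, 1, 3],
     [0, 1, 2, 3, 1],
     [1, 0, 1, 2, 0],
     [2, 1, 0, 1, 1],
     [0, 3, 1, 0, 2]]
W = [[1, 0],
     [0, -1]]
b = 0.5

F = relu(conv2d(Z, W, b))      # F(Z) = sigma(W*Z + b), shape 4 x 4
pooled = max_pool_2x2(F)       # shape 2 x 2
print(pooled)
```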
The specific calculation of the wavelet approximation waveform and the wavelet power spectrum in step 1.4 comprises the following steps:
3.1 analyzing the tracked waveform with the Haar wavelet: a five-level pyramid is constructed according to the Mallat pyramid decomposition algorithm of the discrete wavelet transform, and the fifth-level wavelet approximation signals corresponding to the four limbs and the whole body are extracted and denoted respectively: abnormal left upper limb: a01; abnormal right upper limb: a02; abnormal left lower limb: a03; abnormal right lower limb: a04; abnormal whole body: a05; normal left upper limb: a11; normal right upper limb: a12; normal left lower limb: a13; normal right lower limb: a14; normal whole body: a15;
3.2 for the y-axis coordinate waveforms of the four limbs and the whole body, computing the wavelet-based power spectrum, with the sampling length set to the total number of video frames (375), the sampling frequency set to 1000 and the sampling interval set to 1/1000; the resulting power spectra are denoted respectively: abnormal left upper limb: p01; abnormal right upper limb: p02; abnormal left lower limb: p03; abnormal right lower limb: p04; abnormal whole body: p05; normal left upper limb: p11; normal right upper limb: p12; normal left lower limb: p13; normal right lower limb: p14; normal whole body: p15.
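The five-level Haar decomposition of step 3.1 can be sketched as follows. This is a minimal pure-Python illustration of the Mallat pyramid, not the implementation used by the invention: the orthonormal 1/sqrt(2) scaling and the handling of odd lengths (repeating the last sample) are assumptions, and the synthetic 375-sample trajectory is made up for the example.

```python
import math


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    s = list(signal)
    if len(s) % 2:              # odd length: repeat the last sample (assumption)
        s.append(s[-1])
    k = 1 / math.sqrt(2)        # orthonormal Haar scaling
    approx = [(s[i] + s[i + 1]) * k for i in range(0, len(s), 2)]
    detail = [(s[i] - s[i + 1]) * k for i in range(0, len(s), 2)]
    return approx, detail


def haar_pyramid(signal, levels=5):
    """Mallat pyramid: repeatedly decompose the approximation signal."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical y-coordinate trajectory of one limb over 375 frames.
y_track = [100 + 20 * math.sin(2 * math.pi * i / 75) for i in range(375)]
a5, details = haar_pyramid(y_track, levels=5)
print(len(a5), [len(d) for d in details])
```

With 375 input samples the approximation shrinks to 188, 94, 47, 24 and finally 12 coefficients, so the fifth-level approximation is a compact 12-value summary of the slow trend in the trajectory.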
The generative model design and the discriminative model design comprise the following steps:
step A1, generative model design: 6 convolution layers with a stride of 1 and 6 pooling layers with a 2 x 2 pooling window are provided; the network uses the rectified linear unit (ReLU) activation function, which gives good results and faster convergence; the specific operation is:
F(Z)=σ(W*Z+b)
wherein: W is the convolution kernel; * denotes the convolution operation; Z is the feature vector; b is the bias; σ is the ReLU activation function;
step A2, discriminative model design: 5 convolution layers with a stride of 1 and 5 pooling layers with a 2 x 2 pooling window are provided; the network uses the rectified linear unit (ReLU) activation function.
Further, the specific calculation of the wavelet approximation waveform and the wavelet power spectrum comprises the following steps:
step B1, analyzing the tracked waveform with the Haar wavelet: a five-level pyramid is constructed according to the Mallat pyramid decomposition algorithm of the discrete wavelet transform, and the fifth-level wavelet approximation waveforms corresponding to the four limbs and the whole body are extracted and denoted respectively: abnormal left upper limb: a01; abnormal right upper limb: a02; abnormal left lower limb: a03; abnormal right lower limb: a04; abnormal whole body: a05; normal left upper limb: a11; normal right upper limb: a12; normal left lower limb: a13; normal right lower limb: a14; normal whole body: a15;
step B2, for the y-axis coordinate waveforms of the four limbs and the whole body, computing the wavelet-based power spectrum, with the sampling length set to the total number of video frames (375), the sampling frequency set to 1000 and the sampling interval set to 1/1000; the resulting power spectra are denoted respectively: abnormal left upper limb: p01; abnormal right upper limb: p02; abnormal left lower limb: p03; abnormal right lower limb: p04; abnormal whole body: p05; normal left upper limb: p11; normal right upper limb: p12; normal left lower limb: p13; normal right lower limb: p14; normal whole body: p15.
The invention adopts an infant abnormal behavior detection method based on a conditional generative adversarial network and an SVM. The acquired infant video is first preprocessed; the conditional generative adversarial network (CGAN) is then used to track the motion trajectories of the infant's four limbs and whole body in the video, and the resulting motion trajectory information is stored. Features are then extracted from the motion trajectory information by wavelet transform: a sample set is built from the extracted wavelet approximation waveforms and used to train a configured SVM, and a second sample set is built from the wavelet power spectra of the motion trajectory information and used to train another configured SVM. The two trained models are tested, and different weight parameters are set according to their respective accuracies for a weighted judgment, so that the optimal result is obtained.
The invention combines information from the infant's four limbs and whole body to detect the motion trajectory, so the information obtained is more comprehensive than that obtained from single-limb detection. Semi-supervised learning with the CGAN makes trajectory tracking more accurate, and combining the wavelet domain with the power spectrum domain makes the features more abstract and specific. The features are classified by an SVM and the detection results are judged by weighting, which reduces the false detection rate. Detecting whether an infant's behavior is abnormal allows intervention as early as possible, which is of great significance for preventing diseases such as infantile cerebral palsy.
Detailed Description
The implementation of the invention is described below with reference to the accompanying drawings.
An infant abnormal behavior detection method based on a conditional generative adversarial network and an SVM, whose overall flow is shown in FIG. 1, comprises the following steps:
1. Acquiring an infant video and performing unified preprocessing.
2. Extracting a 15 s clip from the infant video of step 1, naming the clips uniformly, and converting them into frame images that are also named uniformly.
3. Tracking the infant's motion trajectory: for the frame images obtained in step 2, the conditional generative adversarial network (CGAN) is used to track the motion trajectories of the four limbs and of the whole body of the infant separately; the flow chart is shown in FIG. 2, and the process specifically comprises the following steps:
3.1 constructing the training sample library required for target tracking: labeling the left hand, right hand, left leg, right leg and whole body of the infant in the frame images obtained in step 2; the labeled limbs and whole body form the training sample library, which is input to the CGAN as the target data set, with the corresponding labels used as the condition Y;
3.2 generative model design: randomly segmenting each frame image containing the infant to form a pseudo target data set, and passing the pseudo target data set together with the condition Y through the convolution layers into the discriminative model;
3.3 discriminative model design: feeding the target data set and the condition Y into the discriminative model so that it can identify the limbs and the whole body, then feeding the pseudo target data set into the discriminator, which judges whether each pseudo sample is a target;
3.4 after judging whether a sample is a target, calculating the error so that it satisfies the following formulas:
optimizing D:
$$\max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
optimizing G:
$$\min_G V(D,G) = \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
wherein: V(D, G) is the loss function; p_data(x) is the real sample distribution; p_z(z) is the pseudo-sample distribution; D(x) is the discriminator output for a real sample x; D(G(z)) is the discriminator output for a pseudo sample G(z); E denotes expectation.
The purpose of the formulas is to minimize V(D, G) with respect to the generative model, so that the generated pseudo targets are as realistic as possible (that is, the target position is found as accurately as possible), and to maximize V(D, G) with respect to the discriminative model.
The model parameters are adjusted according to the above optimization conditions, wherein the parameters of the generative model G and the discriminative model D are shared.
3.5 If the error is too large, the error is fed back to the input of the generative model, and the pseudo target data set is reconstructed and judged again, until the positions of the infant's four limbs and whole body are found in the pseudo target data set; the positions and motion trajectories of the left hand, right hand, left leg, right leg and whole body of the infant in each frame are then recorded. FIG. 3 shows a single frame from the tracking of the infant's left upper limb.
4. Analyzing the motion trajectory information: storing the continuously changing y-axis coordinate positions of the infant's four limbs and whole body tracked in step 3 during movement (as shown in FIG. 4), and calculating the wavelet approximation waveform and the wavelet power spectrum of the waveform formed by the y-axis coordinate positions, specifically comprising the following steps:
4.1 because the x-axis coordinate changes little, only the y-axis coordinate waveform is selected for analysis; first, the tracked waveform is analyzed with the Haar wavelet to obtain the wavelet approximation waveform;
4.2 for the y-axis coordinate waveforms of the four limbs and the whole body, obtaining power spectrum information using a wavelet-based power spectrum.
5. Extracting feature vectors from the obtained wavelet approximation waveforms and wavelet power spectra, and training a support vector machine (SVM) on them, specifically comprising the following steps:
5.1 dividing the samples into normal and abnormal samples and labeling them, with normal samples labeled 1 and abnormal samples labeled -1;
5.2 dividing the samples into a training set and a test set, normalizing the data, and adjusting the SVM parameters c and g until the highest accuracy is reached, thereby obtaining the optimal trained model;
6. Comprehensive judgment of infant abnormal behavior: based on the optimal trained models obtained in step 5.2, setting different weights according to the different accuracies and performing a weighted judgment, specifically comprising the following steps:
6.1 for the SVM models trained on the wavelet approximation waveforms obtained in step 4.1, setting different weight coefficients according to the different accuracies for the limbs and the whole body, specifically: left upper limb a1: 0.35; right upper limb a2: 0.01; left lower limb a3: 0.2; right lower limb a4: 0.35; whole body a5: 0.09; the judgment result vectors of the four limbs and the whole body are denoted Y1 to Y5 respectively, and are calculated as follows:
Y1=(test label+predict label)/2
wherein: test label is the actual label of the test sample; predict label is the label predicted for the test sample; Y2 through Y5 are calculated in the same way;
the five result vectors are weighted as follows:
Y=0.35*Y1+0.01*Y2+0.2*Y3+0.35*Y4+0.09*Y5
wherein: * denotes multiplication and Y is the judgment value predicted from the wavelet approximation waveform; the judgment standard is defined as follows: if -1 < Y < -0.3, the infant's behavior is judged to be abnormal; if 0.3 < Y < 1, the infant's behavior is judged to be normal; all other values are regarded as a judgment error.
6.2 for the SVM models trained on the wavelet power spectra obtained in step 4.2, setting different weight coefficients according to the different accuracies for the limbs and the whole body, specifically: left upper limb P1: 0.35; right upper limb P2: 0.01; left lower limb P3: 0.35; right lower limb P4: 0.2; whole body P5: 0.09; the judgment result vectors of the four limbs and the whole body are denoted X1 to X5 respectively, and are calculated as follows:
X1=(test label+predict label)/2
wherein: test label is the actual label of the test sample; predict label is the label predicted for the test sample; X2 through X5 are calculated in the same way.
The five result vectors are weighted as follows:
X=0.35*X1+0.01*X2+0.35*X3+0.2*X4+0.09*X5
wherein: * denotes multiplication and X is the judgment value predicted from the wavelet power spectrum; the judgment standard is defined as follows: if -1 < X < -0.3, the infant's behavior is judged to be abnormal; if 0.3 < X < 1, the infant's behavior is judged to be normal; all other values are regarded as a judgment error.
X and Y are then judged jointly; the specific flow chart is shown in FIG. 7. If a test sample satisfies at least one of the X and Y conditions, the judgment result is considered correct, and whether the infant's behavior is normal can be distinguished.
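Returning to the objectives of step 3.4, a small numeric sketch can make the two optimization directions concrete. It estimates V(D, G) on a batch from discriminator outputs only; the function names and the probability values are hypothetical, and no network is trained here.

```python
import math


def discriminator_objective(d_real, d_fake):
    """Batch estimate of V(D, G): mean log D(x) + mean log(1 - D(G(z))).

    d_real: discriminator outputs for real (labeled) samples, in (0, 1).
    d_fake: discriminator outputs for pseudo samples, in (0, 1).
    The discriminator is updated to make this value larger.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_objective(d_fake):
    """Batch estimate of mean log(1 - D(G(z))).

    The generator is updated to make this value smaller, which happens
    when the discriminator scores pseudo samples closer to 1.
    """
    return sum(math.log(1 - p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs.
confident_d = discriminator_objective([0.9, 0.8], [0.1, 0.2])
undecided_d = discriminator_objective([0.5, 0.5], [0.5, 0.5])
weak_g = generator_objective([0.1, 0.2])    # pseudo samples easily rejected
strong_g = generator_objective([0.6, 0.7])  # pseudo samples often accepted
print(confident_d > undecided_d, strong_g < weak_g)
```

A discriminator that separates real from pseudo samples well obtains a larger V(D, G), while a generator whose pseudo targets are accepted more often obtains a smaller value of its own term.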
The generative model design and the discriminative model design in the invention comprise the following steps:
step A1, generative model design: 6 convolution layers with a stride of 1 and 6 pooling layers with a 2 x 2 pooling window are provided; the network uses the rectified linear unit (ReLU) activation function, which gives good results and faster convergence; the specific operation is:
F(Z)=σ(W*Z+b)
wherein: W is the convolution kernel; * denotes the convolution operation; Z is the feature vector; b is the bias; σ is the ReLU activation function;
step A2, discriminative model design: 5 convolution layers with a stride of 1 and 5 pooling layers with a 2 x 2 pooling window are provided; the network uses the rectified linear unit (ReLU) activation function.
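As a worked size check for steps A1 and A2, the spatial size of the feature maps after the stacked layers can be computed as below. The sketch assumes that each stride-1 convolution uses padding that preserves the size, that each 2 x 2 pooling layer uses stride 2, and that convolution and pooling layers alternate; the 256 x 256 input size is hypothetical, as the text specifies none of these.

```python
def feature_map_size(input_size, num_blocks, pool=2):
    """Side length after num_blocks (convolution + pooling) blocks."""
    size = input_size
    for _ in range(num_blocks):
        # Stride-1 convolution with size-preserving padding (assumption):
        # the side length is unchanged by the convolution itself.
        size //= pool   # 2 x 2 pooling with stride 2 halves the side length
    return size


generator_out = feature_map_size(256, 6)       # 6 conv + 6 pooling layers
discriminator_out = feature_map_size(256, 5)   # 5 conv + 5 pooling layers
print(generator_out, discriminator_out)
```

Under these assumptions a 256 x 256 frame shrinks to 4 x 4 in the generative model and to 8 x 8 in the discriminative model.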
The specific calculation of the wavelet approximation waveform and the wavelet power spectrum in the invention comprises the following steps:
step B1, analyzing the tracked waveform with the Haar wavelet: a five-level pyramid is constructed according to the Mallat pyramid decomposition algorithm of the discrete wavelet transform, and the fifth-level wavelet approximation waveforms (as shown in FIG. 5) corresponding to the four limbs and the whole body are extracted and denoted respectively: abnormal left upper limb: a01; abnormal right upper limb: a02; abnormal left lower limb: a03; abnormal right lower limb: a04; abnormal whole body: a05; normal left upper limb: a11; normal right upper limb: a12; normal left lower limb: a13; normal right lower limb: a14; normal whole body: a15.
step B2, for the y-axis coordinate waveforms of the four limbs and the whole body, computing the wavelet-based power spectrum, with the sampling length set to the total number of video frames (375), the sampling frequency set to 1000 and the sampling interval set to 1/1000; the resulting power spectra (as shown in FIG. 6) are denoted respectively: abnormal left upper limb: p01; abnormal right upper limb: p02; abnormal left lower limb: p03; abnormal right lower limb: p04; abnormal whole body: p05; normal left upper limb: p11; normal right upper limb: p12; normal left lower limb: p13; normal right lower limb: p14; normal whole body: p15.
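The text does not state how the wavelet-based power spectrum of step B2 is computed. One common interpretation, sketched below purely as an assumption, is to measure the power carried by the Haar detail coefficients at each decomposition level and to associate level j with the frequency band from fs/2^(j+1) to fs/2^j. The function names are hypothetical; the default fs of 1000 is the sampling frequency given in the text.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = list(signal)
    if len(s) % 2:              # odd length: repeat the last sample (assumption)
        s.append(s[-1])
    k = 1 / math.sqrt(2)
    approx = [(s[i] + s[i + 1]) * k for i in range(0, len(s), 2)]
    detail = [(s[i] - s[i + 1]) * k for i in range(0, len(s), 2)]
    return approx, detail


def wavelet_band_power(signal, fs=1000.0, levels=5):
    """Return a list of (f_low, f_high, power), one entry per level.

    Power at a level is the sum of squared detail coefficients divided by
    the number of input samples (an assumed normalization).
    """
    n = len(signal)
    approx = list(signal)
    bands = []
    for j in range(1, levels + 1):
        approx, detail = haar_step(approx)
        power = sum(d * d for d in detail) / n
        bands.append((fs / 2 ** (j + 1), fs / 2 ** j, power))
    return bands


# Hypothetical test signal that alternates every sample, so all of its
# power falls into the highest band (level 1).
bands = wavelet_band_power([1.0, -1.0] * 16)
for f_low, f_high, power in bands:
    print(f"{f_low:7.3f} to {f_high:7.3f}: {power:.3f}")
```

For a real trajectory the per-band powers could be concatenated into the feature vector that is passed to the SVM, although the exact feature layout is not specified in the text.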