Background
Fatigued drivers suffer a significant reduction in their ability to recognize road conditions and in their driving skill. Studies show that 25%-30% of traffic accidents are caused by fatigued driving. To address this problem, a system must be developed that can effectively detect driver fatigue and issue timely warnings.
Driver fatigue can be detected by measuring the driver's heart rate with a wearable device or by extracting facial features with an RGB camera. However, wearable devices may be inconvenient and uncomfortable for the driver, and the detection accuracy of an RGB camera can be affected by lighting, glasses, and head orientation. In addition, most existing methods ignore the temporal information of fatigue features and the relationships among those features, which reduces recognition accuracy. Furthermore, some existing fatigue detection methods process fatigue features only within short temporal slices, ignoring temporal variation in the features.
Vehicle-behavior-based methods mainly measure vehicle data such as steering angle, speed, acceleration, and turning angle, without considering the physiological signals needed to detect driver fatigue and issue early warnings. Physiological-signal-based methods mainly use electrooculograms (EOG), electrocardiograms, and other physiological signals; drivers must wear the associated equipment, which is intrusive, interferes with driving, and results in a poor user experience.
In behavior-based approaches, fatigue is determined by detecting eyelid-closure frequency, expressed as the percentage of eyelid closure over time (PERCLOS).
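As a concrete illustration, PERCLOS is commonly computed as the fraction of frames in a window during which the eyelid is nearly closed. The sketch below uses an openness threshold of 0.2 (i.e. the eyelid roughly 80% closed), which is a common convention and not taken from the text above:

```python
def perclos(eye_openness, closed_threshold=0.2):
    """PERCLOS sketch: fraction of frames in a window during which the
    eye-openness measure falls below a threshold (i.e. the eyelid is
    roughly 80% closed). The 0.2 threshold is a common convention."""
    closed = [o < closed_threshold for o in eye_openness]
    return sum(closed) / len(closed)

# 10-frame window with 3 nearly-closed frames -> PERCLOS of 0.3.
window = [0.9, 0.8, 0.1, 0.05, 0.15, 0.85, 0.9, 0.9, 0.8, 0.95]
score = perclos(window)
```

A fatigue warning would then be triggered when the score exceeds some calibrated limit.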
Driver fatigue is a continuous process in time, so temporal changes in fatigue characteristics are very important for identifying fatigued driving. However, existing algorithms focus on fatigue features over short time windows, ignoring their temporal variation. Some methods detect fatigue by measuring heart rate with a wearable device, but wearing the device is inconvenient and may make the driver uncomfortable. Existing fatigue detection methods that extract eye openness from RGB images can be affected by lighting, glasses, and head orientation. Few existing models capture both the temporal information of individual features and the temporal relationships between features for driver fatigue detection.
Disclosure of Invention
In order to solve the defects in the prior art, the invention provides an EEG signal emotion recognition method based on reinforcement learning, which comprises the following steps:
S1, filtering electroencephalogram signals with label errors in a training set by adopting a semi-supervised learning mechanism;
S2, extracting the relation between EEG channels and frequency bands by adopting vectorized convolution;
S3, combining fuzzy reasoning with RNN to extract effective temporal information of the electroencephalogram so as to better understand emotion;
and S4, outputting the emotion corresponding to the subject's EEG signal, wherein the emotion refers to which of positive emotion, negative emotion, and neutral emotion the subject belongs to.
Further, the subject's three emotions, positive, negative, and neutral, are labeled in the EEG data. A label error comprises neutral-emotion EEG data wrongly labeled as positive emotion within the positive-emotion EEG data, or neutral-emotion EEG data wrongly labeled as negative emotion within the negative-emotion EEG data.
Filtering the electroencephalogram signals with wrong labels from the training set means screening out the neutral-emotion electroencephalogram data wrongly labeled as positive or negative emotion by processing the EEG data with the fuzzy c-means clustering algorithm (FCM), which is based on an objective function.
In an EEG dataset, when the overall emotional state of the video used to stimulate the mood of the subject is positive, negative or neutral, the respective EEG data emotional state from the subject will be labeled as positive, negative or neutral. However, during EEG acquisition, positive or negative mood EEG data may contain partially neutral mood data that is incorrectly labeled as positive or negative mood. Such mislabeled electroencephalographic data can greatly interfere with the accuracy of the emotion recognition method.
Further, neutral emotion EEG data is screened from negative emotion EEG data containing neutral emotions by the following specific screening method:
Assume that the input EEG data for FCM is X = [x1, x2, …, xn], negative-emotion EEG data containing both negative and neutral emotions. FCM divides the EEG data X into two groups along the time dimension, where one group is neutral electroencephalogram data falsely labeled as negative emotion, X1 = [x1, x2, …, xi], and the other group is negative-emotion electroencephalogram data X2 = [xi+1, xi+2, …, xn]. The Euclidean distances between X1 and Xneutral and between X2 and Xneutral are computed to determine similarity, and the true neutral-emotion electroencephalogram data are screened out according to this similarity, where ρ is the length of the screened element sequence.
Further, vectorized convolution is adopted to extract the relation between EEG channels and frequency bands; by fusing the information of these two dimensions, the EEG signal can be utilized more comprehensively for emotion classification. Specifically, a convolution operation is performed in the vectorized convolution layer.
Assume X is the EEG feature over a period of time; vectorized convolution is applied to each element xi of X, where the rows of xi are channels and the columns are frequency bands. Through a series of convolution operations, xi becomes a vector whose final length is the number of channels.
Further, the combination of fuzzy inference and RNN for extracting effective temporal information from the EEG signal specifically comprises 6 layers.
The first layer is an input layer,
the second layer is a fuzzy layer, and the membership value of the data from the first layer is calculated by using a Gaussian membership function which is as follows:
ui = exp(−(x − mi)^2 / σi^2), i = 1, 2, …, λ,
where mi and σi^2 respectively represent the mean and variance of the Gaussian membership function, ui represents the output of the second layer, x indicates the output of layer 1, i indicates the index of the neuron, and λ indicates the number of neurons in the second layer.
Electroencephalograms are physiological signals that contain noise. This noise can be attenuated to some extent by the fuzzy layer.
The third layer is a spatial activation layer, which computes the activation degree passed to each node of the fourth layer. The nodes of the third layer derive spatial activation degrees from the membership values received from the second layer, and the spatial activation strength is calculated using continuous multiplication (the product) as the fuzzy operator, as follows:
fi = ui1 · ui2 · … · uiρ,
where fi represents the output of the third layer, uij denotes the j-th dimension of the i-th output of layer 2, and λ denotes the number of layer-2 neurons.
The fourth layer is a recurrent layer for acquiring the time-varying features of the EEG. This layer combines the spatial activation intensity transmitted by the third layer with the temporal activation intensity of the previous time step, where t represents the current time step and t−1 represents the previous time step. Each neuron in this layer is calculated as follows:
where hi(t), fi(t), and hi(t−1) represent the temporal activation intensity, the output of the third layer, and the output of the fourth layer at the previous time step, respectively.
The mood of the subject is related to the change in the electroencephalographic signal over time. RNNs can effectively extract the characteristics of time series data and are therefore applied in this layer. Obtaining time-varying features of the EEG can help accurately identify emotions that vary over time.
The fifth layer is the result layer. Since the defuzzification operation of the sixth layer requires the input of the fuzzification layer, temporal information is merged with the output of the second layer in this layer. Specifically, this layer computes a weighted linear sum of the outputs of the second layer and the fourth layer, and the output of each node in layer 5 serves as the input of the next layer:
oij = νij · uij + ωi · hi,
where i, j ∈ [1, ρ]; νij and ωi are weight parameters, representing the weight applied to the value passed from layer 2 to the current layer and the weight applied to the value passed from layer 4 to the current layer, respectively; and uij, hi, and oij respectively represent the j-th dimension of the i-th output of layer 2, the output of the fourth layer, and the output of the fifth layer.
The sixth layer is an output layer, and the defuzzification is executed by adopting a weighted average defuzzification method, which is as follows:
y represents the output of the sixth layer and is the corresponding positive or negative or neutral mood of the subject.
In the model, the network parameters and structure are adjusted using back propagation, and the loss function of the model is as follows:
where ŷ is the emotion label of the electroencephalogram data, y represents the output of the sixth layer, and yi denotes the i-th element of y.
Compared with the prior art, the invention has the following beneficial effects:
(1) The unsupervised clustering step filters the raw EEG data, making the data used to train the model more accurate. This makes the trained model more effective at prediction and improves the recognition accuracy of the model.
(2) Considering that the subject's emotion is related to the change of the electroencephalogram signal over time, fuzzy reasoning is combined with an RNN: fuzzy reasoning handles noise well, while the RNN effectively extracts the characteristics of time-series data, so the time-varying features of the EEG can be obtained to help accurately identify emotion as it changes over time. This effectively improves the robustness of the model.
(3) Vectorized convolution enables the proposed neural network to fuse information in both the channel and frequency dimensions of the EEG signal. By fusing the information in these two dimensions, the EEG signal can be utilized more fully for emotion classification.
Detailed Description
The invention will be further described with reference to examples and figures, but the embodiments of the invention are not limited thereto.
As shown in fig. 1, the present embodiment provides an EEG signal emotion recognition method based on reinforcement learning, which includes the following steps:
S1, filtering electroencephalogram signals with label errors in a training set by adopting a semi-supervised learning mechanism;
S2, extracting the relation between EEG channels and frequency bands by adopting vectorized convolution, and fusing the information of these two dimensions so as to utilize the EEG signal more comprehensively for emotion classification;
S3, combining fuzzy reasoning and RNN to extract effective temporal information of the electroencephalogram so as to better understand emotion, and outputting the emotion corresponding to the subject's EEG signal, namely one of positive emotion, negative emotion, and neutral emotion.
In particular, in an EEG dataset, when the overall emotional state of the video used to stimulate the subject's emotion is positive, negative, or neutral, the corresponding EEG data from the subject will be labeled as positive, negative, or neutral. However, during EEG acquisition, positive- or negative-emotion EEG data may contain some neutral-emotion data that is incorrectly labeled as positive or negative. For example, suppose the overall emotional state of a video selected from the dataset is negative, so the electroencephalogram signal collected from the subject is labeled as negative EEG data. When the subject has just begun to watch the video clip, the stimulus may not yet have taken effect, so the EEG data collected at that moment reflects a neutral emotion rather than a negative one; yet the dataset marks the emotion labels of these EEG data as negative. Such mislabeled EEG data can greatly interfere with the accuracy of the emotion recognition method.
The electroencephalogram signal with the wrong label in the filtering training set is that the electroencephalogram data with the neutral emotion wrongly labeled as the positive emotion or the negative emotion is screened out by processing the EEG data through a fuzzy clustering algorithm (FCM) based on a target function.
FCM is a fuzzy clustering algorithm based on an objective function. It improves the emotion recognition accuracy of the model by computing the Euclidean distance from each sample to the cluster centers in order to screen out wrongly labeled neutral electroencephalogram data from the positive- and negative-emotion electroencephalogram data. For ease of understanding, the specific steps for screening neutral-emotion EEG data out of negative-emotion EEG data (which contains neutral emotion incorrectly labeled as negative) by FCM are given below.
Assume that the input EEG data for FCM is X = [x1, x2, …, xn], negative-emotion EEG data containing both negative and neutral emotions. FCM divides the EEG data X into two groups along the time dimension, where one group is neutral electroencephalogram data wrongly labeled as negative emotion, X1 = [x1, x2, …, xi], and the other group is negative-emotion electroencephalogram data X2 = [xi+1, xi+2, …, xn]. The Euclidean distances between X1 and Xneutral and between X2 and Xneutral are computed to determine similarity, and the true neutral-emotion electroencephalogram data are screened out according to this similarity, where Xneutral denotes the sample data labeled as neutral emotion, and ρ is the length of the element sequence to be screened.
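The screening step above can be sketched as follows: a two-cluster fuzzy c-means partition of the negative-labeled samples, followed by a Euclidean-distance comparison of each cluster center against the neutral reference data. The fuzzifier m = 2, the iteration count, and the function names are illustrative assumptions, not specified by the text:

```python
import numpy as np

def fcm(X, c=2, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: returns the membership matrix U (n x c) and
    the cluster centers (c x d), using the standard alternating updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # distance of every sample to every center (small epsilon avoids /0)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

def screen_neutral(X_negative, X_neutral_ref):
    """Split negative-labeled EEG samples into two fuzzy clusters and keep
    the cluster whose center lies closer (Euclidean distance) to the mean
    of the reference samples labeled neutral."""
    U, centers = fcm(X_negative)
    labels = U.argmax(axis=1)
    ref = X_neutral_ref.mean(axis=0)
    neutral_cluster = int(np.argmin(np.linalg.norm(centers - ref, axis=1)))
    return X_negative[labels == neutral_cluster]

# Toy data: 20 'hidden neutral' samples near the neutral reference and
# 20 genuinely negative samples far from it, all labeled negative.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(5.0, 0.1, (20, 3))])
found = screen_neutral(X, np.zeros((5, 3)))
```

The rows returned in `found` would then be relabeled (or removed) before training.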
In step S2, specifically, vectorized convolution is used to extract the relation between EEG channels and frequency bands; by fusing the information of these two dimensions, the EEG signal can be utilized more comprehensively for emotion classification. Specifically, the convolution operation is performed in the vectorized convolution layer. This embodiment applies vectorized convolution to each element xi in X, where the rows of xi are channels and the columns are frequency bands. Through a series of convolution operations, xi becomes a vector whose final length is the number of channels.
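A minimal sketch of this step: repeated 1-D convolutions collapse the frequency-band axis of a (channels x bands) matrix into one value per channel, yielding a vector whose length equals the number of channels. The fixed smoothing kernel is illustrative; in the network these would be learned weights:

```python
import numpy as np

def vectorized_conv(x, kernel):
    """Repeatedly apply a 'valid' 1-D convolution along the frequency-band
    axis of a (channels x bands) feature matrix until the band axis is
    collapsed, producing one value per channel."""
    out = np.asarray(x, dtype=float)
    k = np.asarray(kernel, dtype=float)
    while out.shape[1] >= len(k):
        # each pass shortens the band axis by len(k) - 1
        out = np.stack([np.convolve(row, k, mode="valid") for row in out])
    return out.mean(axis=1)  # collapse any remaining band columns

# Example: 4 channels x 5 frequency bands -> vector of length 4.
x = np.arange(20, dtype=float).reshape(4, 5)
v = vectorized_conv(x, [0.25, 0.5, 0.25])
```

With 5 bands and a width-3 kernel, two passes reduce each row to a single value, so `v` has one entry per channel.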
In step S3, the fuzzy inference is combined with RNN, specifically including 6 layers,
the first layer is an input layer, and the input information of the input layer is X.
The second layer is a fuzzy layer, and the membership value of the data from the first layer is calculated by using a Gaussian membership function which is as follows:
ui = exp(−(x − mi)^2 / σi^2), i = 1, 2, …, λ,
where mi and σi^2 respectively represent the mean and variance of the Gaussian membership function, ui represents the output of the second layer, x indicates the output of layer 1, i indicates the index of the neuron, and λ is the number of layer-2 neurons.
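A sketch of this fuzzification layer is below. The exponent form exp(−(x − m)^2 / σ^2) is the standard choice and is assumed here, since the text names only the mean and variance parameters:

```python
import numpy as np

def fuzzify(x, means, sigmas):
    """Layer-2 Gaussian fuzzification: membership of each input dimension
    in each of the lambda fuzzy sets (columns), one Gaussian per set."""
    x = np.asarray(x, dtype=float)[:, None]      # (dims, 1)
    m = np.asarray(means, dtype=float)[None, :]  # (1, lambda)
    s = np.asarray(sigmas, dtype=float)[None, :]
    return np.exp(-((x - m) ** 2) / (s ** 2))    # (dims, lambda)

# Membership is 1.0 exactly at a set's mean and decays with distance.
u = fuzzify([0.0, 1.0], means=[0.0, 1.0], sigmas=[1.0, 1.0])
```

In training, `means` and `sigmas` would be learnable parameters adjusted by back propagation.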
Electroencephalograms are physiological signals that contain noise. This noise can be attenuated to some extent by the fuzzy layer.
The third layer is a spatial activation layer, which computes the activation degree passed to each node of the fourth layer. The nodes of the third layer derive spatial activation degrees from the membership values received from the second layer, and the spatial activation strength is calculated using continuous multiplication (the product) as the fuzzy operator, as follows:
fi = ui1 · ui2 · … · uiρ,
where fi represents the output of the third layer, and uij and λ denote the output of layer 2 and the number of layer-2 neurons, respectively.
The fourth layer is a recurrent layer for acquiring the time-varying features of the EEG. This layer combines the spatial activation intensity transmitted by the third layer with the temporal activation intensity of the previous time step; t represents the current time step, and t−1 represents the previous time step. Each neuron of this layer is calculated as follows:
where hi(t), fi(t), and hi(t−1) represent the temporal activation intensity, the output of layer 3, and the output of layer 4 at the previous time step, respectively.
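Layers 3 and 4 can be sketched together as below. The product t-norm follows the "continuous multiplication" description; the additive recurrence and the recurrent weight `w_rec` are assumptions, since the text states only that the two intensities are combined:

```python
import numpy as np

def spatial_activation(u):
    """Layer 3: spatial firing strength of each rule via continuous
    multiplication (product t-norm) over the rho membership dimensions."""
    return np.prod(u, axis=0)

def recurrent_activation(f_t, h_prev, w_rec=0.5):
    """Layer 4 (sketch): merge the current spatial firing strength with
    the previous step's temporal activation. Form and weight assumed."""
    return f_t + w_rec * h_prev

u = np.array([[0.9, 0.5],
              [0.8, 0.4]])               # rho=2 dimensions x lambda=2 rules
f = spatial_activation(u)                # product down each column
h1 = recurrent_activation(f, np.zeros(2))  # first time step (no history)
h2 = recurrent_activation(f, h1)           # second time step
```

The recurrence is what lets the network carry fuzzy activation across time steps, mirroring an RNN hidden state.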
The mood of the subject is related to the change in the electroencephalographic signal over time. RNNs can effectively extract the characteristics of time series data and are therefore applied in this layer. Obtaining time-varying features of the EEG can help accurately identify emotions that vary over time.
The fifth layer is the result layer. Since the defuzzification operation of the sixth layer requires the input of the fuzzification layer, temporal information is merged with the output of the second layer in this layer. Specifically, this layer computes a weighted linear sum of the outputs of the second layer and the fourth layer, and the output of each node in layer 5 serves as the input of the next layer:
oij = νij · uij + ωi · hi,
where i, j ∈ [1, ρ]; νij and ωi are weight parameters, representing the weight applied to the value passed from layer 2 to the current layer and the weight applied to the value passed from layer 4 to the current layer, respectively; and uij, hi, and oij respectively represent the j-th dimension of the i-th output of layer 2, the output of the fourth layer, and the output of the fifth layer.
The sixth layer is an output layer, and the defuzzification is executed by adopting a weighted average defuzzification method, which is as follows:
y represents the output of the sixth layer, i.e., the subject's emotion: positive, negative, or neutral.
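Layers 5 and 6 can be sketched as follows. Normalizing the summed layer-5 outputs by the summed memberships is the standard weighted-average defuzzification and is assumed here; the unit weights are placeholders for learned parameters:

```python
import numpy as np

def result_layer(u, h, v, w):
    """Layer 5 (sketch): weighted linear sum of the fuzzification-layer
    output u (rho x lambda) and the recurrent-layer output h (lambda,),
    with weight parameters v (rho x lambda) and w (lambda,)."""
    return v * u + (w * h)[None, :]

def defuzzify(o5, u):
    """Layer 6 (sketch): weighted-average defuzzification, normalizing
    the summed layer-5 outputs by the summed memberships."""
    return float(np.sum(o5) / np.sum(u))

# Toy forward pass with unit weights.
u = np.full((2, 2), 0.5)            # layer-2 memberships
h = np.array([0.25, 0.25])          # layer-4 temporal activations
o5 = result_layer(u, h, np.ones((2, 2)), np.ones(2))
y = defuzzify(o5, u)
```

The scalar (or per-class) score `y` would then be mapped to one of the three emotion classes.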
In the model, the network parameters and structure are adjusted using back propagation, and the loss function of the model is as follows:
where ŷ is the emotion label of the electroencephalogram data, y represents the output of layer 6, and yi denotes the i-th element of y.
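The loss described (a label compared element-wise against the network output y) is consistent with categorical cross-entropy over the three emotion classes; a sketch under that assumption:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Categorical cross-entropy over the three emotion classes (positive,
    negative, neutral). The exact loss form is assumed, as the text only
    names the label and the i-th element of the output."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    return float(-np.sum(y_true * np.log(y_pred)))

# One-hot label for 'negative' against a softmax-like output.
loss = cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])   # = -log(0.8)
```

During training this loss would be minimized by back propagation through all six layers.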
The present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which are made without departing from the spirit and principle of the invention are equivalent substitutions and are within the scope of the invention.