Disclosure of Invention
The invention aims to provide a driving behavior recognition method and a driving behavior recognition system, which address the problem that too few driving behavior states are classified.
In order to achieve the purpose, the invention provides the following scheme:
a driving behavior recognition method, comprising:
acquiring driving behavior data;
performing symbolization processing on the driving behavior data to obtain a symbolized speed time sequence;
according to the symbolized speed time sequence, segmenting the driving behavior data by adopting a driving behavior sequence variable point detection algorithm to obtain segmented driving behavior segments;
preprocessing the driving behavior data to obtain a driving behavior fragment characteristic frequency statistical matrix;
acquiring an LDA model, and initializing the clustering number of the LDA model;
inputting the driving behavior segment characteristic frequency statistical matrix into the LDA model to obtain a first driving behavior category;
calculating the confusion degree of the LDA model according to the first driving behavior category, and recording the confusion degree and the LDA model corresponding to the confusion degree;
judging whether the clustering number of the LDA model is equal to a preset number or not to obtain a first judgment result;
if the first judgment result is negative, increasing the clustering number of the LDA model, returning to the step of inputting the characteristic frequency statistical matrix of the driving behavior segment into the LDA model to obtain a first driving behavior category, and updating and recording the confusion degree and the LDA model corresponding to the confusion degree;
if the first judgment result is yes, comparing the confusion degrees of all records to obtain the minimum confusion degree;
obtaining an LDA model corresponding to the minimum confusion degree according to the minimum confusion degree;
and inputting the characteristic frequency statistical matrix of the driving behavior segment into an LDA model corresponding to the minimum confusion degree to obtain the driving behavior category.
Optionally, the symbolizing the driving behavior data to obtain a symbolized speed time sequence specifically includes:
obtaining a speed time series $S_{velocity}=\{v_1,v_2,\ldots,v_n\}$ in the driving behavior data, in which $v_1,v_2,\ldots,v_n$ represent sequence points and $n$ represents the total number of the sequence points;
obtaining a symbolized velocity time series S according to formula (1), wherein formula (1) is as follows:
in formula (1), $v_j$ represents the j-th sequence point in the speed time series $S_{velocity}$, $j$ represents the sequence number of the sequence point, and $j\in[2,n-1]$.
Optionally, the segmenting the driving behavior data by using a driving behavior sequence variable point detection algorithm according to the symbolized speed time sequence to obtain segmented driving behavior segments specifically includes:
calculating the information entropy of the velocity sequence points in the symbolized velocity time sequence according to formula (2):
$H(S)=-\sum_{i=1}^{n}p(s_i)\log(p(s_i))$ (2);
in formula (2), $H(S)$ represents the information entropy of the velocity sequence points, $S$ represents the symbolized velocity time sequence, $S=\{s_1,s_2,\ldots,s_n\}$, and $s_1,s_2,\ldots,s_n$ represent the symbolized values of the sequence points, i.e. the velocity sequence points; $p(s_i)$ represents the occurrence probability of the i-th velocity sequence point $s_i$; $s_i=1$ denotes acceleration, $s_i=0$ denotes constant velocity, and $s_i=-1$ denotes deceleration;
calculating the minimum local entropy $E_j$ of the i-th velocity sequence point $s_i$ according to formula (3):
$E_j=H(S_{Forward})+H(S_{Backward})$ (3);
in formula (3), $H(S_{Forward})$ represents the information entropy of the symbolized velocity time series before the velocity sequence point $s_i$, $S_{Forward}=\{s_1,s_2,\ldots,s_i\}$; $H(S_{Backward})$ represents the information entropy of the symbolized velocity time series after the velocity sequence point $s_i$, $S_{Backward}=\{s_{i+1},s_{i+2},\ldots,s_n\}$;
obtaining a driving state division point of the symbolized speed time sequence by adopting a variable step length sliding window method according to the symbolized speed time sequence;
and acquiring all driving state segmentation points, wherein the driving behavior data between any two adjacent driving state segmentation points is a segmented driving behavior segment.
Optionally, obtaining the driving state segmentation point by using a variable-step sliding window method according to the symbolized speed time sequence specifically includes:
acquiring the preset minimum length of a window, the preset maximum length of the window, a window offset step length and a window length;
initializing a serial number j and a window length l of a speed sequence point;
acquiring a sequence to be detected according to the serial number of the initialized speed sequence point and the initialized window length;
acquiring a first region from the sequence to be detected according to the window offset step length and the window length;
calculating the minimum local entropy of all speed sequence points in the first region according to a formula (3) to obtain a minimum local entropy set;
calculating the minimum local entropy in the minimum local entropy set according to formula (4):
$E_s=\min\{E_{j+f},E_{j+f+1},\ldots,E_{j+l-f-1},E_{j+l-f}\}$ (4)
in formula (4), $E_s$ represents the minimum local entropy;
judging whether the minimum local entropy is unique in the first region or not to obtain a second judgment result;
if the second judgment result is yes, carrying out third judgment;
if the second judgment result is negative, performing fourth judgment;
the third judgment is to judge whether the reduction times of all the minimum local entropies before the speed sequence point corresponding to the minimum local entropy and the increase times of all the minimum local entropies after the speed sequence point corresponding to the minimum local entropy are simultaneously greater than a preset time to obtain a third judgment result; if the third judgment result is negative, the fourth judgment is carried out;
if the third judgment result is yes, the speed sequence point corresponding to the minimum local entropy is the driving state division point, and the fourth judgment is carried out;
the fourth judgment is to judge whether the window length is greater than or equal to the maximum length of the window to obtain a fourth judgment result;
if the fourth judgment result is yes, updating the serial number j of the speed sequence point by adding 1 to j, setting the window length l equal to the minimum length of the window, and performing fifth judgment;
if the fourth judgment result is negative, updating the window length, adding 1 to the window length l, and performing the fifth judgment;
the fifth judgment is to judge whether j + l is greater than or equal to n to obtain a fifth judgment result;
if the fifth judgment result is yes, acquiring the driving state division point;
and if the fifth judgment result is negative, updating the sequence to be detected according to the updated window length and the updated sequence number of the speed sequence point, and returning to the step of acquiring the first region from the sequence to be detected according to the window offset step length and the window length.
Optionally, the preprocessing is performed on the driving behavior data to obtain a driving behavior segment characteristic frequency statistical matrix, which specifically includes:
normalizing the driving behavior characteristics in the driving behavior data to obtain normalized driving behavior characteristics; the driving behavior characteristics comprise a speed characteristic, a current characteristic and an acceleration characteristic;
equally dividing each driving behavior characteristic into 20 characteristic intervals according to the difference value between the maximum value and the minimum value of the driving behavior characteristic;
calculating the frequency of the speed, the current and the acceleration in each driving behavior segment in each characteristic interval by a counting statistical method to obtain a frequency matrix;
and counting all the driving behavior segments to obtain frequency matrixes of all the driving behavior segments, and combining the frequency matrixes of all the driving behavior segments into a characteristic frequency statistical matrix of the driving behavior segments.
Optionally, the calculating a perplexity of the LDA model according to the first driving behavior category, and recording the perplexity and the LDA model corresponding to the perplexity specifically includes:
calculating the degree of confusion according to equation (5):
in formula (5), $P(D)$ represents the degree of confusion (perplexity); $D$ represents the number of the driving behavior segments, $d$ represents the sequence number of a driving behavior segment, and $d\in D$; $p(\omega_d)$ represents the occurrence probability of each driving behavior feature in each driving behavior segment, $p(\omega_d)=p(z|d)\times p(\omega|z)$, where $p(z|d)$ represents the probability of occurrence of each of the driving behavior classes in each of the driving behavior segments, and $p(\omega|z)$ represents the probability of occurrence of each of the driving behavior features in each of the driving behavior classes; $R_d$ represents the total length of the driving behavior segment.
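Formula (5) is not reproduced in the text above. As a hedged illustration only, the following Python sketch assumes the perplexity form commonly used for LDA, $P(D)=\exp\left(-\sum_{d}\log p(\omega_d)\,/\,\sum_{d}R_d\right)$, which is consistent with the symbols defined above but is not confirmed by the source. The function name `perplexity` and the argument names `counts`, `theta` and `phi` are hypothetical: `counts[d][w]` is the frequency of feature interval w in segment d, `theta[d][z]` stands for p(z|d), and `phi[z][w]` stands for p(ω|z).

```python
import math


def perplexity(counts, theta, phi):
    """Assumed form of formula (5): exp(-sum_d log p(w_d) / sum_d R_d)."""
    log_likelihood = 0.0
    total_length = 0  # sum of R_d over all segments
    for d, row in enumerate(counts):
        for w, n_dw in enumerate(row):
            if n_dw == 0:
                continue
            # p(w | d) = sum over categories z of p(z | d) * p(w | z)
            p_w = sum(theta[d][z] * phi[z][w] for z in range(len(phi)))
            log_likelihood += n_dw * math.log(p_w)
            total_length += n_dw
    return math.exp(-log_likelihood / total_length)
```

Under this assumed form, a model that spreads probability uniformly over V feature intervals has a perplexity of V, and lower values indicate a better fit.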
A driving behavior recognition system, comprising:
the acquisition module is used for acquiring driving behavior data;
the symbolization speed time sequence module is used for carrying out symbolization processing on the driving behavior data to obtain a symbolization speed time sequence;
the driving behavior segment module is used for segmenting the driving behavior data by adopting a driving behavior sequence variable point detection algorithm according to the symbolized speed time sequence to obtain segmented driving behavior segments;
the driving behavior segment characteristic frequency statistical matrix module is used for preprocessing the driving behavior data to obtain a driving behavior segment characteristic frequency statistical matrix;
the initialization module is used for acquiring the LDA model and initializing the clustering number of the LDA model;
the first driving behavior category module is used for inputting the driving behavior segment characteristic frequency statistical matrix into the LDA model to obtain a first driving behavior category;
the confusion degree module is used for calculating the confusion degree of the LDA model according to the first driving behavior type and recording the confusion degree and the LDA model corresponding to the confusion degree;
the first judgment module is used for judging whether the clustering number of the LDA model is equal to the preset number or not to obtain a first judgment result; if the first judgment result is negative, executing the updating module; if the first judgment result is yes, executing the minimum confusion degree module;
the updating module is used for increasing the clustering number of the LDA model, executing the first driving behavior category module, and updating and recording the confusion degree and the LDA model corresponding to the confusion degree;
the minimum confusion degree module is used for comparing the confusion degrees of all records to obtain the minimum confusion degree;
the LDA model module is used for obtaining an LDA model corresponding to the minimum confusion degree according to the minimum confusion degree;
and the driving behavior classification module is used for inputting the driving behavior segment characteristic frequency statistical matrix into the LDA model corresponding to the minimum confusion degree to obtain the driving behavior classification.
Optionally, the symbolization speed time sequence module specifically includes:
an acquisition unit for acquiring a speed time series $S_{velocity}=\{v_1,v_2,\ldots,v_n\}$ in the driving behavior data, in which $v_1,v_2,\ldots,v_n$ represent sequence points and $n$ represents the total number of the sequence points;
a calculating unit, configured to obtain a symbolized velocity time series S according to formula (1), where formula (1) is as follows:
in formula (1), $v_j$ represents the j-th sequence point in the velocity time series $S_{velocity}$, $j$ represents the sequence number of the sequence point, and $j\in[2,n-1]$.
Optionally, the driving behavior segment module specifically includes:
an information entropy unit, configured to calculate the information entropy of the velocity sequence points in the symbolized velocity time sequence according to formula (2):
$H(S)=-\sum_{i=1}^{n}p(s_i)\log(p(s_i))$ (2);
in formula (2), $H(S)$ represents the information entropy of the velocity sequence points, $S$ represents the symbolized velocity time sequence, $S=\{s_1,s_2,\ldots,s_n\}$, and $s_1,s_2,\ldots,s_n$ represent the symbolized values of the sequence points, i.e. the velocity sequence points; $p(s_i)$ represents the occurrence probability of the i-th velocity sequence point $s_i$; $s_i=1$ denotes acceleration, $s_i=0$ denotes constant velocity, and $s_i=-1$ denotes deceleration;
a minimum local entropy unit for calculating the minimum local entropy $E_j$ of the i-th velocity sequence point $s_i$ according to formula (3):
$E_j=H(S_{Forward})+H(S_{Backward})$ (3);
in formula (3), $H(S_{Forward})$ represents the information entropy of the symbolized velocity time series before the velocity sequence point $s_i$, $S_{Forward}=\{s_1,s_2,\ldots,s_i\}$; $H(S_{Backward})$ represents the information entropy of the symbolized velocity time series after the velocity sequence point $s_i$, $S_{Backward}=\{s_{i+1},s_{i+2},\ldots,s_n\}$;
a driving state division point unit, used for obtaining the driving state division point of the symbolized speed time sequence by adopting a variable step length sliding window method according to the symbolized speed time sequence;
and the driving behavior segment unit is used for acquiring all driving state segmentation points, and driving behavior data between any two adjacent driving state segmentation points is a segmented driving behavior segment.
Optionally, the confusion module specifically includes:
a confusion unit for calculating the confusion according to equation (5):
in formula (5), $P(D)$ represents the degree of confusion (perplexity); $D$ represents the number of the driving behavior segments, $d$ represents the sequence number of a driving behavior segment, and $d\in D$; $p(\omega_d)$ represents the occurrence probability of each driving behavior feature in each driving behavior segment, $p(\omega_d)=p(z|d)\times p(\omega|z)$, where $p(z|d)$ represents the probability of occurrence of each of the driving behavior classes in each of the driving behavior segments, and $p(\omega|z)$ represents the probability of occurrence of each of the driving behavior features in each of the driving behavior classes; $R_d$ represents the total length of the driving behavior segment.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention provides a driving behavior recognition method and system. The method comprises the following steps: acquiring driving behavior data; performing symbolization processing on the driving behavior data to obtain a symbolized speed time sequence; according to the symbolized speed time sequence, segmenting the driving behavior data by adopting a driving behavior sequence variable point detection algorithm to obtain segmented driving behavior segments; preprocessing the driving behavior data to obtain a driving behavior segment characteristic frequency statistical matrix; acquiring an LDA model, and initializing the clustering number of the LDA model; inputting the driving behavior segment characteristic frequency statistical matrix into the LDA model to obtain a first driving behavior category; calculating the confusion degree of the LDA model according to the first driving behavior category, and recording the confusion degree and the LDA model corresponding to the confusion degree; judging whether the clustering number of the LDA model is equal to a preset number or not to obtain a first judgment result; if the first judgment result is negative, increasing the clustering number of the LDA model, returning to the step of inputting the driving behavior segment characteristic frequency statistical matrix into the LDA model to obtain a first driving behavior category, and updating and recording the confusion degree and the LDA model corresponding to the confusion degree; if the first judgment result is yes, comparing the confusion degrees of all records to obtain the minimum confusion degree; obtaining the LDA model corresponding to the minimum confusion degree according to the minimum confusion degree; and inputting the driving behavior segment characteristic frequency statistical matrix into the LDA model corresponding to the minimum confusion degree to obtain the driving behavior category.
According to the method, the driving behavior data are segmented by adopting a driving behavior sequence variable point detection algorithm to obtain segmented driving behavior segments, the driving behavior data can be segmented more accurately, the segmentation points of the driving behavior state can be identified more accurately, the misjudgment condition of the driving behavior state is reduced, and the formed driving behavior segments are more complete; and measuring the LDA model by using the confusion degree to obtain the optimal clustering number, and obtaining the driving behavior category according to the LDA model with the optimal clustering number, wherein the obtained driving behavior category is more complete and comprehensive.
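The selection of the clustering number described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: `fit` is a hypothetical callable that trains an LDA model with k clusters on the characteristic frequency statistical matrix and returns the model together with its confusion degree; `select_model`, `k_start` and `k_max` are likewise illustrative names.

```python
def select_model(fit, matrix, k_start, k_max):
    """Record the confusion degree for every cluster number and keep the minimum.

    `fit(matrix, k)` is a hypothetical callable returning (model, confusion_degree).
    """
    if k_start > k_max:
        raise ValueError("k_start must not exceed k_max")
    records = []
    k = k_start
    while True:
        model, confusion = fit(matrix, k)
        records.append((confusion, k, model))  # record the degree and its model
        if k == k_max:                         # first judgment: preset number reached
            break
        k += 1                                 # otherwise increase the clustering number
    best_confusion, best_k, best_model = min(records, key=lambda r: r[0])
    return best_k, best_model, best_confusion
```

Keeping every trained model in `records` mirrors the recording step of the method; for large models one might keep only the best model seen so far.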
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The invention provides a driving behavior recognition method, and fig. 1 is a flow chart of the driving behavior recognition method provided by the embodiment of the invention. Referring to fig. 1, the driving behavior recognition method includes:
Step 101, acquiring driving behavior data. In this embodiment, speed data, current data and acceleration data of 50 pure electric passenger cars are obtained from the national monitoring and management platform for new energy vehicles as the driving behavior data.
Step 102, performing symbolization processing on the driving behavior data to obtain a symbolized speed time sequence.
Step 102, specifically comprising:
obtaining a speed time series $S_{velocity}=\{v_1,v_2,\ldots,v_n\}$ in the driving behavior data, in which $v_1,v_2,\ldots,v_n$ represent the sequence points and $n$ represents the total number of sequence points. The driving behavior recognition method provided by the invention uses the speed data to segment the driving behavior data. Because the speed data cannot be directly applied to the driving behavior sequence variable point detection algorithm, the speed time series is symbolized.
Obtaining a symbolized velocity time series S according to formula (1), wherein formula (1) is as follows:
in formula (1), $v_j$ represents the j-th sequence point in the speed time series $S_{velocity}$, $j$ represents the sequence number of the sequence point, and $j\in[2,n-1]$. In this embodiment, the first sequence point and the last sequence point of the speed time series are not symbolized. Step 102 converts the original speed time series into a symbolized speed time series consisting of -1, 0 and 1, which can be used by the driving behavior sequence variable point detection algorithm based on minimum local entropy.
In order to improve the information accuracy of the symbolized speed time series S and eliminate the influence of certain short-time operations or constant-speed operations on the segmentation of the driving behavior data, the following rule is added on the basis of the formula (1): and if the symbolic numerical value of any sequence point is 0, but the symbolic numerical values of the previous sequence point and the next sequence point of the sequence point are consistent and are not 0, making the symbolic numerical value of the sequence point consistent with the states of the symbolic numerical values of the previous sequence point and the next sequence point.
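As a minimal sketch of step 102, the following Python code symbolizes a speed series and then applies the additional rule described above. Because formula (1) is not reproduced in the text, the sketch assumes that the symbol of point j is the sign of the central difference $v_{j+1}-v_{j-1}$, with a hypothetical tolerance `eps` below which the speed counts as constant; this is an assumption, not the formula of the invention. The function names are illustrative only.

```python
def symbolize(speeds, eps=0.0):
    """Assumed stand-in for formula (1): sign of v[j+1] - v[j-1] at interior points.

    The first and last points are skipped, matching j in [2, n-1].
    """
    symbols = []
    for j in range(1, len(speeds) - 1):
        diff = speeds[j + 1] - speeds[j - 1]
        if diff > eps:
            symbols.append(1)    # acceleration
        elif diff < -eps:
            symbols.append(-1)   # deceleration
        else:
            symbols.append(0)    # constant speed
    return symbols


def smooth_isolated_zeros(symbols):
    """Rule from the text: a 0 between two equal non-zero neighbours takes their value."""
    out = list(symbols)
    for k in range(1, len(symbols) - 1):
        prev_s, next_s = symbols[k - 1], symbols[k + 1]
        if symbols[k] == 0 and prev_s == next_s and prev_s != 0:
            out[k] = prev_s
    return out
```

The smoothing pass reads the neighbours from the unsmoothed input, so a corrected point does not influence the correction of the next one; whether the embodiment applies the rule this way is not stated in the text.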
Step 103, segmenting the driving behavior data by adopting a driving behavior sequence variable point detection algorithm according to the symbolized speed time sequence to obtain segmented driving behavior segments.
Step 103, specifically comprising:
calculating the information entropy of the velocity sequence points in the symbolized velocity time sequence according to formula (2):
$H(S)=-\sum_{i=1}^{n}p(s_i)\log(p(s_i))$ (2);
in formula (2), $H(S)$ represents the information entropy of the velocity sequence points, $S$ represents the symbolized velocity time sequence, $S=\{s_1,s_2,\ldots,s_n\}$, and $s_1,s_2,\ldots,s_n$ represent the symbolized values of the sequence points, i.e. the velocity sequence points; $p(s_i)$ represents the occurrence probability of the i-th velocity sequence point $s_i$; $s_i=1$ denotes acceleration, $s_i=0$ denotes constant velocity, and $s_i=-1$ denotes deceleration; $i$ denotes the sequence number of the velocity sequence point.
Calculating the minimum local entropy $E_j$ of the i-th velocity sequence point $s_i$ according to formula (3):
$E_j=H(S_{Forward})+H(S_{Backward})$ (3);
In formula (3), $H(S_{Forward})$ represents the information entropy of the symbolized velocity time series before the velocity sequence point $s_i$, $S_{Forward}=\{s_1,s_2,\ldots,s_i\}$; $H(S_{Backward})$ represents the information entropy of the symbolized velocity time series after the velocity sequence point $s_i$, $S_{Backward}=\{s_{i+1},s_{i+2},\ldots,s_n\}$.
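Formulas (2) and (3) can be illustrated with the short Python sketch below. It assumes that the sum in formula (2) runs over the distinct symbol values (-1, 0, 1) with probabilities estimated from their relative frequencies, and that the logarithm is natural; neither choice is fixed by the text. The names `entropy` and `local_entropy` are illustrative, and the index `i` is zero-based.

```python
import math
from collections import Counter


def entropy(symbols):
    """Formula (2): Shannon entropy of the symbol values in `symbols`."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log(c / n) for c in Counter(symbols).values())


def local_entropy(symbols, i):
    """Formula (3): entropy of symbols[0..i] plus entropy of symbols[i+1..]."""
    return entropy(symbols[: i + 1]) + entropy(symbols[i + 1 :])
```

For a sequence such as `[1, 1, 1, -1, -1, -1]`, both halves are pure when the split is placed after the third point, so the value of formula (3) is zero there and larger at every other position.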
All driving state division points in the symbolized speed time sequence are obtained by a variable step length sliding window method according to the symbolized speed time sequence. As can be seen from formula (3), any speed sequence point in the symbolized speed time sequence may be a driving state division point. A speed sequence point is a real driving state division point only when its minimum local entropy is the smallest among the minimum local entropies of all the speed sequence points, in which case the reliability of the boundary determined by that division point is the highest; the minimum local entropy of a point that is not a real driving state division point is necessarily larger than that of the real driving state division point. Since driving behavior is a short-cycle process, the driving behavior category of the driving behavior data is related only to the information entropy of adjacent speed sequence points in the driving behavior data. Therefore, the invention adopts a variable step length sliding window method to calculate the minimum local entropy of the symbolized speed time sequence and to locate the real driving state division points. If a window contains no driving state division point, the minimum local entropies of the possible driving state division points in the window change without any regular pattern.
Although the minimum local entropies corresponding to different window lengths and different types of driving behavior states differ, as long as a driving state division point exists in a window, the minimum local entropy of the possible driving state division points first decreases from the first possible division point, reaches its minimum at the real driving state division point, and then gradually increases.
Fig. 2 is a flowchart of the variable step length sliding window method according to an embodiment of the present invention. Referring to fig. 2, the variable step length sliding window method includes:
Step 1031, setting the minimum length of the window, the maximum length of the window, the window offset step length and the window length, wherein the minimum length of the window is set within the range of 5 seconds (s) to 10 s, and the maximum length of the window is set within the range of 10 s to 20 s. The shorter the window length, the more sensitive the method is to changes in the driving behavior state; the longer the window length, the less sensitive it is. Since a driving operation is a process quantity, too short a window makes the recognition of the driving behavior state too sensitive, and too long a window makes it too sparse. In this embodiment, the minimum window length Minl is 7 seconds, the maximum window length Maxl is 15 seconds, and the window offset step length f is 2.
Step 1032, acquiring the preset minimum length Minl of the window, the maximum length Maxl of the window, the window offset step length f and the window length l.
In step 1033, the sequence number j and window length l of the velocity sequence point are initialized.
Step 1034, obtaining the speed sequence point corresponding to the initialized serial number according to the initialized serial number of the speed sequence point and the initialized window length.
Step 1035, obtaining the sequence to be detected according to the initialized speed sequence point and the initialized window length, specifically: intercepting a segment of the symbolized speed time sequence of the window length according to the initialized speed sequence point and the window length, and recording it as the sequence to be detected $S_t=\{s_j,s_{j+1},\ldots,s_{j+l}\}$, where $l$ denotes the window length and $s_j,s_{j+1},\ldots,s_{j+l}$ represent the velocity sequence points of the sequence to be detected.
Step 1036, obtaining a first region from the sequence to be detected according to the window offset step length and the window length, where the first region is $S_e=\{s_{j+f},s_{j+f+1},\ldots,s_{j+l-f}\}$ and $s_{j+f},s_{j+f+1},\ldots,s_{j+l-f}$ represent the velocity sequence points in the first region. The first region prevents the minimum local entropy from being unreliable because too few symbolized speed sequence points lie before or after a given speed sequence point. If only one sequence point or no sequence exists in the sequence to be detected, the minimum local entropy cannot be calculated; therefore, the first region is obtained from the sequence to be detected according to the window offset step length and the window length.
Step 1037, calculating the minimum local entropy of all the velocity sequence points in the first region according to formula (3) to obtain a minimum local entropy set $E=\{E_{j+f},E_{j+f+1},\ldots,E_{j+l-f-1},E_{j+l-f}\}$, where $E_{j+f},E_{j+f+1},\ldots,E_{j+l-f-1},E_{j+l-f}$ represent the minimum local entropies of all the velocity sequence points in the first region.
Step 1038, calculating the minimum local entropy in the minimum local entropy set according to formula (4).
$E_s=\min\{E_{j+f},E_{j+f+1},\ldots,E_{j+l-f-1},E_{j+l-f}\}$ (4)
In formula (4), $E_s$ represents the minimum local entropy.
Step 1039, determine whether the minimum local entropy in the first region is unique, and obtain a second determination result.
If the second determination result is yes, the third determination is made in step 10310.
If the second determination result is negative, the fourth determination is performed in step 10312.
Step 10310, the third judgment: judging whether the number of decreases of the minimum local entropies before the speed sequence point corresponding to the minimum local entropy and the number of increases of the minimum local entropies after that speed sequence point are both greater than the preset number p, to obtain a third judgment result. In this embodiment, the preset number p is 2. If the third judgment result is yes, step 10311 is performed; if the third judgment result is no, the fourth judgment is made in step 10312.
Step 10311, taking the speed sequence point corresponding to the minimum local entropy as a driving state division point, and making the fourth judgment in step 10312.
Step 10312, the fourth judgment: judging whether the window length is greater than or equal to the maximum length of the window, to obtain a fourth judgment result. If the fourth judgment result is yes, step 10313 is performed; if the fourth judgment result is no, step 10314 is performed.
Step 10313, updating the serial number j of the speed sequence point by adding 1 to j, setting the window length l equal to the minimum length of the window, and making the fifth judgment in step 10315.
Step 10314, updating the window length by adding 1 to the window length l, and making the fifth judgment in step 10315.
Step 10315, the fifth judgment: judging whether j + l is greater than or equal to n, to obtain a fifth judgment result. If the fifth judgment result is yes, step 10316 is performed; if the fifth judgment result is no, step 10317 is performed.
Step 10316, obtaining the driving state division points.
Step 10317, updating the sequence to be detected according to the updated window length and the updated serial number of the speed sequence point, and returning to step 1036 to obtain the first region from the sequence to be detected according to the window offset step length and the window length.
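The loop formed by steps 1033 to 10317 can be sketched in Python as follows. This is a hedged illustration rather than the implementation of the embodiment, and it makes several assumptions that the text does not settle: formula (3) is evaluated inside the current window rather than over the whole sequence; a "decrease" or "increase" is counted for each pair of adjacent values in the first region; entropies are rounded to 12 decimals so that the uniqueness test is not disturbed by floating-point noise; and the check of j + l against n is also applied before the first window. The names `find_split_points`, `min_len`, `max_len` and `entropy` are illustrative, and indices are zero-based.

```python
import math
from collections import Counter


def entropy(symbols):
    """Formula (2), summed over the distinct symbol values (assumed reading)."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log(c / n) for c in Counter(symbols).values())


def find_split_points(symbols, min_len=7, max_len=15, f=2, p=2):
    """Variable-step sliding window search for driving state division points."""
    n = len(symbols)
    splits = set()
    j, l = 0, min_len                          # step 1033
    while j + l < n:                           # fifth judgment (step 10315)
        window = symbols[j : j + l + 1]        # sequence to be detected s_j..s_{j+l}
        offsets = range(f, l - f + 1)          # first region (step 1036)
        ents = [
            round(entropy(window[: k + 1]) + entropy(window[k + 1 :]), 12)
            for k in offsets                   # formula (3) inside the window
        ]
        e_min = min(ents)                      # formula (4)
        if ents.count(e_min) == 1:             # second judgment: unique minimum
            m = ents.index(e_min)
            drops = sum(ents[k + 1] < ents[k] for k in range(m))
            rises = sum(ents[k + 1] > ents[k] for k in range(m, len(ents) - 1))
            if drops > p and rises > p:        # third judgment
                splits.add(j + f + m)          # step 10311
        if l >= max_len:                       # fourth judgment (step 10312)
            j, l = j + 1, min_len              # step 10313
        else:
            l += 1                             # step 10314
    return sorted(splits)
```

Collecting the division points in a set avoids recording the same point more than once when several windows identify it. The default values 7, 15, 2 and 2 are the ones given for this embodiment.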
The driving behavior sequence variable point detection algorithm can segment a long piece of driving behavior data into a plurality of small driving behavior segments, and the driving state division point of each driving behavior segment represents a change in the driving state of the driver. Because the LDA model cannot directly process the driving behavior data, the driving behavior data is segmented by the driving behavior sequence variable point detection algorithm to obtain fine driving behavior segments, which can then be applied to the LDA model.
And acquiring all driving state segmentation points, wherein the driving behavior data between any two adjacent driving state segmentation points is a segmented driving behavior segment.
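Cutting the data at the division points can be sketched as below. The sketch assumes that a division point is the zero-based index of the last record of a segment, and that the records before the first division point and after the last one also form segments; the text only specifies the segments between adjacent division points. `split_segments` is an illustrative name.

```python
def split_segments(records, split_points):
    """Cut `records` after each division point (index of a segment's last record)."""
    segments, start = [], 0
    for idx in sorted(split_points):
        segments.append(records[start : idx + 1])
        start = idx + 1
    if start < len(records):
        segments.append(records[start:])  # trailing records after the last point
    return segments
```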
Step 104, preprocessing the driving behavior data to obtain a driving behavior segment characteristic frequency statistical matrix. The LDA (Latent Dirichlet Allocation) model is a document topic generation model, also called a three-layer Bayesian probability model, which comprises a three-layer structure of words, topics and documents. It is a generative, unsupervised machine learning algorithm.
Step 104, specifically comprising:
and carrying out normalization processing on the driving behavior characteristics in the driving behavior data to obtain normalized driving behavior characteristics. The driving behavior characteristics include a speed characteristic, a current characteristic, and an acceleration characteristic.
Equally dividing each driving behavior characteristic into 20 characteristic intervals according to the difference value between the maximum value and the minimum value of the driving behavior characteristic, and particularly equally dividing the normalized speed characteristic into 20 characteristic intervals according to the difference value between the maximum value and the minimum value of the speed characteristic; equally dividing the normalized current characteristic into 20 characteristic intervals according to the difference value between the maximum value and the minimum value of the current characteristic; the normalized acceleration characteristic is equally divided into 20 characteristic intervals according to the difference between the maximum value and the minimum value of the acceleration characteristic.
And calculating the frequency of the speed, the current and the acceleration in each driving behavior segment in each characteristic interval by a counting statistical method to obtain a frequency matrix. The frequency matrix is a 1 × 60 matrix including 3 driving behavior features and 20 feature intervals per driving behavior feature.
All driving behavior segments are counted in this way to obtain the frequency matrix of each driving behavior segment, and the frequency matrices of all driving behavior segments are combined into the driving behavior segment characteristic frequency statistical matrix. This matrix is a D × 60 matrix, where D represents the number of driving behavior segments.
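The construction of the D × 60 matrix can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: each segment is assumed to be a dictionary with the hypothetical keys `speed`, `current`, and `acceleration`, normalization is assumed to be min-max scaling over all segments, and a value equal to the maximum is placed in the last interval.

```python
FEATURES = ("speed", "current", "acceleration")  # assumed key names
N_BINS = 20


def feature_bounds(segments, name):
    """Global minimum and maximum of one feature over all segments."""
    values = [v for seg in segments for v in seg[name]]
    return min(values), max(values)


def bin_index(value, lo, hi, n_bins=N_BINS):
    """Index of the equal-width interval that `value` falls into."""
    if hi == lo:
        return 0
    scaled = (value - lo) / (hi - lo)            # min-max normalization
    return min(int(scaled * n_bins), n_bins - 1)  # maximum goes in last bin


def frequency_matrix(segments, n_bins=N_BINS):
    """Build one row of 3 * n_bins counts per driving behavior segment."""
    bounds = {name: feature_bounds(segments, name) for name in FEATURES}
    matrix = []
    for seg in segments:
        row = []
        for name in FEATURES:
            lo, hi = bounds[name]
            counts = [0] * n_bins
            for v in seg[name]:
                counts[bin_index(v, lo, hi, n_bins)] += 1
            row.extend(counts)
        matrix.append(row)
    return matrix


example = [
    {"speed": [0, 10, 20], "current": [1, 1, 2], "acceleration": [-1, 0, 1]},
    {"speed": [20, 20], "current": [2, 2], "acceleration": [0, 0]},
]
matrix = frequency_matrix(example)
```

Each row concatenates the 20 speed counts, the 20 current counts, and the 20 acceleration counts, which gives the 1 × 60 layout described above.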
FIG. 4 is a structural diagram of the LDA model provided in the embodiment of the present invention. In FIG. 4, α represents the hyperparameter of the Dirichlet distribution of the topics under each driving behavior segment; $\theta_d$ represents the distribution of driving behavior categories under the d-th driving behavior segment; $\phi_k$ represents the distribution of driving behavior features under the k-th topic; $z_{d,m}$ represents the driving behavior category of the m-th sequence point in the d-th driving behavior segment; $\omega_{d,m}$ represents the finally generated driving behavior feature corresponding to the m-th sequence point in the d-th driving behavior segment; $M_d$ represents the number of the three driving behavior features (speed, current, and acceleration); D represents the number of driving behavior segments, d represents the sequence number of a driving behavior segment, d ∈ D; K represents the number of potential driving behavior categories (topics), k represents the topic sequence number, k ∈ K; and β represents the hyperparameter of the Dirichlet distribution of the driving behavior features under each topic. In this embodiment, a document of the LDA model corresponds to a driving behavior segment, a topic corresponds to a driving behavior category, and a word corresponds to a driving behavior feature. The final result is the distribution of the driving behavior categories under the different driving behavior states, where each driving behavior category consists of various driving behavior features.
The generation process of the LDA model comprises the following steps:
(1) $\alpha \to \theta_d \to z_{d,m}$, i.e. the document-topic process. In the document-topic process, one topic (driving behavior category) is generated from the "driving behavior segment-driving behavior category" distribution of the d-th driving behavior segment. The document-topic process obeys the following distributions:
The Dirichlet distribution with hyperparameter α is sampled to generate the topic distribution $\theta_d$ of the d-th driving behavior segment:
$\theta_d \sim \mathrm{Dir}(\alpha)$ (6)
The topic $z_{d,m}$ of the m-th sequence point in the d-th driving behavior segment is generated by sampling from the topic distribution $\theta_d$:
$z_{d,m} \sim \mathrm{Mult}(\theta_d)$ (7)
(2) $\beta \to \phi_k \to \omega_{d,m} \mid k = z_{d,m}$, i.e. the topic-word process. In the topic-word process, under the constraint $k = z_{d,m}$, a driving behavior feature is generated from the "driving behavior category-driving behavior feature" distribution of topic $z_{d,m}$. The topic-word process obeys the following distributions:
The Dirichlet distribution with hyperparameter β is sampled to generate the driving behavior feature distribution $\phi_k$ of topic $z_{d,m}$:
$\phi_k \sim \mathrm{Dir}(\beta)$ (8)
The driving behavior feature $\omega_{d,m}$ is generated by sampling from the driving behavior feature distribution $\phi_k$:
$\omega_{d,m} \sim \mathrm{Mult}(\phi_k)$ (9)
(3) Steps (1) and (2) are repeated until the frequency statistics of all driving behavior segments have been traversed, which yields the distribution of the speed, acceleration, and current features for each driving behavior category.
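The sampling chain in formulas (6) to (9) can be illustrated with a short simulation. This is only a sketch of the generative assumptions, not the training procedure: the values K = 3, V = 60, the symmetric hyperparameters, and the helper names are hypothetical, and a Dirichlet sample is drawn here by normalizing Gamma draws.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dir(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def sample_categorical(probs, rng):
    """Draw one index with the given probabilities (a single multinomial trial)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_segment(phi, alpha, n_points, rng):
    """Generate the topics and features of one driving behavior segment."""
    theta_d = sample_dirichlet(alpha, rng)      # formula (6)
    topics, features = [], []
    for _ in range(n_points):
        z = sample_categorical(theta_d, rng)    # formula (7)
        w = sample_categorical(phi[z], rng)     # formula (9), with k = z
        topics.append(z)
        features.append(w)
    return theta_d, topics, features


rng = random.Random(0)
K, V = 3, 60                                    # assumed sizes for the demo
phi = [sample_dirichlet([0.5] * V, rng) for _ in range(K)]   # formula (8)
theta_d, topics, features = generate_segment(phi, [1.0] * K, 25, rng)
```

Here each generated feature index stands for one of the 60 feature intervals, so counting the indices of a segment reproduces one row of the frequency statistical matrix.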
The LDA model requires two parameters, the "driving behavior segment-driving behavior category" distribution and the "driving behavior category-driving behavior feature" distribution, to be inferred through training and learning. Inference algorithms for the LDA model include the variational inference algorithm and the expectation-maximization (EM) algorithm; this embodiment trains the LDA model with the EM algorithm.
Step 106: inputting the driving behavior segment characteristic frequency statistical matrix into the LDA model to obtain a first driving behavior category.
Step 107: calculating the confusion degree of the LDA model according to the first driving behavior category, and recording the confusion degree and the LDA model corresponding to the confusion degree. The confusion degree (perplexity) is a metric for evaluating the LDA model; it is a measure from information theory that is generally used to compare probability models. This embodiment determines the optimal number of clusters of the LDA model, i.e. the number of driving behavior categories, by the minimum confusion degree.
Step 107, specifically including:
the degree of confusion is calculated according to equation (5):
In formula (5), p(D) represents the confusion degree; D represents the number of driving behavior segments, d represents the sequence number of a driving behavior segment, d ∈ D; $p(\omega_d)$ represents the occurrence probability of each driving behavior feature in each driving behavior segment, with $p(\omega_d) = p(z \mid d) \times p(\omega \mid z)$, where $p(z \mid d)$ represents the probability of occurrence of each driving behavior category in each driving behavior segment, i.e. $\theta_d$, and $p(\omega \mid z)$ represents the probability of occurrence of each driving behavior feature in each driving behavior category, i.e. $\phi_k$; $R_d$ represents the total length of the driving behavior segment.
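Formula (5) itself does not appear in this text. As a minimal sketch, assuming it takes the standard LDA perplexity form $p(D) = \exp\left(-\sum_{d} \log p(\omega_d) \,/\, \sum_{d} R_d\right)$, with the probability of one feature in segment d obtained by summing $p(z \mid d) \times p(\omega \mid z)$ over the categories, the calculation could look like the following. The function name `confusion_degree` and the argument layout (`counts` as the D × 60 matrix, `theta` as D × K, `phi` as K × 60) are illustrative assumptions, and $R_d$ is taken to be the sum of the counts in row d.

```python
import math


def confusion_degree(counts, theta, phi):
    """Perplexity of a fitted topic model on a count matrix (assumed form).

    counts[d][v]: frequency of feature interval v in segment d
    theta[d][k]:  p(z = k | d)
    phi[k][v]:    p(omega = v | z = k)
    """
    log_likelihood = 0.0
    total_length = 0
    n_topics = len(phi)
    for d, row in enumerate(counts):
        for v, n in enumerate(row):
            if n == 0:
                continue
            # p(omega) for this feature: sum over categories of p(z|d) * p(omega|z)
            p = sum(theta[d][k] * phi[k][v] for k in range(n_topics))
            log_likelihood += n * math.log(p)  # raises if the model gives p == 0
            total_length += n
    return math.exp(-log_likelihood / total_length)


uniform = confusion_degree(
    counts=[[1, 1], [2, 0]],
    theta=[[0.5, 0.5], [0.5, 0.5]],
    phi=[[0.5, 0.5], [0.5, 0.5]],
)
```

For a model that assigns probability 0.5 to every feature, as in the toy call above, the value equals 2, the number of equally likely features.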
Step 108: judging whether the clustering number of the LDA model is equal to a preset number to obtain a first judgment result.
Step 109: if the first judgment result is negative, increasing the clustering number of the LDA model, returning to step 106 to input the driving behavior segment characteristic frequency statistical matrix into the LDA model and obtain a first driving behavior category, and updating and recording the confusion degree and the LDA model corresponding to the confusion degree.
Step 110: if the first judgment result is yes, comparing all recorded confusion degrees to obtain the minimum confusion degree.
Step 111: obtaining the LDA model corresponding to the minimum confusion degree according to the minimum confusion degree. In this embodiment, the number of clusters of the LDA model corresponding to the minimum confusion degree is 13.
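The loop formed by steps 106 to 111 can be sketched as below. The training and scoring routines are passed in as the hypothetical callables `fit_lda` and `score`, and the initial clustering number, the preset number, and the increment of 1 are assumptions, since the text only says that the clustering number is increased.

```python
def select_cluster_number(matrix, fit_lda, score, initial_k=2, preset_k=20):
    """Train one model per clustering number and keep the least confused one."""
    if initial_k > preset_k:
        raise ValueError("initial_k must not exceed preset_k")
    records = []
    k = initial_k
    while True:
        model = fit_lda(matrix, k)                          # step 106
        records.append((score(model, matrix), k, model))    # step 107
        if k == preset_k:                                   # step 108
            break
        k += 1                                              # step 109
    best_score, best_k, best_model = min(records, key=lambda r: r[0])  # steps 110-111
    return best_k, best_model, best_score


# Stand-in callables for demonstration only: the "model" just remembers k,
# and the score is smallest at k = 13.
def demo_fit(matrix, k):
    return {"k": k}


def demo_score(model, matrix):
    return abs(model["k"] - 13) + 1.0


best_k, best_model, best_score = select_cluster_number([], demo_fit, demo_score)
```

Keeping every record, rather than only the running minimum, mirrors the text, which records each confusion degree together with its model and compares them at the end.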
Step 112: inputting the driving behavior segment characteristic frequency statistical matrix into the LDA model corresponding to the minimum confusion degree to obtain the driving behavior category. Step 112 further includes inputting each driving behavior segment into the LDA model corresponding to the minimum confusion degree to obtain the probability that the segment corresponds to each driving behavior category. This embodiment yields 4 driving behavior states (acceleration behavior, deceleration behavior, uniform speed behavior, and idle behavior) and 13 driving behavior categories: low-speed slow acceleration, medium-speed fast acceleration, medium-low speed slow acceleration, medium-low speed-medium speed fast acceleration, medium-speed braking deceleration, low-speed deceleration, medium-low speed deceleration, medium-high speed uniform speed, high-speed uniform speed, medium-low speed uniform speed, medium-speed uniform speed and idle speed. The driving behavior categories in this embodiment are shown in Table 1:
TABLE 1 Driving behavior classes of electric-only passenger vehicles
The driving behavior sequence variable point detection algorithm segments the driving behavior data more accurately and identifies the variable points of the driving behavior state, i.e. the driving state segmentation points, more accurately. In terms of recognition results, the driving behavior recognition method locates changes of the driving behavior state better, misjudges the driving behavior state less often, and forms more complete driving behavior segments.
Compared with conventional methods, the LDA model allows the abstract linear segmentation to be associated with specific driving behavior features. For the clustering result of the LDA model, the driving behavior recognition method can express the driving behavior category of each driving behavior segment in probabilistic form, which gives higher recognition reliability. Because the LDA model is a generative model, when new driving behavior data or new behaviors to be detected are added, only the joint distribution of the new driving behavior category needs to be calculated.
The method takes the massive, real-time, and dynamic vehicle operation data of the national new energy vehicle supervision and management platform as the main source of the driving behavior data. The data cover a large number of pure electric vehicles, a variety of vehicle driving scenes, and a large number of driving behavior features. At the level of practical application, the method has higher recognition accuracy, relatively fast calculation, and a certain application capability.
The invention also provides a driving behavior recognition system. FIG. 3 is a system structure diagram of the driving behavior recognition system provided in the embodiment of the invention. Referring to FIG. 3, the driving behavior recognition system includes:
an acquiring module 201, configured to acquire driving behavior data. In this embodiment, speed data, current data, and acceleration data of 50 pure electric passenger cars are obtained from the national new energy vehicle supervision and management platform as the driving behavior data.
And a symbolization speed time sequence module 202, configured to perform symbolization processing on the driving behavior data to obtain a symbolization speed time sequence.
The symbolization speed time sequence module 202 specifically includes:
an acquisition unit for acquiring the speed time series $S_{velocity} = \{v_1, v_2, \ldots, v_n\}$ in the driving behavior data, where $v_1, v_2, \ldots, v_n$ represent the sequence points and n represents the total number of sequence points.
A calculating unit, configured to obtain the symbolized velocity time series S according to formula (1):
In formula (1), $v_j$ represents the j-th sequence point in the speed time series $S_{velocity}$, and j represents the sequence number of the sequence point, j ∈ [2, n−1]. In this embodiment, the first sequence point and the last sequence point of the speed time series are not symbolized. If the symbolized value of a sequence point is 0, but the symbolized values of its previous and next sequence points are equal and are not 0, the symbolized value of that sequence point is set equal to the symbolized value of its two neighbors.
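The following sketch illustrates the symbolization and the correction of isolated zeros. Formula (1) does not appear in this text, so the rule used in `symbolize` (the sign of $v_{j+1} - v_{j-1}$ with a tolerance `eps`) is purely an assumption chosen because only interior points j ∈ [2, n−1] are symbolized; the correction in `smooth_isolated_zeros` follows the rule stated above, evaluated against the uncorrected neighbors.

```python
def symbolize(speeds, eps=0.0):
    """Map interior speed points to 1 (acceleration), 0 (constant), -1 (deceleration).

    Assumption: the symbol is the sign of speeds[j + 1] - speeds[j - 1];
    the first and last points are skipped, so the result has len(speeds) - 2 items.
    """
    symbols = []
    for j in range(1, len(speeds) - 1):
        diff = speeds[j + 1] - speeds[j - 1]
        symbols.append(1 if diff > eps else -1 if diff < -eps else 0)
    return symbols


def smooth_isolated_zeros(symbols):
    """Replace a 0 whose two neighbors carry the same non-zero symbol."""
    out = list(symbols)
    for i in range(1, len(symbols) - 1):
        prev, nxt = symbols[i - 1], symbols[i + 1]
        if symbols[i] == 0 and prev == nxt and prev != 0:
            out[i] = prev
    return out


raw = symbolize([0, 1, 2, 2, 2, 1, 0])
smoothed = smooth_isolated_zeros([1, 0, 1, 0, -1, 0, -1, 0, 0])
```

A zero between an acceleration and a deceleration symbol is left unchanged, because its neighbors do not agree.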
And the driving behavior segment module 203 is configured to segment the driving behavior data by using a driving behavior sequence variable point detection algorithm according to the symbolized speed time sequence to obtain a segmented driving behavior segment.
The driving behavior segment module 203 specifically includes:
an information entropy unit, configured to calculate the information entropy of the velocity sequence points in the symbolized velocity time sequence according to formula (2):
$H(S) = -\sum_{i=1}^{n} p(s_i) \log p(s_i)$ (2);
In formula (2), H(S) represents the information entropy of the velocity sequence points; S represents the symbolized velocity time sequence, $S = \{s_1, s_2, \ldots, s_n\}$, where $s_1, s_2, \ldots, s_n$ represent the symbolized values of the sequence points, i.e. the velocity sequence points; $p(s_i)$ represents the occurrence probability of the i-th velocity sequence point $s_i$; $s_i = 1$ denotes acceleration, $s_i = 0$ denotes constant velocity, and $s_i = -1$ denotes deceleration; i denotes the sequence number of the velocity sequence point.
A local minimum entropy unit, configured to calculate the local minimum entropy $E_i$ of the i-th velocity sequence point $s_i$ according to formula (3):
$E_i = H(S_{Forward}) + H(S_{Backward})$ (3);
In formula (3), $H(S_{Forward})$ represents the information entropy of the symbolized velocity time sequence up to and including velocity sequence point $s_i$, $S_{Forward} = \{s_1, s_2, \ldots, s_i\}$; $H(S_{Backward})$ represents the information entropy of the symbolized velocity time sequence after velocity sequence point $s_i$, $S_{Backward} = \{s_{i+1}, s_{i+2}, \ldots, s_n\}$.
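Formulas (2) and (3) can be sketched as follows. The sketch reads formula (2) as the Shannon entropy of the symbol frequencies in a sequence and uses the natural logarithm; both choices are assumptions, since the text does not state the base of the logarithm. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (natural log) of the symbol frequencies in a sequence."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def local_entropy(symbols, i):
    """Formula (3) for 0-based index i: H(symbols[0..i]) + H(symbols[i+1..])."""
    return entropy(symbols[: i + 1]) + entropy(symbols[i + 1 :])


sequence = [1, 1, 1, -1, -1, -1]
at_change = local_entropy(sequence, 2)     # split exactly at the state change
off_change = local_entropy(sequence, 1)    # split one point too early
```

The sum is smallest when the split falls exactly where the driving state changes, because both halves then contain a single symbol and have zero entropy.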
And the driving state division point unit is used for obtaining all driving state division points in the symbolized speed time sequence by adopting a variable step length sliding window method according to the symbolized speed time sequence.
The driving state division point unit specifically comprises:
the first subunit is used for setting the minimum window length, the maximum window length, the window offset step, and the window length; the minimum window length Minl is 7 seconds, the maximum window length Maxl is 15 seconds, and the window offset step f is 2.
And the second subunit is used for acquiring the preset minimum length Minl of the window, the maximum length Maxl of the window, the window offset step length f and the window length l.
And the third subunit is used for initializing the sequence number j of the speed sequence points and the window length l.
And the fourth subunit is used for obtaining the serial number of the initialized speed sequence point, the initialized window length and the speed sequence point corresponding to the serial number of the initialized speed sequence point according to the serial number of the initialized speed sequence point and the initialized window length.
A fifth subunit, configured to obtain a sequence to be detected according to the initialized velocity sequence point and the initialized window length, specifically: intercepting a segment of the symbolized velocity time sequence of the window length according to the initialized velocity sequence point and the window length, and recording it as the sequence to be detected $S_t = \{s_j, s_{j+1}, \ldots, s_{j+l}\}$, where l denotes the window length and $s_j, s_{j+1}, \ldots, s_{j+l}$ represent the velocity sequence points of the sequence to be detected.
A sixth subunit, configured to obtain a first region from the sequence to be detected according to the window offset step and the window length, where the first region is $S_e = \{s_{j+f}, s_{j+f+1}, \ldots, s_{j+l-f}\}$, and $s_{j+f}, s_{j+f+1}, \ldots, s_{j+l-f}$ represent the velocity sequence points in the first region.
A seventh subunit, configured to calculate the minimum local entropy of every velocity sequence point in the first region according to formula (3) to obtain a minimum local entropy set $E = \{E_{j+f}, E_{j+f+1}, \ldots, E_{j+l-f-1}, E_{j+l-f}\}$, where $E_{j+f}, E_{j+f+1}, \ldots, E_{j+l-f-1}, E_{j+l-f}$ represent the minimum local entropies of the velocity sequence points in the first region.
An eighth subunit, configured to calculate the smallest minimum local entropy in the minimum local entropy set according to formula (4):
$E_s = \min\{E_{j+f}, E_{j+f+1}, \ldots, E_{j+l-f-1}, E_{j+l-f}\}$ (4)
In formula (4), $E_s$ represents the smallest minimum local entropy.
And the ninth subunit is configured to determine whether the minimum local entropy in the first region is unique, and obtain a second determination result. If the second judgment result is yes, executing a tenth subunit; if the second judgment result is negative, the twelfth sub-unit is executed.
And the tenth subunit is configured to determine whether the reduction times of all the minimum local entropies before the speed sequence point corresponding to the minimum local entropy and the increase times of all the minimum local entropies after the speed sequence point corresponding to the minimum local entropy are greater than the preset times p at the same time, and obtain a third determination result. In the present embodiment, the preset number p is 2. If the third judgment result is yes, executing an eleventh subunit; if the third judgment result is negative, the twelfth sub-unit is executed.
An eleventh sub-unit for taking the speed sequence point corresponding to the smallest minimum local entropy as the driving state division point, and executing a twelfth sub-unit.
And the twelfth subunit is used for judging whether the window length is greater than or equal to the maximum length of the window or not and obtaining a fourth judgment result. If the fourth judgment result is yes, executing a thirteenth sub-unit; and if the fourth judgment result is negative, executing a fourteenth subunit.
And the thirteenth subunit is used for updating the serial number j of the speed sequence point, adding 1 to j, enabling the window length l to be equal to the minimum length of the window, and executing the fifteenth subunit.
And a fourteenth subunit, configured to update the window length, add 1 to the window length l, and execute the fifteenth subunit.
And the fifteenth subunit is used for judging whether j + l is greater than or equal to n or not to obtain a fifth judgment result. If the fifth judgment result is yes, executing a sixteenth subunit; and if the fifth judgment result is negative, executing a seventeenth subunit.
And the sixteenth subunit is used for acquiring the driving state division point.
And the seventeenth subunit is used for updating the sequence to be detected according to the updated window length and the updated sequence number of the speed sequence point and executing the sixth subunit.
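The control flow of the first to seventeenth subunits can be sketched as one loop. Several details are assumptions made for this illustration rather than statements of the original: indices are 0-based, the forward and backward sequences of formula (3) are taken inside the current window, entropies are compared with a small tolerance when testing uniqueness, the end condition j + l ≥ n is checked before each window is read, and the decrease and increase counts are the numbers of falling and rising steps on either side of the minimum.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (natural log) of the symbol frequencies."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def detect_division_points(s, min_len=7, max_len=15, f=2, p=2, tol=1e-12):
    """Variable-length sliding window search for driving state division points."""
    n = len(s)
    j, l = 0, min_len
    points = set()
    while j + l < n:                                    # fifteenth subunit
        window = s[j : j + l + 1]                       # sequence to be detected
        offsets = range(f, l - f + 1)                   # first region
        E = [entropy(window[: o + 1]) + entropy(window[o + 1 :]) for o in offsets]
        e_min = min(E)                                  # formula (4)
        ties = [e for e in E if abs(e - e_min) <= tol]
        if len(ties) == 1:                              # ninth subunit
            pos = E.index(e_min)
            decreases = sum(E[a] > E[a + 1] for a in range(pos))
            increases = sum(E[a] < E[a + 1] for a in range(pos, len(E) - 1))
            if decreases > p and increases > p:         # tenth subunit
                points.add(j + f + pos)                 # eleventh subunit
        if l >= max_len:                                # twelfth subunit
            j, l = j + 1, min_len                       # thirteenth subunit
        else:
            l += 1                                      # fourteenth subunit
    return sorted(points)                               # sixteenth subunit


found = detect_division_points([1] * 8 + [-1] * 8)
none_found = detect_division_points([0] * 30)
```

Because the window keeps growing after a point has been accepted, the same point is usually found by several windows; collecting the points in a set keeps each one once.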
And the driving behavior segment unit is used for acquiring all driving state segmentation points, and driving behavior data between any two adjacent driving state segmentation points is a segmented driving behavior segment.
And the driving behavior segment characteristic frequency statistical matrix module 204 is used for preprocessing the driving behavior data to obtain a driving behavior segment characteristic frequency statistical matrix.
The driving behavior segment characteristic frequency statistical matrix module 204 specifically includes:
and the normalization unit is used for performing normalization processing on the driving behavior characteristics in the driving behavior data to obtain normalized driving behavior characteristics. The driving behavior characteristics include a speed characteristic, a current characteristic, and an acceleration characteristic.
The characteristic interval unit is used for equally dividing each driving behavior characteristic into 20 characteristic intervals according to the difference value between the maximum value and the minimum value of the driving behavior characteristic, and specifically equally dividing the normalized speed characteristic into 20 characteristic intervals according to the difference value between the maximum value and the minimum value of the speed characteristic; equally dividing the normalized current characteristic into 20 characteristic intervals according to the difference value between the maximum value and the minimum value of the current characteristic; the normalized acceleration characteristic is equally divided into 20 characteristic intervals according to the difference between the maximum value and the minimum value of the acceleration characteristic.
And the matrix unit is used for calculating the frequency of the speed, the current and the acceleration in each driving behavior segment in each characteristic interval by a counting statistical method to obtain a frequency matrix. The frequency matrix is a 1 × 60 matrix including 3 driving behavior features and 20 feature intervals per driving behavior feature.
And the driving behavior segment characteristic frequency statistical matrix unit is used for counting all driving behavior segments to obtain the frequency matrix of each driving behavior segment, and for combining the frequency matrices of all driving behavior segments into the driving behavior segment characteristic frequency statistical matrix. This matrix is a D × 60 matrix, where D represents the number of driving behavior segments.
An initialization module 205, configured to obtain an LDA model and initialize the clustering number of the LDA model. FIG. 4 is a structural diagram of the LDA model provided in the embodiment of the present invention. In FIG. 4, α represents the hyperparameter of the Dirichlet distribution of the topics under each driving behavior segment; $\theta_d$ represents the distribution of driving behavior categories under the d-th driving behavior segment; $\phi_k$ represents the distribution of driving behavior features under the k-th topic; $z_{d,m}$ represents the driving behavior category of the m-th sequence point in the d-th driving behavior segment; $\omega_{d,m}$ represents the finally generated driving behavior feature corresponding to the m-th sequence point in the d-th driving behavior segment; $M_d$ represents the number of the three driving behavior features (speed, current, and acceleration); D represents the number of driving behavior segments, d represents the sequence number of a driving behavior segment, d ∈ D; K represents the number of potential driving behavior categories (topics), k represents the topic sequence number, k ∈ K; and β represents the hyperparameter of the Dirichlet distribution of the driving behavior features under each topic in the LDA model. Each driving behavior segment is constructed as a combination of K topics, and K is assumed to be a known number that does not change. The "driving behavior segment-driving behavior category" distribution and the "driving behavior category-driving behavior feature" distribution need to be inferred through training and learning; this embodiment trains the LDA model with the EM algorithm.
The first driving behavior category module 206 is configured to input the characteristic frequency statistical matrix of the driving behavior segment into the LDA model to obtain a first driving behavior category.
And the confusion degree module 207 is used for calculating the confusion degree of the LDA model according to the first driving behavior category and recording the confusion degree and the LDA model corresponding to the confusion degree.
The confusion module 207 specifically includes:
a confusion unit for calculating a confusion according to equation (5):
In formula (5), p(D) represents the confusion degree; D represents the number of driving behavior segments, d represents the sequence number of a driving behavior segment, d ∈ D; $p(\omega_d)$ represents the occurrence probability of each driving behavior feature in each driving behavior segment, with $p(\omega_d) = p(z \mid d) \times p(\omega \mid z)$, where $p(z \mid d)$ represents the probability of occurrence of each driving behavior category in each driving behavior segment, i.e. $\theta_d$, and $p(\omega \mid z)$ represents the probability of occurrence of each driving behavior feature in each driving behavior category, i.e. $\phi_k$; $R_d$ represents the total length of the driving behavior segment.
The first judging module 208 is configured to judge whether the clustering number of the LDA model is equal to a preset number to obtain a first judgment result; if the first judgment result is negative, the updating module is executed; if the first judgment result is yes, the minimum confusion degree module is executed.
And the updating module 209 is used for increasing the clustering number of the LDA models, executing the first driving behavior classification module, and updating and recording the confusion degree and the LDA models corresponding to the confusion degree.
A minimum confusion module 210 for comparing the confusion of all records to obtain the minimum confusion.
And an LDA model module 211, configured to obtain an LDA model corresponding to the minimum confusion degree according to the minimum confusion degree. In this embodiment, the number of clusters of the LDA model corresponding to the minimum confusion degree is 13.
The driving behavior classification module 212 is configured to input the driving behavior segment characteristic frequency statistical matrix into the LDA model corresponding to the minimum confusion degree to obtain the driving behavior category; each driving behavior segment is also input into the LDA model corresponding to the minimum confusion degree to obtain the probability that the segment corresponds to each driving behavior category.
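The probabilistic output can be read off as in the sketch below, which assumes that the model returns one non-negative weight per driving behavior category for a segment. The function names and the example weights are hypothetical and only illustrate turning the weights into probabilities and picking the most probable category.

```python
def category_probabilities(weights):
    """Normalize non-negative category weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def most_probable_category(probabilities):
    """Return (category index, probability) of the most probable category."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, probabilities[best]


probs = category_probabilities([2, 1, 1])
best_index, best_prob = most_probable_category(probs)
```

Reporting the full probability vector rather than only the top category preserves the information that a segment may mix several driving behavior categories.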
4 driving behavior states can be obtained by the driving behavior recognition system of the present embodiment: acceleration behavior, deceleration behavior, uniform speed behavior, and idle behavior, 13 driving behavior categories: low-speed slow acceleration, medium-speed fast acceleration, medium-low speed slow acceleration, medium-low speed-medium speed fast acceleration, medium-speed braking deceleration, low-speed deceleration, medium-low speed deceleration, medium-high speed uniform speed, high-speed uniform speed, medium-low speed uniform speed, medium-speed uniform speed and idle speed.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.