Disclosure of Invention
In order to solve the above problems, an object of the present invention is to provide a PCANet-based method for extracting electrocardiographic features that is robust to noise, requires neither removal of heartbeat noise during preprocessing nor equalization of the number of heartbeats, achieves a better classification effect, and eases the burden on doctors of analyzing a patient's condition from electrocardiograms.
The technical scheme adopted by the invention for solving the problems is as follows:
a PCANet-based electrocardiographic feature extraction method comprises the following steps: S10, preprocessing the electrocardiogram to obtain a training set and a to-be-classified set; S20, extracting the heartbeat features of the training set and of the to-be-classified set using PCANet; and S30, training a classifier with the heartbeat features extracted from the training set and using the trained classifier to classify the heartbeat features of the to-be-classified set;
the step S10 includes:
S11, detecting the R-wave peak points of the electrocardiogram signal and, taking each R-wave peak point as a reference point, intercepting a certain number of sampling points before and after it as a single heartbeat;
S12, dividing the whole electrocardiogram into a plurality of single heartbeats;
S13, normalizing the amplitude of each single heartbeat;
S14, dividing the normalized single heartbeats into a training set and a to-be-classified set;
the step S20 includes:
S21, performing second-order convolution processing on the training set and/or the to-be-classified set by using a PCA algorithm to obtain an output matrix corresponding to each heartbeat;
S22, performing binary hash coding and block histogram processing on the output matrices of the heartbeats of the training set and/or the to-be-classified set to obtain the feature vectors of those heartbeats;
the step S30 includes:
S31, training a classifier by using the feature vectors of the training set heartbeats;
and S32, inputting the feature vectors of the heartbeats of the to-be-classified set into the trained classifier for classification and outputting a classification result.
Further, in the step S11, the number of intercepted sampling points depends on the sampling frequency.
Further, in step S21, performing the second-order convolution processing by using the PCA algorithm includes:
extracting $L_1$ first-layer PCA filters;
performing a first convolution of the $L_1$ first-layer PCA filters with each heartbeat matrix to obtain $L_1$ local feature matrices;
extracting $L_2$ second-layer PCA filters;
performing a second convolution of the $L_2$ second-layer PCA filters with the local feature matrices to obtain $L_2$ secondary local feature matrices.
Further, when extracting the features of the sample to be classified, the first-layer PCA filter and the second-layer PCA filter extracted through the training set are directly applied.
Further, the method for extracting the PCA filter comprises the following steps:
reconstructing the cardiac beat vector into a cardiac beat matrix;
centralizing the heart beat matrix;
using the centralized heartbeat matrix to construct a matrix to be processed;
and performing principal component analysis on the matrix to be processed to obtain the PCA filter.
Further, the PCA filter is represented as follows:
$$\mathrm{mat}_{k_1,k_2}\left(q_l\left(XX^T\right)\right)$$
wherein $XX^T$ is the covariance matrix of $X$, $q_l(\cdot)$ extracts the $l$-th eigenvector of the matrix in brackets, and $\mathrm{mat}_{k_1,k_2}(\cdot)$ reconstructs the vector in brackets into a $k_1 \times k_2$ matrix; these matrices are the PCA filters.
Further, the binary hash encoding includes: binarizing all the matrices in the secondary local feature matrix group, hash-coding them into decimal values by a hash function, and combining them into one matrix.
Further, the block histogram processing includes: block size selection, overlapping coefficient selection, vector conversion, histogram statistics and vector connection to obtain the feature vector of the heart beat.
Further, the classifiers include, but are not limited to, linear SVM, KNN classifier, BP neural network classifier, and random forest.
The beneficial effects of the invention are as follows: the PCANet-based electrocardiographic feature extraction method is more robust to noise, and a good classification effect can be obtained by extracting features from heartbeats without removing noise and inputting them into the classifier; heartbeat classes of unbalanced size can be classified more accurately; the dimension of the heartbeat features is increased by two orders of magnitude, and the classifier then classifies the high-dimensional heartbeat features with good results.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, the PCANet-based electrocardiographic feature extraction method comprises the following steps: S10, preprocessing the electrocardiogram to obtain a training set and a to-be-classified set; S20, extracting the heartbeat features of the training set and of the to-be-classified set using PCANet; and S30, training a classifier with the heartbeat features extracted from the training set and using the trained classifier to classify the heartbeat features of the to-be-classified set;
the step S10 includes:
S11, detecting the R-wave peak points of the electrocardiogram signal and, taking each R-wave peak point as a reference point, intercepting a certain number of sampling points before and after it as a single heartbeat;
S12, dividing the whole electrocardiogram into a plurality of single heartbeats;
S13, normalizing the amplitude of each single heartbeat;
S14, dividing the normalized single heartbeats into a training set and a to-be-classified set;
the step S20 includes:
S21, performing second-order convolution processing on the training set and/or the to-be-classified set by using a PCA algorithm to obtain an output matrix corresponding to each heartbeat;
S22, performing binary hash coding and block histogram processing on the output matrices of the heartbeats of the training set and/or the to-be-classified set to obtain the feature vectors of those heartbeats;
the step S30 includes:
S31, training a classifier by using the feature vectors of the training set heartbeats;
and S32, inputting the feature vectors of the heartbeats of the to-be-classified set into the trained classifier for classification and outputting a classification result.
In order to make the description more specific, the present embodiment describes the invention by referring to some specific data and formulas.
Referring to FIG. 2 and FIGS. 2-A through 2-D, the preprocessing stage includes:
detecting the electrocardiogram and segmenting it into a number of single heartbeats:
in this embodiment, a large number of patient ECG signals, or the two leads MLII and V5 in the related MIT-BIH database, are used as experimental subjects, and the sampling frequency is 360 to 250 Hz. The Pan-Tompkins algorithm is used to perform R-wave detection on the ECG signals, and the positions of the R-wave peaks are labeled. Each R-wave peak point on the ECG signal is taken as the middle point of a heartbeat: the 149th sample point to the left of the R-wave peak point is set as the starting point of the heartbeat, the 150th sample point to the right of the R-wave peak point is set as its end point, and the 300 points from the starting point to the end point are taken as one heartbeat vector sample.
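The segmentation step can be illustrated with a minimal sketch. It assumes the R-peak indices are already available (for example from a Pan-Tompkins detector, which is not implemented here); the function name `segment_beats` and the rule of skipping peaks too close to the record ends are illustrative assumptions, not part of the described method.

```python
import numpy as np

def segment_beats(signal, r_peaks, left=149, right=150):
    """Cut one heartbeat vector around each R peak.

    Each beat holds `left` samples before the peak, the peak itself and
    `right` samples after it (149 + 1 + 150 = 300 samples by default).
    Peaks too close to either end of the record are skipped; this boundary
    rule is an assumption, the text does not specify one.
    """
    signal = np.asarray(signal, dtype=float)
    beats = []
    for r in r_peaks:
        start, stop = r - left, r + right + 1
        if start < 0 or stop > len(signal):
            continue
        beats.append(signal[start:stop])
    return np.array(beats)

# Synthetic record and hypothetical peak positions, for illustration only.
ecg = np.sin(np.linspace(0, 20 * np.pi, 2000))
peaks = [100, 400, 900, 1950]
beats = segment_beats(ecg, peaks)
print(beats.shape)
```

With these defaults the R peak sits at index 149 of every 300-sample heartbeat vector.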
Normalizing the heartbeat vectors:
in the normalization process, the amplitudes of all heartbeat curves are scaled to lie between 0 and 1, which can improve the final classification result to a certain extent. The normalization method adopted in this embodiment is Min-Max standardization:
$$x = \frac{y - \min}{\max - \min}$$
wherein $y$ is the original value of a point in the heartbeat, $\min$ is the minimum value in that heartbeat, $\max$ is the maximum value in that heartbeat, and $x$ is the normalized amplitude of the point.
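A minimal sketch of Min-Max normalization applied to one heartbeat vector is given below. The handling of a completely flat heartbeat (returning zeros) is an assumption added to avoid division by zero; the text does not address that case.

```python
import numpy as np

def min_max_normalize(beat):
    """Scale one heartbeat so that its amplitudes lie in [0, 1]."""
    beat = np.asarray(beat, dtype=float)
    lo, hi = beat.min(), beat.max()
    if hi == lo:
        # Flat heartbeat: assumption, return zeros instead of dividing by zero.
        return np.zeros_like(beat)
    return (beat - lo) / (hi - lo)

x = min_max_normalize([-0.5, 0.0, 1.5])
print(x)
```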
Establishing the training set and the to-be-classified set:
the normalized heartbeats are classified according to their labels, and five classes of heartbeats are taken according to the AAMI standard; the classes differ in the number of heartbeats they contain. A small part (about one tenth) of each class is taken out, and these tenths of the five classes together form the training set, whose heartbeats are used to generate the filters in the subsequent feature extraction step and to train the classifier. The remaining heartbeats form the to-be-classified set, whose heartbeats are classified by the classifier.
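The per-class split can be sketched as follows. The function name, the random seed and the rounding rule (taking the floor of one tenth of each class) are assumptions made for illustration.

```python
import numpy as np

def split_one_tenth(labels, fraction=0.1, seed=0):
    """Return (train_idx, rest_idx), drawing `fraction` of every class at random."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        n_train = int(len(idx) * fraction)  # floor of the class share (assumption)
        train_idx.extend(rng.choice(idx, size=n_train, replace=False))
    train_idx = np.sort(np.array(train_idx, dtype=int))
    rest_idx = np.setdiff1d(np.arange(len(labels)), train_idx)
    return train_idx, rest_idx

# Toy labels: 50 beats of class N and 20 of class V.
labels = ["N"] * 50 + ["V"] * 20
train_idx, rest_idx = split_one_tenth(labels)
print(len(train_idx), len(rest_idx))
```

Sampling within each class keeps the class proportions of the training set close to those of the full data, which matters when the classes are strongly imbalanced.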
Referring to FIGS. 3 to 5, the feature extraction stage for the training set heartbeats includes:
referring to FIG. 3, before the first-layer PCA filters are extracted, all heartbeat vectors need to be reconstructed into heartbeat matrices. In this embodiment, it is assumed that there are N heartbeat vectors, each containing 300 sampling points, and each heartbeat vector is reconstructed into a heartbeat matrix of size m × n, where the reconstruction method is as follows:
applying this processing to all heartbeat vectors finally yields N heartbeat matrices.
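As a hedged illustration, a 300-point heartbeat vector can be reshaped into a 15 × 20 heartbeat matrix (the size listed in Table 3). Filling the matrix row by row is an assumption; the fill order used by the reconstruction is not recoverable from the text.

```python
import numpy as np

def beat_to_matrix(beat, m=15, n=20):
    """Reshape a heartbeat vector of m*n samples into an m x n matrix."""
    beat = np.asarray(beat, dtype=float)
    if beat.size != m * n:
        raise ValueError("heartbeat length must equal m * n")
    return beat.reshape(m, n)  # row-major fill order: an assumption

beat = np.arange(300)
mat = beat_to_matrix(beat)
print(mat.shape)
```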
The first-layer PCA filters are extracted using principal component analysis (the PCA algorithm); each heartbeat matrix is processed as follows:
first-stage block sampling:
sliding a window p1 of size $k_1 \times k_2$ over the heartbeat matrix with a step length of 1 and extracting the block obtained at each position, so that 300 sampling block matrices are finally obtained; these are cascaded, and the sampling block matrix of the $i$-th heartbeat matrix is represented as follows:
first-stage centralization:
performing zero-mean processing on the 300 sampling block matrices of the $i$-th heartbeat matrix, that is, subtracting the mean so that the average of all elements in each sampling block matrix is 0; zero-mean processing of the $i$-th heartbeat matrix yields 300 sampling blocks to be processed, as follows:
first-stage reconstruction and combination:
performing block sampling and centralization on all heartbeat matrices yields N groups of 300 sampling blocks to be processed; each sampling block to be processed is reconstructed into vector form, and the vectors are combined into a matrix to be processed containing 300N column vectors:
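The three first-stage steps (block sampling, centralization, reconstruction and combination) can be sketched together. Zero padding of the heartbeat matrix is an assumption, chosen because it makes a 15 × 20 matrix yield exactly 300 blocks with a step of 1, matching the count stated above; the function names are illustrative.

```python
import numpy as np

def centered_patches(mat, k1=7, k2=7):
    """Return a (k1*k2, m*n) matrix whose columns are zero-mean blocks of `mat`."""
    m, n = mat.shape
    # Zero padding (assumption) so that every element is the centre of one block.
    padded = np.pad(mat, ((k1 // 2, k1 // 2), (k2 // 2, k2 // 2)))
    cols = []
    for r in range(m):
        for c in range(n):
            patch = padded[r:r + k1, c:c + k2].reshape(-1)
            cols.append(patch - patch.mean())  # centralization of one block
    return np.stack(cols, axis=1)

def build_patch_matrix(mats, k1=7, k2=7):
    """Combine the centred blocks of all heartbeat matrices column-wise."""
    return np.hstack([centered_patches(mt, k1, k2) for mt in mats])

rng = np.random.default_rng(0)
mats = rng.random((4, 15, 20))   # 4 synthetic heartbeat matrices
X = build_patch_matrix(mats)
print(X.shape)                   # 49 rows, 4 * 300 columns
```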
obtaining the first-layer PCA filters:
assuming that the number of first-layer PCA filters is $L_1$, the objective of the PCA algorithm is to minimize the reconstruction error by finding a series of orthonormal matrices:
the solution to this problem is classical principal component analysis, i.e. the first $L_1$ eigenvectors of the covariance matrix of the matrix $X$. Each eigenvector is then reconstructed into a matrix to obtain the $L_1$ first-layer PCA filters, represented as follows:
$$\mathrm{mat}_{k_1,k_2}\left(q_l\left(XX^T\right)\right), \quad l = 1, 2, \dots, L_1$$
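A minimal sketch of the filter computation, assuming the matrix to be processed is available as a NumPy array with one centred block per column. It takes the leading eigenvectors of $XX^T$ and reshapes each into a filter; the row-major reshape and the function name are assumptions.

```python
import numpy as np

def pca_filters(X, num_filters, k1=7, k2=7):
    """Return `num_filters` filters of size k1 x k2 from the columns of X."""
    cov = X @ X.T                           # (k1*k2, k1*k2), symmetric
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_filters]
    # Each leading eigenvector is reshaped into one filter (row-major: assumption).
    return np.stack([eigvecs[:, j].reshape(k1, k2) for j in order])

rng = np.random.default_rng(0)
X = rng.standard_normal((49, 1200))  # synthetic stand-in for the block matrix
X -= X.mean(axis=0)                  # centre every column
W = pca_filters(X, 9)
print(W.shape)
```

Because the eigenvectors of a symmetric matrix are orthonormal, the flattened filters are mutually orthogonal with unit norm.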
referring to FIG. 5, after the $L_1$ first-layer PCA filters are obtained, a first convolution is performed between the first-layer PCA filters and each heartbeat matrix, as follows:
wherein $\Gamma_i$ is the $i$-th heartbeat matrix and the result of the convolution is a local feature matrix. After this step, each original heartbeat matrix is mapped to $L_1$ new matrices, named local feature matrices; a set of $L_1$ such matrices is a local feature matrix group, so the N heartbeats are finally mapped to N local feature matrix groups.
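The mapping of one heartbeat matrix to a local feature matrix group can be sketched with a plain zero-padded convolution that keeps the output the same size as the input. The padding scheme, the restriction to odd filter sizes and the function names are assumptions for illustration.

```python
import numpy as np

def conv2_same(mat, kernel):
    """2-D convolution with zero padding; output has the shape of `mat`.

    Assumes odd kernel dimensions.
    """
    k1, k2 = kernel.shape
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    padded = np.pad(mat, ((k1 // 2, k1 // 2), (k2 // 2, k2 // 2)))
    out = np.empty(mat.shape, dtype=float)
    for r in range(mat.shape[0]):
        for c in range(mat.shape[1]):
            out[r, c] = np.sum(padded[r:r + k1, c:c + k2] * flipped)
    return out

def feature_group(mat, filters):
    """Convolve one matrix with every filter, giving one feature matrix per filter."""
    return np.stack([conv2_same(mat, w) for w in filters])

rng = np.random.default_rng(0)
heartbeat = rng.random((15, 20))
filters = rng.standard_normal((9, 7, 7))  # stand-ins for the L1 = 9 PCA filters
group = feature_group(heartbeat, filters)
print(group.shape)
```

The same routine can be reused for the second convolution by passing a local feature matrix and the second-layer filters.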
Referring to fig. 4, extracting a second-layer PCA filter includes:
second-stage block sampling:
the process is similar to first-stage block sampling; the sampling objects are all the $L_1$ local feature matrices obtained from the first convolution of all heartbeats, the sampling window p2 has size $r_1 \times r_2$, and a number of sampling block matrices are finally obtained.
Second stage centralization:
centralizing all the sampling block matrixes obtained in the previous step, wherein the processing method is consistent with the method of centralizing in the first stage, and is not described again; finally obtaining a certain amount of sample blocks to be processed,
where m and n are the number of rows and columns, respectively, of the block of samples.
Second-stage reconstruction and recombination:
reconstructing each sampling block to be processed to obtain sampling vectors, combining all the obtained sampling vectors into a matrix to be processed,
the local characteristic matrix groups of all heartbeats are processed to obtain
Acquiring the second-layer PCA filters:
consistent with the method for obtaining the first-layer PCA filters, the second-layer PCA filters are composed of the selected eigenvectors of the covariance matrix:
$$\mathrm{mat}_{k_1,k_2}\left(q_\lambda\left(YY^T\right)\right), \quad \lambda = 1, 2, \dots, L_2$$
wherein $YY^T$ is the covariance matrix of $Y$, $q_\lambda(\cdot)$ extracts the $\lambda$-th eigenvector of the matrix in brackets, and $\mathrm{mat}_{k_1,k_2}(\cdot)$ reconstructs the vector in brackets into a matrix; these matrices are the second-layer PCA filters, and $L_2$ second-layer PCA filters are obtained.
Referring to FIG. 5, after the $L_2$ second-layer PCA filters are obtained, a second convolution is performed between the second-layer PCA filters and each local feature matrix, as follows:
wherein the result is the secondary local feature matrix group obtained by convolving the $i$-th local feature matrix with the second-layer PCA filters.
Referring to fig. 6, the PCANet output layer process includes:
binary hash encoding:
all the secondary local feature matrices are binarized; binarizing one secondary local feature group yields $L_2$ binarized secondary local feature matrices, which hash coding then converts into a single decimal matrix, with the following formula:
where the function $H(\cdot)$, similar to a unit step function, maps element values to 0 and 1. To form the decimal matrix, each element of the $\lambda$-th binarized secondary local feature matrix is mapped to 0 or $2^{\lambda-1}$ ($\lambda = 1, \dots, L_2$), so that the resulting elements are only zero or positive integers, and the corresponding elements of the $L_2$ matrices are added to obtain the decimal matrix.
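The binary hash coding of one secondary local feature group can be sketched as below. The step function is taken to return 1 for strictly positive values and 0 otherwise, which is an assumption about the behaviour at exactly zero.

```python
import numpy as np

def binary_hash(maps):
    """Combine L2 feature matrices of shape (L2, m, n) into one decimal matrix."""
    maps = np.asarray(maps)
    bits = (maps > 0).astype(np.int64)        # H(.): 1 if positive, else 0 (assumption at 0)
    weights = 2 ** np.arange(maps.shape[0])   # 2^(lambda-1) for lambda = 1..L2
    return np.tensordot(weights, bits, axes=1)

# Three tiny 1 x 2 feature matrices as a worked example.
maps = np.array([[[0.5, -1.0]],
                 [[-2.0, 3.0]],
                 [[1.0, 1.0]]])
decimal = binary_hash(maps)
print(decimal)  # [[5 6]]: 1 + 0 + 4 and 0 + 2 + 4
```

With $L_2$ matrices, every element of the decimal matrix lies between 0 and $2^{L_2} - 1$.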
Block histogram processing:
performing binary hash coding on all $L_1$ secondary local feature groups yields $L_1$ decimal matrices. First, block sampling is performed on each decimal matrix under a certain overlap coefficient, the resulting sampling blocks are converted into sampling vectors, and all sampling vectors obtained from one decimal matrix are combined into a matrix to be processed; each matrix to be processed is divided into B blocks and processed by histogram statistics, which converts the feature matrix into vector form. Finally, the $L_1$ vectors mapped from the $L_1$ decimal matrices are concatenated into one long feature vector, which is the extracted heartbeat feature; the formula is as follows:
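A hedged sketch of block histogram processing follows. The block size 7 × 7 and the overlap rate 0.5 are taken from Table 3; deriving the stride as the rounded product of block size and (1 - overlap), and using one histogram bin per possible decimal value, are assumptions.

```python
import numpy as np

def block_histogram(dec, num_bins, block=(7, 7), overlap=0.5):
    """Concatenate the histograms of overlapping blocks of one decimal matrix."""
    b1, b2 = block
    s1 = max(1, int(round(b1 * (1 - overlap))))  # row stride (assumed rule)
    s2 = max(1, int(round(b2 * (1 - overlap))))  # column stride (assumed rule)
    feats = []
    for r in range(0, dec.shape[0] - b1 + 1, s1):
        for c in range(0, dec.shape[1] - b2 + 1, s2):
            patch = dec[r:r + b1, c:c + b2].reshape(-1)
            feats.append(np.bincount(patch, minlength=num_bins))
    return np.concatenate(feats)

def beat_feature(decimal_mats, L2=9):
    """Concatenate the block histograms of all decimal matrices of one heartbeat."""
    return np.concatenate([block_histogram(d, 2 ** L2) for d in decimal_mats])

rng = np.random.default_rng(0)
decimal_mats = rng.integers(0, 2 ** 9, size=(9, 15, 20))  # synthetic L1 = 9 matrices
feature = beat_feature(decimal_mats)
print(feature.shape)
```

Under these assumptions each 15 × 20 decimal matrix gives 12 blocks of 512 bins, so the feature vector of one heartbeat has 9 × 12 × 512 = 55296 entries.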
The feature extraction method for the heartbeats of the to-be-classified set is basically the same as that for the training set and is not repeated here. However, the first-layer and second-layer PCA filters extracted from the training set are applied directly when extracting the features of the samples to be classified, so the PCA filter extraction step need not be executed again for the to-be-classified set.
In this embodiment, several classifiers, namely a linear SVM, a KNN classifier, a BP neural network classifier and a random forest, are used to verify the feature extraction effect of the invention.
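Of the classifiers listed, KNN is the simplest to sketch without external libraries. The example below is a plain Euclidean-distance majority-vote implementation written for illustration; the value of k, the distance measure and the toy two-dimensional features are assumptions and not the configuration used in the embodiment.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=3):
    """Label each query vector by majority vote among its k nearest training vectors."""
    train_x = np.asarray(train_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    train_y = np.asarray(train_y)
    preds = []
    for q in query_x:
        dist = np.linalg.norm(train_x - q, axis=1)   # Euclidean distance (assumption)
        nearest = train_y[np.argsort(dist)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)

# Toy feature vectors standing in for heartbeat feature vectors.
train_x = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = ["N", "N", "V", "V"]
preds = knn_predict(train_x, train_y, [[0.2, 0.1], [5.5, 5.2]], k=3)
print(preds)
```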
According to the heartbeat classification standard specified by the Association for the Advancement of Medical Instrumentation (AAMI), 15 heartbeat types are grouped into 5 classes, namely N, S, V, F and Q; the contents of the five classes are shown in Table 1.
TABLE 1 AAMI heartbeat class division table
In this embodiment, noisy ECG signals from the MIT-BIH database are used to test the classification effect of the present invention. Table 2 shows the number of heartbeats of each class after preprocessing; the total number of heartbeats is 107168. It is evident from Table 2 that the classes are severely imbalanced: class F, the smallest, has fewer than one percent as many heartbeats as class N. In this embodiment, 10 classification experiments are performed; about one tenth of the heartbeats of each class are randomly extracted to form a training set of 10715 heartbeats for training the classifier, and the remaining heartbeats form the to-be-classified set used to test the classification effect.
| Category | Total number of heartbeats | Heartbeats in training set | Heartbeats in to-be-classified set |
| N | 90411 | 9041 | 81370 |
| S | 2778 | 277 | 2501 |
| V | 7227 | 722 | 6505 |
| F | 802 | 80 | 722 |
| Q | 5950 | 595 | 5355 |
| Total | 107168 | 10715 | 96453 |
TABLE 2 Heart beat sample number table
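The counts in Table 2 can be cross-checked with a few lines of arithmetic. The sketch takes the per-class totals and training-set counts as given and derives the size of the to-be-classified set and the degree of class imbalance; the variable names are illustrative.

```python
# Per-class totals and training-set counts as listed in Table 2.
totals = {"N": 90411, "S": 2778, "V": 7227, "F": 802, "Q": 5950}
training = {"N": 9041, "S": 277, "V": 722, "F": 80, "Q": 595}

# Heartbeats left for the to-be-classified set in each class.
remaining = {cls: totals[cls] - training[cls] for cls in totals}

total_all = sum(totals.values())
total_train = sum(training.values())
total_rest = sum(remaining.values())

# Size of the smallest class (F) relative to the largest (N), in percent.
f_to_n_percent = 100 * totals["F"] / totals["N"]

print(remaining)
print(total_all, total_train, total_rest)
print(round(f_to_n_percent, 2))
```

The sums reproduce the totals 107168, 10715 and 96453, and class F amounts to roughly 0.89 percent of class N.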
When various classifiers are used for classification, the parameter configuration is the same, as shown in table 3.
| Item | Parameter |
| Heartbeat matrix size | 15x20 |
| First-stage PCA filter count | 9 |
| Second-stage PCA filter count | 9 |
| Block sample size in PCA filter extraction process | 7x7 |
| Block sample size in block histogram processing stage | 7x7 |
| Block sample overlap rate in block histogram processing stage | 0.5 |
TABLE 3 Parameter configuration table of this embodiment
The classification results are shown in FIG. 7. It can be seen that the PCANet-based electrocardiographic feature extraction method is robust to noise: noise need not be removed before heartbeat feature extraction and classification, which avoids any adverse effect of the denoising process on feature extraction, and the method has a significant advantage when classifying noisy heartbeats; it also classifies class-imbalanced heartbeats well.
The above description is only a preferred embodiment of the present invention, and the present invention is not limited to the above embodiment; any solution that achieves the technical effects of the present invention by the same means shall fall within the protection scope of the present invention. Within that scope, the technical solution and/or its implementation may be modified and varied in other ways.