CN108596142B

CN108596142B - PCANet-based electrocardiogram feature extraction method

Info

Publication number: CN108596142B
Application number: CN201810434968.3A
Authority: CN
Inventors: 司玉娟; 杨维熠; 王迪; 刘奇; 郎六琪
Original assignee: Jilin University; Zhuhai College of Jilin University
Current assignee: Jilin University; Zhuhai College of Jilin University
Priority date: 2018-05-09
Filing date: 2018-05-09
Publication date: 2022-01-11
Anticipated expiration: 2038-05-09
Also published as: CN108596142A; WO2019214026A1

Abstract

The technical solution of the present invention is a PCANet-based electrocardiogram (ECG) feature extraction method comprising the following steps: S10, preprocessing the electrocardiogram to obtain a training set and a to-be-classified set; S20, using PCANet to extract heartbeat features from the training set and from the to-be-classified set, respectively; S30, using the heartbeat features extracted from the training set to train a classifier, and using that classifier to classify the heartbeat features of the to-be-classified set. The beneficial effects of the present invention are: it is robust to noise in the ECG signal, simplifies the noise removal steps, classifies unbalanced heartbeat classes more effectively, improves the efficiency and accuracy of ECG feature extraction, reduces the burden on doctors of reading ECGs, and reduces the probability of misdiagnosis.

Description

PCANet-based electrocardiogram feature extraction method

Technical Field

The invention relates to a PCANet-based electrocardiogram feature extraction method, and belongs to the field of medical signal processing.

Background

At present, with the development of computer technology, pattern recognition techniques such as data mining and deep learning are gradually being applied to medical signal processing. Known application fields include electrocardiograms, electroencephalograms, medical image processing and the like. In the field of electrocardiography, ECG-assisted diagnosis and treatment devices have developed considerably; they can mine deep information in the electrocardiogram and perform efficient automatic identification.

Automatic ECG identification comprises three steps: preprocessing, feature extraction and classification. The preprocessed heartbeats pass through the feature extraction step, which mines their deep features, and a classifier then identifies those features. The feature extraction step is particularly important and is the focus of the invention.

However, ECG signal preprocessing is often required before feature extraction, and denoising is a particularly important part of it. First, when heartbeats are sampled by an instrument, the recorded waveform is contaminated by noise because of the influence of other organs in the human body; the noise of an electrocardiogram (ECG) signal generally includes baseline drift noise, power frequency noise and the like. In the prior art, the noise must be removed from the heartbeats in the preprocessing stage, and many noise removal methods exist, such as wavelet analysis and median filtering, but removing noise inevitably causes a loss of useful information in the signal. Second, although many classification algorithms exist today, they achieve good evaluation results mainly when trained on and applied to heartbeat classes of balanced size; when the training heartbeats are unbalanced in number, the majority classes often have a significant classification advantage, and the identification accuracy for classes with fewer training samples is low. Therefore, the invention provides a heartbeat feature extraction method that is robust both to noise and to imbalance in sample numbers.

Disclosure of Invention

In order to solve the above problems, an object of the present invention is to provide a PCANet-based electrocardiogram feature extraction method that is robust to noise, does not require heartbeat noise to be removed during preprocessing, does not require the number of heartbeats per class to be equalized, achieves a better classification effect, and reduces the burden on doctors of analyzing a patient's condition from the electrocardiogram.

The technical scheme adopted by the invention for solving the problems is as follows:

A PCANet-based electrocardiogram feature extraction method comprises the following steps: S10, preprocessing the electrocardiogram to obtain a training set and a to-be-classified set; S20, extracting the heartbeat features of the training set and of the to-be-classified set, respectively, by using PCANet; and S30, training a classifier with the heartbeat features extracted from the training set and using the classifier to classify the heartbeat features of the to-be-classified set;

the step S10 includes:

S11, detecting the R wave peak points of the electrocardiogram signal and, taking each R wave peak point as a reference point, intercepting a certain number of sampling points before and after it as a single heartbeat;

S12, dividing the whole electrocardiogram into a plurality of single heartbeats;

S13, normalizing the amplitude of each single heartbeat;

S14, dividing the normalized single heartbeats into a training set and a to-be-classified set;

the step S20 includes:

S21, performing second-order convolution processing on the training set and/or the to-be-classified set by using a PCA algorithm to obtain an output matrix corresponding to each heartbeat;

S22, performing binary hash coding and block histogram processing on the output matrices of the heartbeats of the training set and/or the to-be-classified set to obtain the feature vectors of the heartbeats of the training set and/or the to-be-classified set;

the step S30 includes:

S31, training a classifier by using the feature vectors of the training set heartbeats;

and S32, inputting the feature vectors of the heartbeats of the to-be-classified set into the trained classifier for classification and outputting a classification result.

Further, in the step S11, the number of the truncated sampling points depends on the sampling frequency.

Further, in step S21, the performing the second-order convolution processing by using the PCA algorithm includes:

extracting L₁ first-layer PCA filters;

convolving the L₁ first-layer PCA filters with each heartbeat matrix in a first convolution process to obtain L₁ local feature matrices;

extracting L₂ second-layer PCA filters;

convolving the L₂ second-layer PCA filters with the local feature matrices in a second convolution process to obtain L₂ secondary local feature matrices.

Further, when extracting the features of the sample to be classified, the first-layer PCA filter and the second-layer PCA filter extracted through the training set are directly applied.

Further, the method for extracting the PCA filter comprises the following steps:

reconstructing the cardiac beat vector into a cardiac beat matrix;

centralizing the heart beat matrix;

using the centralized heartbeat matrix to construct a matrix to be processed;

and performing principal component analysis on the matrix to be processed to obtain the PCA filter.

Further, the PCA filter is represented as follows:

$$\mathrm{mat}_{k_1,k_2}\left(q_l\left(XX^{T}\right)\right)$$

wherein XX^T is the covariance matrix of X, q_l() extracts the l-th eigenvector of the matrix in brackets, and mat_{k1,k2}() reconstructs the vector in brackets into a matrix; the resulting matrices are the PCA filters.

Further, the binary hash encoding includes: binarizing all the matrices in the primary local feature matrix group, converting them to decimal values by hash coding with a hash function, and combining them into one matrix.

Further, the block histogram processing includes: block size selection, overlapping coefficient selection, vector conversion, histogram statistics and vector connection to obtain the feature vector of the heart beat.

Further, the classifiers include, but are not limited to, linear SVM, KNN classifier, BP neural network classifier, and random forest.

The invention has the following beneficial effects: the PCANet-based electrocardiogram feature extraction method is highly robust to noise, so a good classification result can be obtained by extracting features from heartbeats without removing the noise and inputting those features into the classifier; heartbeat classes of unbalanced size can be classified more accurately; the dimension of the heartbeat features is raised by two orders of magnitude, and classifying these high-dimensional heartbeat features with the classifier gives a considerable effect.

Drawings

FIG. 1 is a general flow diagram of the present invention;

FIG. 2 is a schematic diagram of the preprocessing of the present invention;

FIG. 2-A is schematic electrocardiogram 2-A of FIG. 2 of the present invention;

FIG. 2-B is schematic electrocardiogram 2-B of FIG. 2 of the present invention;

FIG. 2-C is schematic electrocardiogram 2-C of FIG. 2 of the present invention;

FIG. 2-D is schematic electrocardiogram 2-D of FIG. 2 of the present invention;

FIG. 3 is a schematic diagram of a first layer PCA filter setup of the present invention;

FIG. 4 is a schematic diagram of a second layer PCA filter setup of the present invention;

FIG. 5 is a schematic diagram of the two convolution process of the present invention;

FIG. 6 is a schematic diagram of the heartbeat feature output process of the present invention;

FIG. 7 is a diagram of the classification results according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.

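Referring to FIG. 1, the PCANet-based electrocardiogram feature extraction method comprises the following steps: S10, preprocessing the electrocardiogram to obtain a training set and a to-be-classified set; S20, extracting the heartbeat features of the training set and of the to-be-classified set, respectively, by using PCANet; and S30, training a classifier with the heartbeat features extracted from the training set and using the classifier to classify the heartbeat features of the to-be-classified set;

the step S10 includes:

S11, detecting the R wave peak points of the electrocardiogram signal and, taking each R wave peak point as a reference point, intercepting a certain number of sampling points before and after it as a single heartbeat;

S12, dividing the whole electrocardiogram into a plurality of single heartbeats;

S13, normalizing the amplitude of each single heartbeat;

S14, dividing the normalized single heartbeats into a training set and a to-be-classified set;

the step S20 includes:

S21, performing second-order convolution processing on the training set and/or the to-be-classified set by using a PCA algorithm to obtain an output matrix corresponding to each heartbeat;

S22, performing binary hash coding and block histogram processing on the output matrices of the heartbeats of the training set and/or the to-be-classified set to obtain the feature vectors of the heartbeats of the training set and/or the to-be-classified set;

the step S30 includes:

S31, training a classifier by using the feature vectors of the training set heartbeats;

and S32, inputting the feature vectors of the heartbeats of the to-be-classified set into the trained classifier for classification and outputting a classification result.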

In order to make the description more specific, the present embodiment describes the invention by referring to some specific data and formulas.

Referring to fig. 2 and 2-a through 2-D, for the pre-processing stage, it includes:

an electrocardiogram is detected and segmented into a number of single heart beats:

In this embodiment, a large number of patient ECG signals, or the two leads MLII and V5 in the MIT-BIH related databases, are used as experimental subjects, and the sampling frequency is 360 to 250 Hz. The Pan-Tompkins algorithm is used to detect the R waves in these ECG signals and to label the positions of the R wave peaks. Every R wave peak point on the ECG signal is taken as the middle point of a heartbeat: the 149th sample point to the left of the R wave peak point is set as the starting point of the corresponding heartbeat, the 150th sample point to the right of the R wave peak point is set as its end point, and the 300 points from the starting point to the end point form one heartbeat vector sample.
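The segmentation rule above can be sketched as follows. This is a minimal illustration, assuming the R peak indices have already been produced by a detector such as Pan-Tompkins (not implemented here); the function name `segment_beats` and the choice to skip peaks too close to the record boundaries are assumptions of the sketch, not part of the source.

```python
import numpy as np

def segment_beats(ecg, r_peaks, left=149, right=150):
    """Cut one beat per R peak: 149 samples before, the peak, 150 samples after."""
    beats = []
    for r in r_peaks:
        start, stop = r - left, r + right + 1
        # Assumption: peaks too close to either end of the record are skipped.
        if start < 0 or stop > len(ecg):
            continue
        beats.append(ecg[start:stop])
    if not beats:
        return np.empty((0, left + right + 1))
    return np.asarray(beats)

# Synthetic stand-in for an ECG record with already-detected R peaks.
ecg = np.sin(np.linspace(0, 40 * np.pi, 3000))
r_peaks = [100, 500, 1200, 2900]
beats = segment_beats(ecg, r_peaks)
print(beats.shape)  # (2, 300)
```

The first and last peaks are dropped because a full 300-point window does not fit; in each kept beat the R peak sits at index 149.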

Heartbeat vector normalization:

The normalization scales the amplitudes of every heartbeat curve to between 0 and 1, which can improve the final classification result to a certain extent. The normalization method adopted in this embodiment is Min-Max standardization:

$$x = \frac{y - \min}{\max - \min}$$

where y is the original value of a point in the heartbeat, min is the minimum value in that heartbeat, max is the maximum value in that heartbeat, and x is the normalized amplitude of the point.
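A minimal sketch of per-heartbeat Min-Max normalization with the symbols defined above (y, min, max, x). The function name and the handling of a constant beat (returning zeros to avoid division by zero) are assumptions of this sketch.

```python
import numpy as np

def min_max_normalize(beat):
    """Scale one heartbeat vector to the range [0, 1]."""
    y = np.asarray(beat, dtype=float)
    lo, hi = y.min(), y.max()
    if hi == lo:
        # Assumption: a flat beat carries no amplitude information, return zeros.
        return np.zeros_like(y)
    return (y - lo) / (hi - lo)

x = min_max_normalize([-0.5, 0.0, 1.5])
print(x.tolist())  # [0.0, 0.25, 1.0]
```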

Establishing a training set and a to-be-classified set:

The normalized heartbeats are sorted by label, and five classes of heartbeats are taken according to the AAMI standard; the classes differ in size. A small part (about one tenth) of each class is taken, and these tenths of the five classes together form the training set, which is used to generate the filters and to train the classifier in the subsequent feature extraction step. The remaining heartbeats form the to-be-classified set, which is used for classification by the classifier.
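The per-class split can be illustrated with a short sketch. It assumes random sampling without replacement and rounds the training count down (for example 2778 heartbeats give 277); the function name, the seed and the toy label list are hypothetical.

```python
import numpy as np

def split_per_class(labels, train_fraction=0.1, seed=0):
    """Return (train_idx, rest_idx), drawing train_fraction of every class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        n_train = int(len(idx) * train_fraction)  # rounded down (assumption)
        train_idx.extend(rng.choice(idx, size=n_train, replace=False))
    train_idx = np.sort(np.array(train_idx, dtype=int))
    rest_mask = np.ones(len(labels), dtype=bool)
    rest_mask[train_idx] = False
    return train_idx, np.flatnonzero(rest_mask)

# Toy, deliberately unbalanced label list.
labels = ["N"] * 50 + ["S"] * 20 + ["F"] * 10
train_idx, rest_idx = split_per_class(labels)
print(len(train_idx), len(rest_idx))  # 8 72
```

Because the draw is made class by class, even the smallest class contributes samples to the training set in proportion to its size.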

Referring to fig. 3-5, the feature extraction stage for the training set heartbeat includes:

Referring to FIG. 3, before the first-layer PCA filters are extracted, every heartbeat vector needs to be reconstructed into a heartbeat matrix. In this embodiment it is assumed that there are N heartbeat vectors, each containing 300 sampling points, and each heartbeat vector is reconstructed into a heartbeat matrix of size m × n, where the reconstruction method is as follows:

Applying this procedure to all the heartbeat vectors finally yields N heartbeat matrices.
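A minimal sketch of turning 300-point heartbeat vectors into heartbeat matrices. The 15 × 20 size is the one listed in the embodiment's parameter table; filling the matrix row by row is an assumption of this sketch.

```python
import numpy as np

# Two dummy 300-sample heartbeat vectors.
beats = np.arange(2 * 300, dtype=float).reshape(2, 300)

m, n = 15, 20  # heartbeat matrix size used in the embodiment (15 * 20 = 300)
# Assumption: each vector is written into its matrix in row-major order.
beat_matrices = beats.reshape(-1, m, n)
print(beat_matrices.shape)  # (2, 15, 20)
```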

The first-layer PCA filters are extracted by principal component analysis (the PCA algorithm); the extraction process for each heartbeat matrix is as follows:

first-stage block sampling:

a window p1 of size k1 × k2 is slid over the heartbeat matrix with a step length of 1, and the block obtained at each position is extracted, finally giving 300 sampling block matrices, which are concatenated; the sampling block matrices of the i-th heartbeat matrix are represented as follows:

first-stage centralization:

the 300 sampling block matrices of the i-th heartbeat matrix are zero-centered, that is, the mean is subtracted from each sampling block matrix so that the average of all its elements is 0; zero-centering the i-th heartbeat matrix gives 300 sampling blocks to be processed, as follows:

first-stage reconstruction and combination:

block sampling and centralization are applied to all the heartbeat matrices to obtain N groups of 300 sampling blocks to be processed; each sampling block to be processed is reconstructed into vector form, and the vectors are combined into the matrix to be processed, which contains 300N column vectors:
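The three first-stage steps (block sampling, centralization, reconstruction and combination) can be sketched as below. Zero padding around the heartbeat matrix is an assumption made here so that a stride-1 window with odd side lengths yields exactly one block per element (300 for a 15 × 20 matrix); the function names are hypothetical.

```python
import numpy as np

def sample_blocks(matrix, k1=7, k2=7):
    """Return a (k1*k2, m*n) array: one zero-mean, flattened block per position."""
    m, n = matrix.shape
    # Assumption: zero padding so that the stride-1 window gives m*n blocks.
    padded = np.pad(matrix, ((k1 // 2, k1 // 2), (k2 // 2, k2 // 2)))
    columns = []
    for r in range(m):
        for c in range(n):
            block = padded[r:r + k1, c:c + k2]
            columns.append((block - block.mean()).ravel())  # centralize the block
    return np.stack(columns, axis=1)

def build_matrix_to_process(beat_matrices, k1=7, k2=7):
    """Concatenate the block columns of all heartbeats side by side."""
    return np.hstack([sample_blocks(b, k1, k2) for b in beat_matrices])

rng = np.random.default_rng(0)
beat_matrices = rng.random((3, 15, 20))  # N = 3 dummy heartbeat matrices
X = build_matrix_to_process(beat_matrices)
print(X.shape)  # (49, 900)
```

With N = 3 heartbeats the result has 300N = 900 columns, each a centralized 7 × 7 block written as a 49-element vector.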

obtaining a first layer PCA filter:

assuming that the number of first-layer PCA filters is L₁, the objective of the PCA algorithm is to minimize the reconstruction error by finding a series of orthonormal matrices:

the solution to this problem is classical principal component analysis, i.e. the first L₁ eigenvectors of the covariance matrix of the matrix X. Each eigenvector is then reconstructed into a matrix to obtain the L₁ first-layer PCA filters, represented as follows:
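As an illustration of the eigenvector step, the following minimal sketch takes the leading eigenvectors of XX^T and reshapes each into a k1 × k2 filter. The function name `pca_filters`, the random stand-in for X and the row-major reshape are assumptions of the sketch.

```python
import numpy as np

def pca_filters(X, num_filters, k1=7, k2=7):
    """Reshape the leading eigenvectors of X X^T into k1 x k2 filters."""
    cov = X @ X.T
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_filters]  # indices of the largest ones
    # Assumption: each eigenvector is reshaped row by row into a k1 x k2 matrix.
    return eigvecs[:, order].T.reshape(num_filters, k1, k2)

rng = np.random.default_rng(0)
X = rng.standard_normal((49, 900))  # stand-in for the matrix to be processed
X -= X.mean(axis=0)                 # every column (block) has zero mean
W = pca_filters(X, num_filters=9)
print(W.shape)  # (9, 7, 7)
```

`np.linalg.eigh` is used because XX^T is symmetric; the filters it returns are orthonormal when flattened.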

Referring to FIG. 5, after the L₁ first-layer PCA filters are obtained, they are convolved with each heartbeat matrix in the first convolution process, as follows:

where Γ_i is the i-th heartbeat matrix and each convolution result is a local feature matrix. After this step, each original heartbeat matrix is mapped to L₁ new matrices, called local feature matrices; the set of these L₁ new matrices is a local feature matrix group, so the N heartbeats are eventually mapped to N local feature matrix groups.
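A minimal sketch of the first convolution: one heartbeat matrix is convolved with each of the L₁ filters, giving L₁ local feature matrices of the same size. Zero padding (a "same"-size output) and odd filter side lengths are assumptions here, and random filters stand in for the PCA filters.

```python
import numpy as np

def conv2_same(image, kernel):
    """2-D convolution with zero padding; output has the shape of `image`."""
    k1, k2 = kernel.shape  # assumed odd
    padded = np.pad(image, ((k1 // 2, k1 // 2), (k2 // 2, k2 // 2)))
    flipped = kernel[::-1, ::-1]  # flipping turns correlation into convolution
    out = np.empty(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + k1, c:c + k2] * flipped)
    return out

def local_feature_group(beat_matrix, filters):
    """Map one heartbeat matrix to one local feature matrix per filter."""
    return np.stack([conv2_same(beat_matrix, w) for w in filters])

rng = np.random.default_rng(0)
beat_matrix = rng.random((15, 20))
filters = rng.standard_normal((9, 7, 7))  # stand-in for L1 = 9 PCA filters
group = local_feature_group(beat_matrix, filters)
print(group.shape)  # (9, 15, 20)
```

The second convolution can reuse the same helper, applied to each local feature matrix with the second-layer filters.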

Referring to fig. 4, extracting a second-layer PCA filter includes:

second-stage block sampling:

the process is similar to first-stage block sampling; the sampling objects are all L₁ local feature matrices obtained from the first convolution of every heartbeat, the sampling window p2 has size r1 × r2, and a number of sampling block matrices are finally obtained.

Second stage centralization:

centralizing all the sampling block matrixes obtained in the previous step, wherein the processing method is consistent with the method of centralizing in the first stage, and is not described again; finally obtaining a certain amount of sample blocks to be processed,

where m and n are the number of rows and columns, respectively, of the block of samples.

Second-stage reconstruction and combination:

reconstructing each sampling block to be processed to obtain sampling vectors, combining all the obtained sampling vectors into a matrix to be processed,

the local characteristic matrix groups of all heartbeats are processed to obtain

Acquiring a second-layer PCA filter:

consistent with the method for obtaining the first-layer PCA filters, the second-layer PCA filters are formed from the eigenvectors of the covariance matrix:

$$\mathrm{mat}_{k_1,k_2}\left(q_\lambda\left(YY^{T}\right)\right), \quad \lambda = 1, 2, \ldots, L_2$$

where YY^T is the covariance matrix of Y, q_λ() extracts the λ-th eigenvector of the matrix in brackets, and mat_{k1,k2}() reconstructs the vector in brackets into a matrix; these matrices are the second-layer PCA filters, giving L₂ second-layer PCA filters.

Referring to FIG. 5, after the L₂ second-layer PCA filters are obtained, they are convolved with each local feature matrix in the second convolution process, as follows:

where the result is the secondary local feature matrix group obtained by convolving the i-th local feature matrix with the second-layer PCA filters.

Referring to fig. 6, the PCANet output layer process includes:

binary hash encoding:

all the secondary local feature matrices are binarized; binarizing one secondary local feature group gives L₂ binarized secondary local feature matrices, and hash coding then converts these L₂ binarized matrices into one decimal matrix, according to the following formula:

where the H() function, similar to a unit step function, maps element values to 0 and 1, and the result is a decimal matrix. The function maps each element of the λ-th of the L₂ binarized secondary local feature matrices to 0 or 2^(λ-1), so the elements obtained are only positive integers and zeros, and the corresponding elements of the L₂ matrices are added to obtain the decimal matrix.
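A minimal sketch of the binary hash step for one secondary local feature group: each of the L₂ matrices is binarized and weighted by 2^(λ-1), and the weighted matrices are summed into one decimal matrix. Treating H() as "1 for positive values, 0 otherwise" is an assumption based on its description as similar to a unit step function.

```python
import numpy as np

def binary_hash(secondary_group):
    """Turn (L2, m, n) secondary local feature matrices into one decimal matrix."""
    # Assumption: H() returns 1 for strictly positive values and 0 otherwise.
    bits = (secondary_group > 0).astype(np.int64)
    weights = 2 ** np.arange(bits.shape[0])  # 2^(lambda-1) for lambda = 1..L2
    return np.tensordot(weights, bits, axes=1)

# L2 = 3 tiny 1 x 2 matrices, chosen so the result is easy to check by hand.
group = np.array([[[1.0, -1.0]],
                  [[2.0, 3.0]],
                  [[-1.0, 0.5]]])
decimal = binary_hash(group)
print(decimal.tolist())  # [[3, 6]]
```

Each element of the decimal matrix lies between 0 and 2^L₂ - 1, here between 0 and 7.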

Processing a block histogram:

binary hash coding of all L₁ secondary local feature groups gives L₁ decimal matrices. First, each decimal matrix is block-sampled under a certain overlap coefficient, and the sampling blocks obtained are converted into sampling vectors; all the sampling vectors obtained from one decimal matrix are combined into a matrix to be processed; each matrix to be processed is divided into B blocks and processed by histogram statistics, which converts the feature matrix into vector form. Finally, the L₁ vectors mapped from the L₁ decimal matrices are concatenated into one long feature vector, which is the extracted heartbeat feature; the formula is as follows:
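The block histogram step can be sketched as follows, under stated assumptions: the stride between blocks is derived from the overlap rate as round(block size × (1 - overlap)), blocks that would extend past the matrix edge are dropped, and each histogram has 2^L₂ bins. The function names and the random decimal matrices are hypothetical.

```python
import numpy as np

def block_histogram(decimal_matrix, num_bins, block=(7, 7), overlap=0.5):
    """Concatenate the histograms of overlapping blocks of one decimal matrix."""
    b1, b2 = block
    # Assumption: stride derived from the overlap rate, at least 1.
    s1 = max(1, int(round(b1 * (1 - overlap))))
    s2 = max(1, int(round(b2 * (1 - overlap))))
    m, n = decimal_matrix.shape
    hists = []
    for r in range(0, m - b1 + 1, s1):
        for c in range(0, n - b2 + 1, s2):
            patch = decimal_matrix[r:r + b1, c:c + b2].ravel()
            hists.append(np.bincount(patch, minlength=num_bins))
    return np.concatenate(hists)

def heartbeat_feature(decimal_matrices, L2):
    """Concatenate the block histograms of all L1 decimal matrices."""
    return np.concatenate([block_histogram(t, 2 ** L2) for t in decimal_matrices])

rng = np.random.default_rng(0)
decimal_matrices = rng.integers(0, 2 ** 9, size=(9, 15, 20))  # L1 = 9, L2 = 9
feature = heartbeat_feature(decimal_matrices, L2=9)
print(feature.shape)  # (55296,)
```

Under these assumptions a 15 × 20 decimal matrix yields 12 blocks of 512 bins each, so nine decimal matrices give a feature vector of 55296 entries.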

The feature extraction method for the heartbeats of the to-be-classified set is essentially the same as that for the training set and is not repeated here. However, the first-layer and second-layer PCA filters extracted from the training set can be applied directly when extracting the features of the to-be-classified samples, so the PCA filter extraction step does not need to be executed again for the to-be-classified set.

In this embodiment, several classifiers (a linear SVM, a KNN classifier, a BP neural network classifier and a random forest) are used to verify the feature extraction effect of the invention.
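To illustrate how extracted feature vectors are passed to a classifier, here is a minimal k-nearest-neighbour sketch written directly in NumPy. It is a toy stand-in for the KNN classifier mentioned above, not the configuration used in the embodiment; the two-dimensional features and the labels are made up for the example.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=1):
    """Label each query vector by majority vote among its k nearest training vectors."""
    train_y = np.asarray(train_y)
    # Squared Euclidean distance between every query and every training vector.
    dist = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    predictions = []
    for row in nearest:
        values, counts = np.unique(train_y[row], return_counts=True)
        predictions.append(values[np.argmax(counts)])
    return np.array(predictions)

# Toy feature vectors standing in for heartbeat features.
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
train_y = ["N", "N", "V", "V"]
query_x = np.array([[0.2, 0.1], [5.5, 5.2]])
print(knn_predict(train_x, train_y, query_x).tolist())  # ['N', 'V']
```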

According to the heartbeat class standard specified by the Association for the Advancement of Medical Instrumentation (AAMI), 15 heartbeat types are merged into 5 classes, namely N, S, V, F and Q; the contents of the five classes are shown in Table 1.

TABLE 1 Heartbeat class division specified by AAMI

In this embodiment, noisy ECG signals from the MIT-BIH database are used to test the classification effect of the invention. Table 2 shows the number of heartbeats of each class after preprocessing; the total number of heartbeats is 107168. Table 2 shows that the class sizes are severely unbalanced: the smallest class, F, has fewer than one percent as many heartbeats as class N. In this embodiment, 10 classification experiments are performed. In each, about one tenth of the heartbeats of each class are randomly drawn to form the training set, 10715 heartbeats in total, which is used to train the classifier; the remaining heartbeats form the to-be-classified set, which is used to test the classification effect.

Category	Total number of heartbeats	Heartbeats in training set	Heartbeats in to-be-classified set
N	90411	9041	81370
S	2778	277	2501
V	7227	722	6505
F	802	80	722
Q	5950	595	5355
Total	107168	10715	96453

TABLE 2 Heart beat sample number table
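The counts in Table 2 can be cross-checked with a few lines of arithmetic. The sketch below uses only the per-class totals and training counts from the table and derives the remaining quantities from them.

```python
# Per-class (total, training) counts taken from Table 2.
counts = {
    "N": (90411, 9041),
    "S": (2778, 277),
    "V": (7227, 722),
    "F": (802, 80),
    "Q": (5950, 595),
}

total = sum(t for t, _ in counts.values())
train = sum(tr for _, tr in counts.values())
remaining = {cls: t - tr for cls, (t, tr) in counts.items()}

print(total, train, total - train)  # 107168 10715 96453
print(remaining["N"])               # 81370
# Size of class F relative to class N, in percent.
print(round(counts["F"][0] / counts["N"][0] * 100, 2))  # 0.89
```

The ratio confirms that class F holds fewer than one percent as many heartbeats as class N.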

When the various classifiers are used for classification, the parameter configuration is the same, as shown in Table 3.

Item	Parameter
Heartbeat matrix size	15x20
Number of first-stage PCA filters	9
Number of second-stage PCA filters	9
Block sample size in PCA filter extraction	7x7
Block sample size in block histogram processing	7x7
Block sample overlap rate in block histogram processing	0.5

TABLE 3 Parameter configuration of this embodiment

The classification results obtained are shown in FIG. 7. They show that the PCANet-based electrocardiogram feature extraction method is robust to noise, does not require noise removal before heartbeat feature extraction and classification, avoids the adverse effect of the noise removal process on ECG feature extraction, and has significant advantages when classifying noisy heartbeats; it also classifies unbalanced heartbeat classes well.

The above description is only a preferred embodiment of the present invention, and the present invention is not limited to the above embodiment, and the present invention shall fall within the protection scope of the present invention as long as the technical effects of the present invention are achieved by the same means. The invention is capable of other modifications and variations in its technical solution and/or its implementation, within the scope of protection of the invention.

Claims

1. A PCANet-based electrocardiogram feature extraction method, characterized by comprising the following steps: S10, preprocessing the electrocardiogram to obtain a training set and a to-be-classified set; S20, extracting the heartbeat features of the training set and of the to-be-classified set, respectively, by using PCANet; and S30, training a classifier with the heartbeat features extracted from the training set and using the classifier to classify the heartbeat features of the to-be-classified set;

the step S10 includes:

s13, normalizing the single-heart beat amplitude;

the step S20 includes:

s21, performing second-order convolution processing on the training set and the to-be-classified set by using a PCA algorithm to obtain an output matrix corresponding to the heartbeat;

s22, carrying out binary Hash coding and block histogram processing on the output matrix of the training set and the heart beat of the set to be classified to obtain the feature vectors of the training set and the heart beat of the set to be classified;

the step S30 includes:

s32, inputting the feature vectors of the heart beats of the collection to be classified into a trained classifier for classification and outputting a classification result;

in step S21, the performing of the second-order convolution process using the PCA algorithm includes:

extracting L₁ first-layer PCA filters;

extracting L₂ second-layer PCA filters;

2. The PCANet-based electrocardiogram feature extraction method according to claim 1, wherein: in step S11, the number of truncated samples depends on the sampling frequency.

3. The PCANet-based electrocardiogram feature extraction method according to claim 1, wherein: when extracting the characteristics of the sample to be classified, the first-layer PCA filter and the second-layer PCA filter extracted through the training set are directly applied.

4. The PCANet-based electrocardiogram feature extraction method according to claim 1, wherein: the method for extracting the PCA filter comprises the following steps:

reconstructing the cardiac beat vector into a cardiac beat matrix;

centralizing the heart beat matrix;

using the centralized heartbeat matrix to construct a matrix to be processed;

5. The PCANet-based electrocardiogram feature extraction method according to claim 1, wherein: the PCA filter is represented as follows:

$$\mathrm{mat}_{k_1,k_2}\left(q_l\left(XX^{T}\right)\right) \in \mathbb{R}^{k_1 \times k_2}, \quad l = 1, 2, \ldots, L$$

wherein XX^T is the covariance matrix of X, q_l() extracts the l-th eigenvector of the matrix in brackets, and mat_{k1,k2}() reconstructs the vector in brackets into a matrix; these matrices are the PCA filters; L represents the number of PCA filters, for example L₁ denotes the number of PCA filters in the first layer; and ℝ^{k1×k2} denotes a matrix of size k1 × k2 whose elements all belong to the real number domain.

6. The PCANet-based electrocardiogram feature extraction method according to claim 1, wherein: the binary hash encoding includes: binarizing all the matrices in the primary local feature matrix group, converting them to decimal values by hash coding with a hash function, and combining them into one matrix.

7. The PCANet-based electrocardiogram feature extraction method according to claim 1, wherein: the block histogram processing includes: block size selection, overlapping coefficient selection, vector conversion, histogram statistics and vector connection to obtain the feature vector of the heart beat.

8. The PCANet-based electrocardiogram feature extraction method according to claim 1, wherein: the classifiers include, but are not limited to, linear SVM, KNN classifier, BP neural network classifier, and random forest.