CN116211316A

CN116211316A - Type identification method, system and auxiliary system for multi-lead ECG signals

Info

Publication number: CN116211316A
Application number: CN202310400960.6A
Authority: CN
Inventors: 赵韡; 周亚; 袁靖; 刁晓林; 霍燕妮
Original assignee: Fuwai Hospital of CAMS and PUMC
Current assignee: Fuwai Hospital of CAMS and PUMC
Priority date: 2023-04-14
Filing date: 2023-04-14
Publication date: 2023-06-06
Anticipated expiration: 2043-04-14
Also published as: CN116211316B

Abstract

The application discloses a type identification method, a type identification system, and an auxiliary system for multi-lead ECG signals. A data acquisition module first acquires multi-lead ECG signals together with patient characteristic information and ECG signal type labels associated with a subset of those signals. The data are then passed in turn to a data preprocessing module and a data set division module, which perform data preprocessing and data set division respectively. A model generation module then trains and stores an ECG self-supervised model. Finally, when a service computing module receives a type identification request for an ECG signal, it automatically invokes the trained ECG self-supervised model and obtains, from the data carried by the request, probability information corresponding to each of the configured ECG signal types; an ECG interpretation model built into an ECG interpretation module can also be invoked to interpret the electrocardiogram. The method and system can train a type identification model for multi-lead ECG signals from less labelled data and improve the identification accuracy of the model.

Description

Type identification method, system and auxiliary system for multi-lead ECG signals

Technical Field

The present application relates to the technical field of artificial intelligence, and in particular to a type identification method, system, and auxiliary system for multi-lead ECG signals.

Background

At present, the diagnostic gold standard for many cardiovascular diseases relies mainly on imaging examinations, commonly ultrasound, CT, MRI, and interventional angiography. These examinations are relatively expensive, depend on a limited number of specialist physicians, involve long patient waiting times, and expose the patient to radiation or invasive trauma. Low-cost, convenient, and safe examination methods are therefore urgently needed to cope with the large number of cardiovascular patients.

Compared with imaging examinations, electrocardiogram (ECG signal) examination is non-invasive, simple to perform, and cost-effective. Deep learning has advanced considerably in the ECG field in recent years: many deep learning models can assist with traditional ECG tasks such as arrhythmia identification, and studies have further shown that deep learning can capture patterns in the ECGs of cardiovascular patients that are difficult for humans to recognize, improving the accuracy of ECG type identification. However, current methods for type identification based on ECG signals mainly use supervised deep learning models. Training such models requires a large number of ECG signals together with patient ECG signal type label information (for example, disease label information) associated with the multi-lead ECG signals. On the one hand, large amounts of ECG signal data without type label information are discarded; on the other hand, many practical scenarios lack sufficient type label information, so models for identifying some ECG signal types are missing.

Many deep-learning-based ECG signal type identification systems can already perform traditional ECG tasks such as arrhythmia identification well. However, because large amounts of multimodal label data that can be matched with ECGs are lacking, and unsupervised learning methods for ECG signals are also lacking, most institutions find it difficult to build deep learning models that identify ECG signal types such as adult congenital heart disease, valvular disease, coronary heart disease, and cardiomyopathy from ECG signals. Consequently, few systems currently perform type identification from ECG signals with high accuracy.

Summary of the Invention

The present application provides a type identification method, a type identification system, and an auxiliary system for multi-lead ECG signals, which can train a type identification model from less labelled data and improve the accuracy of type identification.

To achieve the above object, the present application adopts the following technical solution: a type identification method for multi-lead ECG signals, comprising the following steps:

Data acquisition: acquiring n multi-lead ECG signals, together with patient characteristic information and ECG signal type labels associated with a subset of the n multi-lead ECG signals;

Data preprocessing: generating, based on the n multi-lead ECG signals, a multi-lead ECG signal data set D₁ representing all ECG signals, with sample size n; and generating, based on the subset of multi-lead ECG signals and the patient characteristic information and ECG signal type labels associated with that subset, an associated data set D₂ with sample size m, where m ≤ n;

Data set division: dividing the multi-lead ECG signal data set D₁ into a multi-lead ECG signal training set D_{1,train} and a multi-lead ECG signal validation set D_{1,vali}, and dividing the associated data set D₂ into an associated data training set D_{2,train}, an associated data validation set D_{2,vali}, and an associated data test set D_{2,test};

Construction of the ECG self-supervised model framework: the model framework is based on Transformer modules and comprises a cutter, a dual classification masker, an encoder, a decoder, and a classifier;

Model pre-training: initializing the model parameters, then inputting the multi-lead ECG signal training set D_{1,train}, the multi-lead ECG signal validation set D_{1,vali}, the associated data training set D_{2,train}, and the associated data validation set D_{2,vali} into the model framework and performing self-supervised learning and penalized self-supervised learning to obtain a pre-trained ECG self-supervised model;

Model fine-tuning: fine-tuning the pre-trained ECG self-supervised model based on the associated data training set D_{2,train} and the associated data validation set D_{2,vali} to complete the training of the model;

Model testing: testing the fine-tuned ECG model on the associated data test set D_{2,test} and evaluating its performance; if the evaluation result does not meet the preset requirements, adjusting the model parameters and repeating the model pre-training and the model fine-tuning until the evaluation result meets the preset requirements;

In the application stage, the acquired multi-lead ECG signals and patient characteristic information are input into the trained model to obtain probability information corresponding to each of the configured ECG signal types.
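The data set division step above does not prescribe split proportions or a shuffling scheme. The following is a minimal Python sketch under stated assumptions: a seeded random shuffle, and illustrative sample sizes and fractions. The function name `split_indices` and all numbers are hypothetical and are not taken from the application.

```python
import random


def split_indices(n_samples, fractions, seed=0):
    """Shuffle range(n_samples) and cut it into consecutive disjoint parts.

    `fractions` gives the share of every part except the last one,
    which receives whatever remains.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    parts, start = [], 0
    for frac in fractions:
        stop = start + int(round(frac * n_samples))
        parts.append(indices[start:stop])
        start = stop
    parts.append(indices[start:])
    return parts


# Assumed sizes: n = 1000 signals in D1, m = 200 labelled records in D2.
d1_train, d1_vali = split_indices(1000, [0.9])
d2_train, d2_vali, d2_test = split_indices(200, [0.7, 0.15])
```

Splitting by index keeps D₁ and D₂ themselves untouched, so the same record can be looked up in either set afterwards.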

Preferably, the patient characteristic information includes age, sex, and ECG signal abnormalities; the ECG signal type label indicates whether the multimodal data matched with the subset of multi-lead ECG signals show a given cardiovascular disease; the multimodal data are CT, ultrasound, angiography, or MRI data acquired for the same patient; and the cardiovascular disease includes at least one of adult congenital heart disease, valvular disease, coronary heart disease, cardiomyopathy, and pulmonary vascular disease.

Preferably, the model pre-training comprises the following steps:

a. Randomly initializing the model parameters;

b. Performing self-supervised learning on the multi-lead ECG signal training set D_{1,train} and the multi-lead ECG signal validation set D_{1,vali}, based on the cutter, dual classification masker, encoder, and decoder of the model;

c. Performing penalized self-supervised learning on the associated data training set D_{2,train} and the associated data validation set D_{2,vali}, based on the cutter, dual classification masker, encoder, decoder, and classifier of the model.

Preferably, the self-supervised learning based on the cutter, dual classification masker, encoder, and decoder of the model comprises the following steps:

Forward propagation: passing the ECG signals in D_{1,train} through the cutter and the dual classification masker in turn to obtain a self-training vector group and a transformed self-estimation vector group; the self-training vector group is concatenated with a classification vector and passed through the encoder and the decoder in turn, which output a group of prediction vectors as the estimate of the transformed self-estimation vector group; the classification vector is a preset learnable classification vector;

Parameter update: taking as the objective function a self-supervised loss function that reflects the error between the prediction vectors and the transformed self-estimation vector group, and using an optimizer on D_{1,train} to update all learnable parameters in the encoder and the decoder;

Selecting, on the validation set D_{1,vali}, the optimal first hyperparameter combination such that the self-supervised loss function is minimized relative to the total number of learnable parameters in the decoder.

Preferably, the transformation includes at least one of down-sampling for dimensionality reduction, element-wise exponentiation, standardization within a vector, and classification according to a threshold; the self-supervised loss function is one of the l₁ loss, the l₂ loss, and the cross-entropy loss; and the first hyperparameter combination includes the hidden dimension and the number of attention heads of the Transformer sub-block of the decoder in the model.
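The four transformation options named above are listed without formulas. The sketch below gives one plausible reading of each in plain Python. The function names, the strided form of down-sampling, the use of the population standard deviation, and the strict `>` comparison are all assumptions made for illustration.

```python
import math


def downsample(vector, step):
    """Keep every `step`-th element (down-sampling for dimensionality reduction)."""
    return vector[::step]


def elementwise_power(vector, exponent):
    """Raise each element to `exponent` (use an integer exponent for negative values)."""
    return [v ** exponent for v in vector]


def standardize(vector):
    """Zero-mean, unit-variance standardization within a single vector."""
    mean = sum(vector) / len(vector)
    std = math.sqrt(sum((v - mean) ** 2 for v in vector) / len(vector))
    if std == 0:
        return [0.0] * len(vector)
    return [(v - mean) / std for v in vector]


def threshold_classify(vector, threshold):
    """Map each element to 1 if it exceeds `threshold`, otherwise 0."""
    return [1 if v > threshold else 0 for v in vector]
```

A thresholded target pairs naturally with the cross-entropy loss, while the continuous transformations pair with the l₁ or l₂ loss.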

Preferably, the penalized self-supervised learning comprises the following steps:

Forward propagation: passing the ECG signals in D_{2,train} through the cutter and the dual classification masker in turn to obtain a self-training vector group and a transformed self-estimation vector group; the self-training vector group is concatenated with a classification vector and passed through the encoder, which outputs the encoded self-training vector group and the encoded classification vector, the classification vector being a preset learnable classification vector; the encoded self-training vector group and the encoded classification vector enter branch one, and the encoded classification vector and the patient characteristic information in D_{2,train} enter branch two;

Branch one: the decoder processes the input encoded self-training vector group to obtain prediction vectors, which are used to estimate the transformed self-estimation vector group;

Branch two: the encoded classification vector and the patient characteristic information in D_{2,train} are input into the classifier for processing to obtain the predicted probability of the ECG signal type;

Parameter update: taking the penalized loss function as the objective function and using an optimizer on D_{2,train} to update all learnable parameters in the encoder, the decoder, and the classifier; the penalized loss function is the self-supervised loss function of the prediction vectors and the transformed self-estimation vector group, plus λ·CrossEntropy, where CrossEntropy denotes the cross-entropy loss between the predicted probability of the ECG signal type and the ECG signal type label, and λ is hyperparameter ten;

Selecting, on the validation set D_{2,vali}, the optimal hyperparameter λ such that the selection metric for ECG signal type identification is maximized, the selection metric being one of AUC, F_β-score, and accuracy.
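The penalized loss described above, a self-supervised loss plus λ times a cross-entropy term, can be illustrated numerically. The sketch below assumes the l₂ loss (taken here as a mean squared error) for the self-supervised part and a single binary label for the classification part. The function names and example values are hypothetical.

```python
import math


def l2_loss(pred_vectors, target_vectors):
    """Mean squared error between predicted and target vector groups."""
    total, count = 0.0, 0
    for pred, target in zip(pred_vectors, target_vectors):
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy between a predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def penalized_loss(pred_vectors, target_vectors, prob, label, lam):
    """Self-supervised loss plus lam times the classification cross-entropy."""
    return l2_loss(pred_vectors, target_vectors) + lam * binary_cross_entropy(prob, label)


# Assumed toy values: one prediction vector, one label, lam = 2.0.
loss = penalized_loss([[0.0, 0.0]], [[1.0, 1.0]], prob=0.5, label=1, lam=2.0)
```

With λ = 0 the objective reduces to the purely self-supervised loss; larger λ shifts the weight of the update toward the labelled classification task.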

Preferably, the ECG self-supervised model comprises:

A cutter, configured to cut each input ECG signal into d_v mutually disjoint sub-matrices, each with K rows and d_patch columns, and to vectorize the sub-matrices to obtain a full ECG signal vector group {x_1, …, x_{d_v}} containing d_v vectors, where d_patch is hyperparameter one and K is the number of ECG leads;

A dual classification masker, configured to receive the full ECG signal vector group {x_1, …, x_{d_v}} and to draw T + T′ vectors from it at random with equal probability and without replacement, where T + T′ ≤ d_v; the first T vectors form the self-training vector group and the last T′ vectors form the self-estimation vector group, T and T′ being hyperparameter three and hyperparameter four respectively; the masker outputs the self-training vector group and the self-estimation vector group, and the self-estimation vector group is then transformed to obtain the transformed self-estimation vector group;

An encoder, formed by sequentially connecting a projection layer, a position embedding layer, and L Transformer sub-blocks with hidden dimension d_encoder and h_encoder attention heads, where L is hyperparameter five, d_encoder is hyperparameter six, and h_encoder is hyperparameter seven; in the pre-training stage, the encoder takes the self-training vector group and a classification vector as input and outputs the encoded self-training vector group and the encoded classification vector; in the fine-tuning and testing stages, the encoder takes the full ECG signal vector group and a classification vector as input and outputs the encoded full ECG signal vector group and the encoded classification vector; the classification vector is a manually added learnable classification vector;

A decoder, consisting of a restoration layer, a position embedding layer, one Transformer sub-block with hidden dimension d_decoder and h_decoder attention heads, and one fully connected layer, where d_decoder is hyperparameter eight and h_decoder is hyperparameter nine; the decoder is used only in the pre-training stage, takes the encoded self-training vector group and classification vector as input, and decodes them to output the prediction vectors;

A classifier, consisting of a fully connected layer and an activation layer, whose input is the encoded classification vector together with the characteristic information and whose output is the predicted probability of the ECG signal type.
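The cutter and the dual classification masker described above can be sketched in plain Python. The sketch assumes the signal is a list of K rows (one per lead) whose length is already divisible by d_patch, and that each sub-matrix is flattened lead by lead; the flattening order is not specified in the application. The names `cut_signal` and `dual_mask` are hypothetical.

```python
import random


def cut_signal(signal, d_patch):
    """Cut a K x N signal into N // d_patch disjoint K x d_patch sub-matrices
    and flatten each one lead by lead into a vector."""
    n_cols = len(signal[0])
    if n_cols % d_patch != 0:
        raise ValueError("number of columns must be divisible by d_patch")
    vectors = []
    for start in range(0, n_cols, d_patch):
        patch = [row[start:start + d_patch] for row in signal]
        vectors.append([value for row in patch for value in row])
    return vectors


def dual_mask(vectors, t_train, t_est, seed=None):
    """Draw T + T' vectors uniformly without replacement; the first T form the
    self-training group and the last T' form the self-estimation group."""
    if t_train + t_est > len(vectors):
        raise ValueError("T + T' must not exceed d_v")
    rng = random.Random(seed)
    chosen = rng.sample(range(len(vectors)), t_train + t_est)
    train_group = [vectors[i] for i in chosen[:t_train]]
    est_group = [vectors[i] for i in chosen[t_train:]]
    return train_group, est_group


# Toy signal with K = 2 leads and 6 samples per lead, so d_v = 3 for d_patch = 2.
signal = [[0, 1, 2, 3, 4, 5],
          [10, 11, 12, 13, 14, 15]]
full_group = cut_signal(signal, d_patch=2)
train_group, est_group = dual_mask(full_group, t_train=1, t_est=1, seed=0)
```

Because sampling is without replacement, the two groups never share a sub-matrix, so the decoder has to predict segments the encoder has not seen.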

Preferably, the data preprocessing comprises the following steps:

a. Filtering and denoising the ECG signals;

b. Normalizing the filtered ECG signals so that the data range lies between -1 and 1;

c. Padding the normalized ECG signals with columns of zeros so that the number of columns after padding is divisible by hyperparameter one (d_patch), obtaining the ECG signals X_i, i = 1, …, n;

d. Applying min-max normalization to the numerical variables in the patient characteristic information and 0-1 encoding to the categorical variables in the patient characteristic information, obtaining the patient characteristic information z_j, j = 1, …, m;

e. Obtaining the multi-lead ECG signal data set D₁ and the associated data set D₂, where D₁ = {X_i : i = 1, …, n} represents all ECG signals and D₂ = {(X_j, z_j, y_j) : j = 1, …, m} represents the associated data set, in which X_j denotes a multi-lead ECG signal, z_j denotes the patient characteristic information associated with that signal, and y_j denotes the ECG signal type label associated with that signal.
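Steps b to d above can be sketched in plain Python. The filtering of step a is omitted. The sketch assumes that the range [-1, 1] is reached by dividing by the largest absolute value of each signal, which the application does not specify, and the function names are hypothetical.

```python
def scale_to_unit_range(signal):
    """Scale a K x N signal into [-1, 1] by its largest absolute value (assumed scheme)."""
    peak = max(abs(v) for row in signal for v in row)
    if peak == 0:
        return [list(row) for row in signal]
    return [[v / peak for v in row] for row in signal]


def pad_columns(signal, d_patch):
    """Append zero columns until the column count is divisible by d_patch."""
    n_cols = len(signal[0])
    extra = (-n_cols) % d_patch
    return [list(row) + [0.0] * extra for row in signal]


def min_max(values):
    """Min-max normalize one numerical patient variable across all patients."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy example: 2 leads, 5 samples, d_patch = 4, three patient ages.
scaled = scale_to_unit_range([[2.0, -4.0, 1.0, 0.0, 3.0],
                              [1.0, 0.5, -2.0, 4.0, 0.0]])
padded = pad_columns(scaled, d_patch=4)
ages = min_max([20, 40, 60])
```

Padding after scaling keeps the added zero columns from affecting the scale factor.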

Preferably, the model fine-tuning comprises the following steps:

Forward propagation: inputting the ECG signals in D_{2,train} into the cutter for processing to obtain the full ECG signal vector group, then concatenating the full ECG signal vector group with the classification vector and inputting the result into the encoder for processing to obtain the encoded full ECG signal vector group and the encoded classification vector; inputting the encoded classification vector and the patient characteristic information in D_{2,train} into the classifier for processing to obtain the predicted probability of the ECG signal type;

Parameter update: taking as the objective function the cross-entropy loss between the predicted probability of the ECG signal type and the ECG signal type labels in D_{2,train}, and using an optimizer on D_{2,train} to update all learnable parameters in the encoder and the classifier;

Selecting, on the associated data validation set D_{2,vali}, the optimal second hyperparameter combination such that the selection metric for ECG signal type identification is maximized; the second hyperparameter combination includes the number of columns d_patch of each sub-matrix produced when the cutter cuts the ECG signal, the number of sub-matrices d_v produced by the cutter, the number of vectors T in the self-training vector group, the number of vectors T′ in the self-estimation vector group, the number L of Transformer sub-blocks included in the encoder, the hidden dimension d_encoder of the Transformer sub-blocks in the encoder, and the number of attention heads h_encoder of the Transformer sub-blocks in the encoder.

Preferably, the model testing comprises the following steps:

Evaluating the model on the associated data test set D_{2,test} using the selection metric; if the evaluation result meets the preset requirements, the model is approved for use; if not, the model parameters are adjusted and the model pre-training and fine-tuning steps are repeated until the evaluation result meets the preset requirements.
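The acceptance check above can be illustrated with AUC as the selection metric. The sketch below computes AUC from pairwise comparisons of positive and negative scores and compares it with a preset threshold. The threshold of 0.7, the toy labels and scores, and the function names are assumptions for illustration only.

```python
def auc(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive sample scores higher, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def meets_requirement(metric_value, threshold):
    """Return True when the metric reaches the preset requirement."""
    return metric_value >= threshold


# Toy test-set labels and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
test_auc = auc(labels, scores)
accepted = meets_requirement(test_auc, threshold=0.7)
```

If `accepted` is false, the procedure described above would adjust the model parameters and repeat pre-training and fine-tuning before testing again.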

A type identification system for multi-lead ECG signals, comprising a data acquisition module, a data preprocessing module, a data set division module, a model generation module, and a service computing module, wherein:

The data acquisition module is configured to acquire training data, including n multi-lead ECG signals, together with patient characteristic information and ECG signal type labels associated with a subset of the n multi-lead ECG signals;

The data preprocessing module is configured to generate, based on the n multi-lead ECG signals, a multi-lead ECG signal data set D₁ representing all ECG signals, with sample size n; and, based on the n multi-lead ECG signals and the patient characteristic information and ECG signal type labels associated with them, to remove the multi-lead ECG signals whose patient characteristic information or ECG signal type label is missing and generate an associated data set D₂ with sample size m, where m ≤ n;

The data set division module is configured to divide the multi-lead ECG signal data set D₁ into a multi-lead ECG signal training set D_{1,train} and a multi-lead ECG signal validation set D_{1,vali}, and to divide the associated data set D₂ into an associated data training set D_{2,train}, an associated data validation set D_{2,vali}, and an associated data test set D_{2,test};

The model generation module is configured to complete model training based on the multi-lead ECG signals, the characteristic information, the label information, and the constructed model framework, and to obtain and store the trained ECG self-supervised model;

The service computing module is configured to receive requests for ECG signal type identification and to invoke the trained ECG self-supervised model to obtain probability information corresponding to each of the configured ECG signal types.

Preferably, the model generation module comprises a sample library, a model training engine, and a model library, wherein the sample library receives and stores the multi-lead ECG signal data set D₁ and the associated data set D₂ sent by the data set division module; the model training engine completes model training based on the multi-lead ECG signal data set and the associated data set stored in the sample library; and the model library stores the trained ECG self-supervised model;

The service computing module comprises a service trigger engine and a model computing engine, wherein the service trigger engine is configured to receive a type identification request for an ECG signal together with the data carried by the request, and to forward them to the model computing engine, the data carried by the request including multi-lead ECG signals and patient characteristic information; and the model computing engine is configured to invoke the trained ECG self-supervised model, to obtain, based on the multi-lead ECG signals and patient characteristic information carried by the request, probability information corresponding to each of the configured ECG signal types, and to store the result.

Preferably, the type identification system for multi-lead ECG signals further comprises a front-end interaction module and a dynamic monitoring module, wherein:

The front-end interaction module comprises an identification result presentation sub-module and a label storage sub-module; the identification result presentation sub-module is configured to provide visual prompts based on the probability information, obtained by the service computing module, corresponding to the various ECG signal types; and the label storage sub-module is configured to automatically obtain and store the final ECG signal type labels generated during application;

The dynamic monitoring module comprises a service monitoring and evaluation sub-module and a service update trigger engine; the service monitoring and evaluation sub-module is configured to evaluate the identification performance of the model in real time based on the type labels automatically accumulated during application; and the service update trigger engine is configured to automatically trigger an update of the model and the service when the model performance does not meet the preset requirements, thereby achieving dynamic optimization and updating of the model.

Preferably, the type identification system for multi-lead ECG signals further comprises an ECG interpretation module with a built-in ECG interpretation model, configured to interpret the electrocardiogram and identify cases in which the ECG signal type is arrhythmia.

An intelligent ECG auxiliary system, comprising the type identification system for multi-lead ECG signals described above and further comprising a knowledge base that stores handling suggestions; after the type identification system for multi-lead ECG signals produces a type identification result, the knowledge base is invoked and the handling suggestions in the knowledge base that meet preset conditions are output.

An electronic device, comprising: a processor; and

a memory storing a program, the program being configured to implement, when executed by the processor, the type identification method for multi-lead ECG signals described above.

A non-transitory computer-readable storage medium storing instructions, characterized in that the instructions, when executed by a processor, cause the processor to perform the type identification method for multi-lead ECG signals described above.

The present invention constructs an ECG self-supervised learning model. Through self-supervised learning, ECG signal data that cannot be used in the prior art can be exploited effectively, which improves the utilization of ECG signal data and moderately reduces the sample size required of multimodal data resources. This advances research into mining the associations between ECG signals and various ECG signal types (for example, various disease types), promotes type identification based on ECG signals, and potentially expands the application boundary of current ECG interpretation.

Brief Description of the Drawings

Fig. 1 is a flowchart of the type identification method for multi-lead ECG signals according to Embodiment 1 of the present invention;

Fig. 2 is a flowchart of model pre-training according to Embodiment 1 of the present invention;

Fig. 3 is a diagram of the model architecture according to Embodiment 1 of the present invention;

Fig. 4 is a schematic diagram of the type identification system for multi-lead ECG signals according to Embodiment 1 of the present invention;

Fig. 5 is a schematic diagram of the model generation module according to Embodiment 1 of the present invention;

Fig. 6 is a schematic diagram of the type identification system for multi-lead ECG signals according to Embodiment 2 of the present invention;

Fig. 7 is a schematic diagram of the type identification system for multi-lead ECG signals according to Embodiment 2 of the present invention (with an ECG interpretation module);

Fig. 8 is a schematic diagram of an electronic device according to Embodiment 2 of the present invention.

Detailed Description

To make the purpose, technical means, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings.

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application, without creative effort, fall within the scope of protection of this application.

The terms "comprising" and "having", and any variations of them, in the description and claims of the present application and in the above drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

The technical solution of the present application is described in detail below by way of specific embodiments. The following specific embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments.

实施例1；Embodiment 1;

As shown in FIG. 1, a type identification method for multi-lead ECG signals comprises the following steps:

数据采集，采集n个多导联心电信号，以及与n个多导联心电信号中部分多导联信号相关联的患者特征信息与心电信号类型标签信息；Data acquisition, collecting n multi-lead ECG signals, and patient characteristic information and ECG signal type label information associated with part of the multi-lead ECG signals in the n multi-lead ECG signals;

During data acquisition, an associated ECG signal type label cannot be collected for every multi-lead ECG signal; some multi-lead ECG signals lack such a label. Multi-lead ECG signals without associated ECG signal type label information cannot be used by existing techniques, which rely mainly on supervised learning. In the present application, such signals can still be used to train the ECG self-supervised model. Accordingly, the patient characteristic information and ECG signal type labels collected during data acquisition may be associated with m' of the n multi-lead ECG signals; that is, of the n collected multi-lead ECG signals, only some have associated patient characteristic information and ECG signal type labels, with m' ≤ n;

Data preprocessing: based on the n multi-lead ECG signals, generating a multi-lead ECG signal data set D₁ that represents all the ECG signals, with sample size n; based on the partial multi-lead ECG signals and their associated patient characteristic information and ECG signal type labels, removing the multi-lead ECG signals whose ECG signal type label is missing and generating an associated data set D₂, with sample size m, where m ≤ m' ≤ n. That is, in the data set D₁, m' multi-lead ECG signals have associated patient characteristic information and ECG signal type label information, and m of those m' signals form the associated data set D₂;

In other words, the n multi-lead ECG signals collected in the data acquisition step are preprocessed to obtain n ECG signals X_i, i = 1, …, n, which form the multi-lead ECG signal data set D₁ = {X_i : i = 1, …, n}. For the m multi-lead ECG signals among the n collected signals that have an associated ECG signal type label, the m signals and their respective patient characteristic information and ECG signal type label information are preprocessed to obtain m ECG signals X_j, m items of preprocessed patient characteristic information z_j and m items of ECG signal type label information y_j, which form the associated data set D₂ = {(X_j, z_j, y_j) : j = 1, …, m}.

In this embodiment, m = 8000 and n = 300000, so m is much smaller than n.
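The construction of D₁ and D₂ described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Record` type and the `build_datasets` helper are hypothetical names, and the rule that a signal enters D₂ only when both its patient characteristic information and its type label are present is an assumption drawn from the text.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    """One collected multi-lead ECG signal (hypothetical container)."""
    signal: list               # preprocessed multi-lead ECG signal X
    features: Optional[dict]   # patient characteristic information z, if collected
    label: Optional[int]       # ECG signal type label y, if collected


def build_datasets(records: List[Record]) -> Tuple[list, list]:
    # D1 keeps every signal, labelled or not (sample size n).
    d1 = [r.signal for r in records]
    # D2 keeps only signals whose associated information is complete
    # (sample size m <= n); signals with a missing label are dropped.
    d2 = [
        (r.signal, r.features, r.label)
        for r in records
        if r.features is not None and r.label is not None
    ]
    return d1, d2
```

Every signal, labelled or not, remains available for self-supervised pre-training through D₁, while D₂ carries only the small labelled subset.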

Data set division: dividing the multi-lead ECG signal data set D₁ into a multi-lead ECG signal training set D_{1,train} and a multi-lead ECG signal validation set D_{1,vali}, and dividing the associated data set D₂ into an associated data training set D_{2,train}, an associated data validation set D_{2,vali} and an associated data test set D_{2,test};

In this division, the ECG signal data in D_{2,train} belong to D_{1,train}, and the ECG signal data in D_{2,vali} belong to D_{1,vali}. The probability of belonging to a given ECG signal type is almost the same in D_{2,train}, D_{2,vali} and D_{2,test}. The resulting training set D_{1,train} and validation set D_{1,vali} include a large number of ECG signals that lack associated patient characteristic information and ECG signal type label information and therefore cannot be used by the prior art. Preferably, D_{1,train} and D_{1,vali} are disjoint, and D_{2,train}, D_{2,vali} and D_{2,test} are pairwise disjoint.
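One way to satisfy the constraints stated above (D_{2,train} contained in D_{1,train}, D_{2,vali} contained in D_{1,vali}, similar class proportions across the three labelled subsets, and disjoint subsets) is a stratified split over signal indices. The sketch below is only an assumption about how such a split could be done: the function name, the split fractions, and the choice to keep D_{2,test} out of both D₁ subsets are illustrative and are not stated in the application.

```python
import random
from collections import defaultdict


def split_datasets(n, labels, fractions=(0.8, 0.1, 0.1), vali_fraction=0.1, seed=0):
    """Split indices 0..n-1 (D1) and the labelled indices in `labels` (D2).

    `labels` maps a signal index to its ECG signal type label.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in labels.items():
        by_class[y].append(idx)

    d2_train, d2_vali, d2_test = [], [], []
    for y in sorted(by_class):            # stratify: split each class separately
        idxs = by_class[y]
        rng.shuffle(idxs)
        a = round(fractions[0] * len(idxs))
        b = a + round(fractions[1] * len(idxs))
        d2_train += idxs[:a]
        d2_vali += idxs[a:b]
        d2_test += idxs[b:]

    # Unlabelled signals are split at random, then the labelled training and
    # validation signals are added to the matching D1 subset.
    unlabelled = [i for i in range(n) if i not in labels]
    rng.shuffle(unlabelled)
    cut = round(vali_fraction * len(unlabelled))
    d1_vali = unlabelled[:cut] + d2_vali
    d1_train = unlabelled[cut:] + d2_train   # assumption: D2,test stays out of D1 subsets
    return d1_train, d1_vali, d2_train, d2_vali, d2_test
```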

Construction of the ECG self-supervised model framework: as shown in FIG. 3, the model is based on Transformer modules and comprises a cutter, a dual classification masker, an encoder, a decoder and a classifier;

Model testing: based on the associated data test set D_{2,test}, testing the fine-tuned ECG model and evaluating its performance; if the evaluation result does not meet the preset requirements, adjusting the model parameters and repeating the model pre-training and model fine-tuning steps until the evaluation result of the ECG self-supervised model meets the preset requirements;

在应用阶段，将采集的多导联心电信号以及相关联的患者特征信息输入训练好的心电自监督模型中，得到与设定的各种心电信号类型对应的概率信息。In the application stage, the collected multi-lead ECG signals and associated patient characteristic information are input into the trained ECG self-supervised model, and the probability information corresponding to various types of ECG signals is obtained.

The present invention constructs an ECG self-supervised learning model. During pre-training, the self-supervised learning method makes use of ECG signal data that the prior art cannot use, such as the ECG signals without type labels in the multi-lead ECG signal training set D_{1,train} and validation set D_{1,vali}. The model takes into account both the information in the ECG signal itself and the ECG signal type information corresponding to the signal, so that an ECG self-supervised model for ECG signal type identification can be obtained using only a small amount of ECG signal type label information, such as the data in D_{2,train}, D_{2,vali} and D_{2,test}. The model provided by the present invention can identify a variety of ECG signal types that are difficult to detect by eye, including ECG signal types associated with adult congenital heart disease, valvular heart disease, coronary heart disease, cardiomyopathy and pulmonary vascular disease. Because little ECG signal type label information is needed, the invention offers a new solution for the many medium-sized medical institutions and research laboratories whose data resources are limited.

As can be seen from the overall processing flow of the above type identification method, the method comprises three parts: preparation of the training data, training of the ECG self-supervised model, and use of the trained model to identify ECG signal types. Data acquisition, data preprocessing and data set division serve to prepare the training data; construction of the model structure, model pre-training, model fine-tuning and model testing make up the complete training process of the ECG self-supervised model. Once a suitable ECG self-supervised model has been obtained through training, it can be used to identify ECG signal types conveniently and effectively. The training process of the ECG self-supervised model is described in detail below.

在心电自监督模型的训练中，使用的训练数据除采集的多导联心电信号外，还包括与部分多导联心电信号相关联的患者特征信息和心电信号类型标签信息。In the training of the ECG self-supervised model, the training data used includes not only the collected multi-lead ECG signals, but also patient characteristic information and ECG signal type label information associated with some multi-lead ECG signals.

其中，所述的患者特征信息包括年龄、性别、心电信号异常情况，其中，心电信号异常情况可以由专家对心电信号进行人工标注获得，所述异常包括但不限于ST-T改变、T波异常、左心室高电压、窦性心动过缓、异常Q波、心房颤动、完全性右束支阻滞、窦性心律不齐、ST段改变、室性早搏、P波异常、不完全性右束支阻滞、电轴左偏、一度房室阻滞、窦性心动过速、房性早搏未下传、右心室肥厚、心室起搏心律、非特异性室内传导阻滞、完全性左束支阻滞、左前分支阻滞；Wherein, the patient characteristic information includes age, gender, abnormality of ECG signal, wherein, abnormality of ECG signal can be obtained by manual marking of ECG signal by experts, and the abnormality includes but not limited to ST-T change, Abnormal T wave, left ventricular high voltage, sinus bradycardia, abnormal Q wave, atrial fibrillation, complete right bundle branch block, sinus arrhythmia, ST segment changes, ventricular premature beats, abnormal P wave, incomplete Right bundle branch block, left axis deviation, first-degree atrioventricular block, sinus tachycardia, atrial premature beats without conduction, right ventricular hypertrophy, ventricular pacing rhythm, nonspecific intraventricular block, complete left Bundle branch block, left anterior fascicular block;

The ECG signal type label indicates whether the multimodal data matched to the partial multi-lead ECG signals contain information about a certain cardiovascular disease; that is, the label marks the ECG signal as belonging to a certain type. Optionally, the label may be determined manually by experts. The cardiovascular diseases include adult congenital heart disease, valvular heart disease, coronary heart disease, cardiomyopathy and pulmonary vascular disease. Valvular heart disease includes, but is not limited to, aortic stenosis, aortic regurgitation, mitral stenosis and mitral regurgitation. Congenital heart disease includes, but is not limited to, atrial septal defect and ventricular septal defect. Cardiomyopathy includes, but is not limited to, hypertrophic cardiomyopathy. Coronary heart disease includes, but is not limited to, acute myocardial infarction and angina pectoris. Pulmonary vascular disease includes, but is not limited to, pulmonary hypertension and pulmonary embolism. For valvular heart disease, cardiomyopathy, coronary heart disease and pulmonary vascular disease, the multimodal data matched to the patient's ECG signal are CT, ultrasound, angiography or MRI data collected from the same patient within 90 days before or after the ECG signal was collected; for adult congenital heart disease, the matched multimodal data are ultrasound data of the same patient collected at any time.

As mentioned above, training of the ECG self-supervised model comprises construction of the model structure, model pre-training, model fine-tuning and model testing. With reference to the constructed model structure, the three processes of model pre-training, model fine-tuning and model testing are described in detail below.

As shown in FIG. 2, the pre-training process may specifically comprise the following steps:

b. performing self-supervised learning on the multi-lead ECG signal training set D_{1,train} and the multi-lead ECG signal validation set D_{1,vali}, based on the cutter, dual classification masker, encoder and decoder of the ECG self-supervised model;

As mentioned above, before the self-supervised model is trained, the training data for the ECG self-supervised model are prepared through data acquisition, data preprocessing and data set division. These steps yield the multi-lead ECG signal training set D_{1,train}, the multi-lead ECG signal validation set D_{1,vali}, the associated data training set D_{2,train}, the associated data validation set D_{2,vali} and the associated data test set D_{2,test};

c. performing penalized self-supervised learning on the associated data training set D_{2,train} and the associated data validation set D_{2,vali}, based on the cutter, dual classification masker, encoder, decoder and classifier of the ECG self-supervised model.

其中步骤a对模型参数的随机初始化可以包括对所有的Transformer块使用xavier uniform初始化，对其它参数使用正态分布初始化。 The random initialization of model parameters in step a may include using xavier uniform initialization for all Transformer blocks, and using normal distribution initialization for other parameters.

步骤b的自监督学习的处理具体可以包括如下步骤：The processing of the self-supervised learning of step b may specifically include the following steps:

Forward propagation: the ECG signals in D_{1,train} are passed in turn through the cutter and the dual classification masker to obtain a self-training vector group and a transformed self-estimation vector group. The self-training vector group is then concatenated with a classification vector (the concatenation may be similar to that used in the Vision Transformer, for example by treating the classification vector as one more self-training vector and concatenating it with the self-training vector group) and passed in turn through the encoder and the decoder to obtain a group of prediction vectors output by the decoder. These prediction vectors are used to estimate the transformed self-estimation vector group; that is, the prediction vectors output by the decoder serve as estimates of the transformed self-estimation vector group. The classification vector is a preset (for example, manually added) learnable classification vector of length K × d_patch, each component of which is a learnable (trainable) parameter. The dimension of each ECG signal in D_{1,train} is an integer multiple of K × d_patch, where K is the number of ECG leads and d_patch is hyperparameter one. A learnable parameter here can be regarded as a weight parameter of the neural network: during training it is updated iteratively like the other weight parameters of the network.

Parameter update: the objective function is a self-supervised loss function that reflects the error between the prediction vectors (the estimates of the transformed self-estimation vector group output by the decoder) and the actual values of the transformed self-estimation vector group (the transformed self-estimation vector group output by the dual classification masker). An optimizer is used on D_{1,train} to update all learnable parameters in the encoder and the decoder so as to reduce the deviation between the prediction vectors and the actual values of the transformed self-estimation vector group, and the classification vector that is concatenated with the self-training vector group and input to the encoder in the next training iteration is updated at the same time. In this embodiment the optimizer is AdamW with a cosine learning rate scheduler, a base learning rate of 0.001, a weight decay of 0.05, a batch size of 256, optimizer momentum β₁ = 0.9 and β₂ = 0.95, 40 warm-up iterations and 400 iterations in total;
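For orientation, a cosine learning rate schedule with warm-up using the pre-training values of this embodiment (base learning rate 0.001, 40 warm-up iterations, 400 iterations in total) might look like the sketch below. The exact shape is an assumption: linear warm-up followed by a half-cosine decay to zero is a common choice, but the application does not specify the formula, and `lr_at` is a hypothetical helper name.

```python
import math


def lr_at(step: int, base_lr: float = 0.001, warmup: int = 40, total: int = 400) -> float:
    """Learning rate at a given iteration (assumed linear warm-up + cosine decay)."""
    if step < warmup:
        # Assumed linear warm-up: reaches base_lr at the last warm-up iteration.
        return base_lr * (step + 1) / warmup
    # Cosine decay from base_lr towards 0 over the remaining iterations.
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For the penalized self-supervised stage, the same helper could be called with `warmup=10, total=100`, matching the values given for that stage.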

前述前向传播和参数更新的过程不停迭代，直到满足训练终止条件（即达到设定的最大迭代次数），则得到一个训练模型，将经过多次迭代处理得到一个训练模型的过程称为一次自监督学习的训练过程。The aforementioned process of forward propagation and parameter update continues to iterate until the training termination condition is met (that is, the set maximum number of iterations is reached), then a training model is obtained, and the process of obtaining a training model after multiple iterations is called one-time The training process for self-supervised learning.

Next, the optimal first hyperparameter combination (comprising hyperparameter eight, d_decoder, and hyperparameter nine, h_decoder) is selected on the validation set D_{1,vali} output by the dual classification masker, such that the self-supervised loss function, compared against the total number of learnable parameters in the decoder, is smallest. The comparison of the self-supervised loss with the total number of learnable decoder parameters is used as the selection criterion for hyperparameters eight and nine for the following reason: the larger the total number of learnable parameters in the decoder, the larger the decoder and the smaller the self-supervised loss, so the model estimates the transformed self-estimation vector group more accurately in the pre-training stage, but the computational cost of the model increases accordingly. Furthermore, numerical experiments show that when the growth in decoder size far exceeds the reduction in self-supervised loss, the generalization of the model in the fine-tuning stage deteriorates. The present application therefore strikes a balance between the self-supervised loss and the number of decoder parameters. This balance trades off the self-supervised error against the computational cost in the pre-training stage and also helps to obtain a model that generalizes better.

In addition, the specific operations of training by self-supervised learning on the training set and selecting the optimal first hyperparameter combination on the validation set are as follows. Suppose there are N candidate first hyperparameter combinations. For each combination, one complete self-supervised training process (the repeated forward propagation and parameter update described above) is carried out on the training set D_{1,train} to obtain a corresponding model, whose total number of learnable decoder parameters is thereby determined. Over all combinations, N models are obtained, and their totals of learnable decoder parameters may differ. For each of the N models, the signals of the validation set D_{1,vali} are input to the model and processed to obtain the value of the self-supervised loss function, which is then compared with the total number of learnable parameters in that model's decoder to give a comparison result. The smallest of the N comparison results is selected; the first hyperparameter combination of the corresponding model is the optimal first hyperparameter combination, and the selected model serves as the initial model for the subsequent penalized self-supervised training.

In the above self-supervised learning process, the transformation of the self-estimation vectors may be one of down-sampling for dimensionality reduction, element-wise exponentiation, within-vector standardization, and classification by threshold; the self-supervised loss function may be one of the l₁ loss, the l₂ loss and the cross-entropy loss. These transformations and loss functions are common general knowledge.
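The four candidate transformations and the l₁ and l₂ losses named above could be written as in the following sketch. The parameter choices (stride, exponent, threshold, the small `eps` in the standardization, and averaging rather than summing in the losses) are illustrative assumptions; the application names only the kinds of transformation and loss.

```python
import math


def downsample(v, stride=2):
    """Down-sampling for dimensionality reduction: keep every `stride`-th element."""
    return v[::stride]


def elementwise_power(v, p=2):
    """Element-wise exponentiation."""
    return [x ** p for x in v]


def standardize(v, eps=1e-8):
    """Within-vector standardization to zero mean and (almost) unit variance."""
    mean = sum(v) / len(v)
    std = math.sqrt(sum((x - mean) ** 2 for x in v) / len(v))
    return [(x - mean) / (std + eps) for x in v]


def threshold_classify(v, thr=0.0):
    """Classification by threshold: 1 where the element exceeds `thr`, else 0."""
    return [1 if x > thr else 0 for x in v]


def l1_loss(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```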

本申请心电自监督模型中的超参数包括超参数一至超参数十，后面会分别对超参数一至超参数十进行说明。The hyperparameters in the ECG self-supervised model of this application include hyperparameters 1 to 10, and hyperparameters 1 to 10 will be described later.

Note that, because the dual classification masker draws vectors from the full ECG signal vector group at random, the self-training vector group and the self-estimation vector group corresponding to each ECG sample must be fixed in the validation stage. That is, the dual classification masker is applied only once to each ECG signal in the validation set, and the resulting self-training and self-estimation vector groups are saved; in the validation stage of pre-training, these saved vector groups are used to select the optimal first hyperparameter combination, namely hyperparameters eight and nine;

步骤c的惩罚自监督学习包括如下步骤：The penalized self-supervised learning of step c includes the following steps:

Forward propagation: the ECG signals in D_{2,train} are passed in turn through the cutter and the dual classification masker to obtain a self-training vector group and a transformed self-estimation vector group. The self-training vector group is concatenated with a classification vector (in the same way as in the self-supervised learning) and input to the encoder, which outputs the encoded self-training vector group and the encoded classification vector; the classification vector is a preset (for example, manually added) learnable classification vector. The encoded self-training vector group and the encoded classification vector enter branch one, and the encoded classification vector and the patient characteristic information in D_{2,train} enter branch two. In the penalized self-supervised learning, the transformation of the self-estimation vector group is the same as in the self-supervised learning and is prior art.

分支一，通过解码器得到一个预测向量，用于估计变换后的自估计向量组，即将解码器输出的预测向量作为变换后的自估计向量组的估计值；Branch 1, a prediction vector is obtained through the decoder, which is used to estimate the transformed self-estimation vector group, that is, the prediction vector output by the decoder is used as the estimated value of the transformed self-estimation vector group;

Branch two: the encoded classification vector and the patient characteristic information in D_{2,train} are passed through the classifier to obtain the predicted probabilities for the preset ECG signal types;

Parameter update: with the penalized loss function as the objective function, an optimizer is used on D_{2,train} to update all learnable parameters in the encoder, the decoder and the classifier. The penalized loss function is: self-supervised loss function + λ · CrossEntropy, where the self-supervised loss function is the same as in the self-supervised learning process, CrossEntropy denotes the cross-entropy loss between the predicted probabilities of the ECG signal types (that is, the disease prediction probabilities) and the ECG signal type label information, and λ is hyperparameter ten;

In this embodiment the optimizer is AdamW with a cosine learning rate scheduler, a base learning rate of 0.001, a weight decay of 0.05, a batch size of 256, optimizer momentum β₁ = 0.9 and β₂ = 0.999, 10 warm-up iterations and 100 iterations in total;

λ为惩罚损失函数中的惩罚权重，可以取一个预设的值；λ is the penalty weight in the penalty loss function, which can take a preset value;
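The penalized objective, self-supervised loss plus λ times the cross-entropy, can be illustrated for a single sample as below. This sketch assumes an l₂ self-supervised loss and a single-label cross-entropy over a probability vector; the application also allows other self-supervised losses, and the function names are hypothetical.

```python
import math


def l2_loss(pred, target):
    """Mean squared error between predicted and transformed self-estimation values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(class_probs, label):
    """Cross-entropy of the predicted type probabilities against one true label."""
    return -math.log(max(class_probs[label], 1e-12))  # clamp to avoid log(0)


def penalized_loss(pred, target, class_probs, label, lam=0.1):
    # Branch one contributes the self-supervised term, branch two the
    # classification term weighted by the penalty weight lambda.
    return l2_loss(pred, target) + lam * cross_entropy(class_probs, label)
```

With λ close to 0 the objective reduces to pure self-supervised reconstruction; larger λ gives the labelled classification signal more influence.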

前述前向传播和参数更新的过程不停迭代，直到满足训练终止条件（例如达到设定的最大迭代次数），则得到一个训练模型，将经过多次迭代处理得到一个训练模型的过程称为一次惩罚自监督学习的训练过程。The process of forward propagation and parameter update is iterated continuously until the training termination condition is met (for example, the maximum number of iterations is reached), and then a training model is obtained. The process of obtaining a training model after multiple iterations is called one-time Penalizing the training process for self-supervised learning.

Next, the optimal hyperparameter ten, λ, is selected on the validation set D_{2,vali} so as to maximize the selection metric for ECG signal classification, the selection metric being one of AUC, F_β-score and accuracy. λ may take values in the range 0.00001 to 10; in this embodiment the candidate values of λ are 0.05, 0.1 and 0.5;

AUC (Area Under Curve) is the area enclosed between the receiver operating characteristic (ROC) curve and the coordinate axis, and the F_β-score is defined as follows:

F_β = (1 + β²) · precision · recall / (β² · precision + recall),

where β is 0.5, 1 or 2; precision is the proportion of samples judged by the model to belong to a given ECG signal type that truly belong to that type; recall is the proportion of samples truly belonging to a given ECG signal type that the model judges to belong to that type; and accuracy refers to the accuracy of disease classification. These metrics are common general knowledge.
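As a small worked illustration, the standard F_β-score, F_β = (1 + β²) · precision · recall / (β² · precision + recall), can be computed from counts of true positives, false positives and false negatives for one ECG signal type. The helper name `f_beta` and the convention of returning 0 when a denominator is zero are assumptions made for this sketch.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F_beta-score for one ECG signal type from confusion counts."""
    # precision: share of samples predicted as the type that truly belong to it
    precision = tp / (tp + fp) if tp + fp else 0.0
    # recall: share of samples truly of the type that were predicted as such
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0
```

With β = 2 recall is weighted more heavily than precision, and with β = 0.5 precision is weighted more heavily.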

In addition, the complete procedure for training by penalized self-supervised learning on the training set and selecting the optimal hyperparameter ten on the validation set is as follows. As with the selection of the optimal first hyperparameter combination, suppose there are M candidate values of hyperparameter ten, λ. For each value, taking the model finally selected in the self-supervised learning as the initial model, one complete penalized self-supervised training process (the repeated forward propagation and parameter update described above) is carried out on the training set D_{2,train} to obtain a corresponding model; over all values of λ, M models are obtained. For each of the M models, the ECG signals in the validation set D_{2,vali} are input to the cutter to obtain the full ECG signal vector group, which is concatenated with the classification vector and input to the encoder to obtain the encoded full ECG signal vector group and the encoded classification vector; owing to the structure of the Transformer, the encoded classification vector depends on the full ECG signal vector group. Finally, the encoded classification vector and the patient characteristic information in D_{2,vali} are input to the classifier to obtain the predicted probabilities of the ECG signal types, and all the predicted probabilities are combined to determine the value of the selection metric. The largest of the M selection metric values is selected; the value of λ of the corresponding model is the optimal λ, and the selected model serves as the initial model for the subsequent model fine-tuning.
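The selection loop for λ described above amounts to training one model per candidate value and keeping the one with the largest validation metric. A minimal sketch follows; `train_fn` and `evaluate_fn` are hypothetical callables standing in for one complete penalized self-supervised training run on D_{2,train} and for the metric computation (AUC, F_β-score or accuracy) on D_{2,vali}.

```python
from typing import Any, Callable, Iterable, Tuple


def select_lambda(
    candidates: Iterable[float],
    train_fn: Callable[[float], Any],
    evaluate_fn: Callable[[Any], float],
) -> Tuple[float, float, Any]:
    """Return (best lambda, its validation metric, the corresponding model)."""
    best = None
    for lam in candidates:
        model = train_fn(lam)        # start from the pre-trained model, train with this lambda
        score = evaluate_fn(model)   # selection metric on the validation set
        if best is None or score > best[1]:
            best = (lam, score, model)
    if best is None:
        raise ValueError("no candidate values of lambda were given")
    return best
```

In this embodiment the candidates would be `[0.05, 0.1, 0.5]`, and the returned model is the starting point for fine-tuning.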

As described above and shown in Figure 3, the ECG self-supervised model constructed in this application comprises a cutter, a dual-classification masker, an encoder, a decoder, and a classifier. The specific function of each component and the ten hyperparameters are described in detail below:

The cutter cuts each input ECG signal into d_v mutually disjoint submatrices, each with K rows and d_patch columns, and vectorizes each submatrix to obtain the full ECG signal vector group {x₁,…,x_{d_v}} containing d_v elements, where d_patch is hyperparameter one, d_v is hyperparameter two, and K is the number of ECG leads;

Optionally, K is 12, hyperparameter one d_patch ranges from 10 to 200, and hyperparameter two d_v ranges from 25 to 500. In this embodiment, K=12, d_patch=25, and d_v=200.
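As an illustration of the cutter, the sketch below splits a K×S signal into d_v = S/d_patch column blocks and flattens each block. It is a sketch under stated assumptions: the function name `cut_signal` is hypothetical, and the row-by-row flattening order is an assumption, since the text only says the submatrices are vectorized.

```python
from typing import List, Sequence


def cut_signal(signal: Sequence[Sequence[float]], d_patch: int) -> List[List[float]]:
    """Cut a K x S matrix into S // d_patch disjoint K x d_patch blocks and flatten each."""
    n_cols = len(signal[0])
    if any(len(row) != n_cols for row in signal):
        raise ValueError("all leads must have the same number of samples")
    if n_cols % d_patch != 0:
        raise ValueError("number of columns must be divisible by d_patch")
    vectors = []
    for start in range(0, n_cols, d_patch):
        vec: List[float] = []
        for row in signal:  # assumed order: lead by lead within a block
            vec.extend(row[start:start + d_patch])
        vectors.append(vec)
    return vectors


# Embodiment values: K = 12 leads, S = 5000 samples, d_patch = 25.
signal = [[float(c + 100 * r) for c in range(5000)] for r in range(12)]
patches = cut_signal(signal, d_patch=25)
```

With these values the result has d_v = 200 vectors of length K·d_patch = 300, matching the embodiment.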

The dual-classification masker receives the full ECG signal vector group {x₁,…,x_{d_v}} and draws T+T′ vectors from it uniformly at random without replacement, where T+T′≤d_v. The first T vectors form the self-training vector group and the last T′ vectors form the self-estimation vector group; T and T′ are hyperparameter three and hyperparameter four, respectively. The output is the self-training vector group and the self-estimation vector group; the self-estimation vector group is then transformed to obtain the transformed self-estimation vector group;

Hyperparameter three T ranges from 5 to 400, and hyperparameter four T′ ranges from 5 to 400. In this embodiment, T=50 and T′=100.
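The sampling step of the masker can be sketched with the standard library alone. The name `split_for_masking` is hypothetical, and the subsequent transformation of the self-estimation vector group is deliberately left out, because its exact form is not fixed in this passage.

```python
import random
from typing import List, Sequence, Tuple


def split_for_masking(
    vectors: Sequence[Sequence[float]], t: int, t_prime: int, rng: random.Random
) -> Tuple[List[Sequence[float]], List[Sequence[float]], List[int]]:
    """Draw t + t_prime vectors uniformly without replacement and split them."""
    if t + t_prime > len(vectors):
        raise ValueError("T + T' must not exceed d_v")
    idx = rng.sample(range(len(vectors)), t + t_prime)  # no index is repeated
    self_training = [vectors[i] for i in idx[:t]]       # first T draws
    self_estimation = [vectors[i] for i in idx[t:]]     # last T' draws
    return self_training, self_estimation, idx


# Embodiment values: d_v = 200, T = 50, T' = 100.
vectors = [[float(i)] for i in range(200)]
train_group, est_group, drawn = split_for_masking(vectors, 50, 100, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which is a convenience choice of this sketch rather than something the text requires.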

Here, hyperparameter five L ranges from 1 to 32, hyperparameter six d_encoder ranges from 128 to 1280, and hyperparameter seven h_encoder ranges from 3 to 16. In this embodiment, L=12, d_encoder=384, and h_encoder=6, or alternatively L=12, d_encoder=256, and h_encoder=4.

The decoder consists of a restoration layer, a position embedding layer, one Transformer sub-block with hidden dimension d_decoder and h_decoder attention heads, and one fully connected layer, where d_decoder is hyperparameter eight and h_decoder is hyperparameter nine. The decoder is used only in the pre-training stage; it takes the encoded self-training vector group and the classification vector as input and outputs the prediction vectors;

Hyperparameter eight d_decoder ranges from 128 to 1280, and hyperparameter nine h_decoder ranges from 4 to 10. In this embodiment, d_decoder=256 and h_decoder=8, or alternatively d_decoder=128 and h_decoder=4.

The classifier consists of one fully connected layer and one activation layer with sigmoid as the activation function. Its input is the encoded self-training vector group output by the encoder, and its output is the predicted probability of the ECG signal type.

In addition, the aforementioned data preprocessing can be performed according to the following steps:

d. Apply min-max normalization to the numerical variables in the patient characteristic information and 0-1 encoding to the categorical variables, obtaining the patient characteristic information z_j of the ECG, j=1,…,m;

e. Obtain the multi-lead ECG signal data set D₁ and the associated data set D₂, where D₁={X_i: i=1,…,n} denotes all ECG signals, and D₂={(X_j, z_j, y_j): j=1,…,m} denotes the ECG signals, patient characteristic information, and ECG signal type labels that remain after removing the multi-lead ECG signals whose patient characteristic information or ECG signal type label is missing. Here X_j is a remaining ECG signal, z_j is the patient characteristic information, and y_j is the ECG signal type label.

The multi-lead ECG signal is a K×S numerical matrix, where K is the number of leads and S is the number of sample points collected. The ECG signal is first filtered and denoised; the filtered signal is then normalized so that its values lie between -1 and 1; the normalized signal is padded with zero-valued columns so that the number of columns after padding is divisible by hyperparameter one, giving the ECG signals X_i, i=1,…,n. Min-max normalization is applied to the numerical variables in the characteristic information and 0-1 encoding to the categorical variables, giving the hand-crafted ECG features z_j, j=1,…,m. A numerical variable takes numerical values, for example age; a categorical variable names a category and takes categorical values, for example gender. The multi-lead ECG signal data set D₁ and the associated data set D₂ are then obtained, where D₁={X_i: i=1,…,n} denotes all ECG signal observations, and D₂={(X_j, z_j, y_j): j=1,…,m} denotes the ECG signals, patient characteristic information, and ECG signal type label information that remain after removing the multi-lead ECG signals whose patient characteristic information or ECG signal type label information is missing; y_j is the ECG signal type label.

For example, for i=1, X₁ is a 12×5000 numerical matrix; for j=1, z₁ is a one-dimensional array (0.5, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0), where 0.5 represents the age, the second entry (1) indicates male, and each subsequent 1 or 0 indicates whether a particular ECG abnormality is present; y₁=0 indicates that a given cardiovascular disease is absent.
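The padding and feature-encoding steps above can be sketched as follows. This is a minimal sketch with hypothetical helper names; placing the zero columns at the end of the signal is an assumption, since the text only requires that the padded column count be divisible by hyperparameter one. Filtering and scaling of the signal to the range -1 to 1 are not shown.

```python
from typing import List, Sequence


def pad_to_multiple(signal: Sequence[Sequence[float]], d_patch: int) -> List[List[float]]:
    """Add zero-valued columns so that the column count is divisible by d_patch."""
    n_cols = len(signal[0])
    pad = (-n_cols) % d_patch  # 0 when n_cols is already a multiple of d_patch
    return [list(row) + [0.0] * pad for row in signal]  # assumption: pad on the right


def min_max(values: Sequence[float]) -> List[float]:
    """Min-max normalization of one numerical variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def encode_binary(value: str, positive: str) -> int:
    """0-1 encoding of a two-level categorical variable."""
    return 1 if value == positive else 0


# Illustrative inputs (assumptions): a 12 x 4990 signal, three ages, one gender value.
padded = pad_to_multiple([[0.1] * 4990 for _ in range(12)], d_patch=25)
ages = min_max([20, 50, 80])
sex = encode_binary("male", positive="male")
```

Here the 4990-column signal is padded to 5000 columns, and the middle age maps to 0.5, the same kind of value that appears as the first entry of z₁ in the example.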

The model fine-tuning includes the following steps:

Forward propagation: the ECG signals in D_{2,train} are input to the cutter to obtain the full ECG signal vector group; the full ECG signal vector group is concatenated with the classification vector and input to the encoder, yielding the encoded full ECG signal vector group and the encoded classification vector; the encoded classification vector and the patient characteristic information in D_{2,train} are input to the classifier to obtain the predicted probability of the ECG signal type;

Parameter update: with the cross-entropy loss between the predicted probability of the ECG signal type and the ECG signal type labels in D_{2,train} as the objective function, an optimizer is used on D_{2,train} to update all learnable parameters in the encoder and the classifier;

The forward propagation and parameter update above are iterated until the training termination condition is met (for example, the set maximum number of iterations is reached), which yields a trained model. The process of obtaining one trained model through these iterations is called one fine-tuning training run.

Next, the optimal second hyperparameter combination (specifically hyperparameter one d_patch, hyperparameter two d_v, hyperparameter three T, hyperparameter four T′, hyperparameter five L, hyperparameter six d_encoder, and hyperparameter seven h_encoder) is selected on the associated data validation set D_{2,vali} so that the selection metric for ECG signal classification is maximized.

The cross-entropy loss function is common knowledge, and the selection metric is one of AUC, F_β-score, and accuracy. The optimizer used in this embodiment is AdamW with a cosine learning-rate scheduler, with a base learning rate of 0.001, a weight decay of 0.05, a batch size of 256, optimizer momentum parameters β₁=0.9 and β₂=0.999, 5 warm-up iterations, and 50 total iterations.
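To make the learning-rate settings concrete, the sketch below computes a cosine schedule with warm-up from the stated values (base learning rate 0.001, 5 warm-up iterations, 50 total iterations). The linear shape of the warm-up and the final learning rate of 0 are assumptions; the text names only a cosine scheduler and the iteration counts.

```python
import math


def learning_rate(
    step: int,
    base_lr: float = 1e-3,
    warmup: int = 5,
    total: int = 50,
    min_lr: float = 0.0,
) -> float:
    """Learning rate at a given iteration: linear warm-up, then cosine decay."""
    if step < warmup:
        # Assumed linear warm-up that reaches base_lr at the last warm-up step.
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [learning_rate(s) for s in range(50)]
```

In a deep-learning framework the same values would be passed to an AdamW optimizer together with the weight decay of 0.05 and β₁=0.9, β₂=0.999.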

In addition, the complete procedure of pre-training and fine-tuning is described here together with the pre-training above. Assume there are X candidate second hyperparameter combinations. For each combination, the training sets D_{1,train} and D_{2,train} and the validation sets D_{1,vali} and D_{2,vali} are used in pre-training to obtain the optimal first hyperparameter combination, the optimal hyperparameter ten, and the corresponding model A after penalized self-supervised learning. With model A as the initial model, one complete fine-tuning training run (that is, the forward propagation and parameter update of fine-tuning described above, iterated until a model is produced) is executed on the training set, yielding one corresponding model. Over all second hyperparameter combinations this gives X models. For each of the X models, the signals of the validation set D_{2,vali} are input to the model's encoder and classifier to obtain the value of the selection metric for ECG signal classification. The model with the largest selection metric among the X models is chosen; its second hyperparameter combination is the optimal second hyperparameter combination, and the chosen model is the model to be tested in the subsequent testing;

The model testing includes the following steps:

On the associated data test set D_{2,test}, the model effect is evaluated with the selection metric. If the evaluation result meets the preset requirement, the model can be used; otherwise, the model parameters are adjusted and the pre-training and fine-tuning steps are repeated until the evaluation result meets the preset requirement.

In the fine-tuning and testing stages, the input of the encoder is the full ECG signal vector group and one classification vector, and its output is the encoded full ECG signal vector group and the encoded classification vector, where the classification vector is a manually added learnable classification vector. The encoded classification vector enters the classifier together with the patient characteristic information, and the output is the predicted probability of the ECG signal type. The predicted probability is compared with the true value, and the model effect is evaluated with the selection metric. The preset requirement is an AUC of 0.9 or above, an F1-score of 0.75 or above, or an accuracy of 80% or above. In this embodiment, the requirement is an AUC of 0.9 or above.
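The acceptance check on the test set can be sketched as below. `auc` uses the pairwise (Mann-Whitney) definition of the area under the ROC curve, and `meets_preset_requirement` applies the three thresholds stated above; both function names and the toy labels and scores are illustrative assumptions.

```python
from typing import Optional, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def meets_preset_requirement(
    auc_value: Optional[float] = None,
    f1: Optional[float] = None,
    accuracy: Optional[float] = None,
) -> bool:
    """True if any supplied metric reaches its threshold (AUC 0.9, F1 0.75, accuracy 0.8)."""
    return (
        (auc_value is not None and auc_value >= 0.9)
        or (f1 is not None and f1 >= 0.75)
        or (accuracy is not None and accuracy >= 0.8)
    )


# Toy test-set labels and predicted probabilities (assumptions).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
accepted = meets_preset_requirement(auc_value=toy_auc)
```

The pairwise computation is quadratic in the sample size and is meant only to show the definition; a sorting-based implementation would be preferable on large test sets.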

As shown in Figure 4, a type identification system for multi-lead ECG signals comprises a data acquisition module, a data preprocessing module, a data set division module, a model generation module, and a service computing module, wherein

the data acquisition module is used to acquire training data, including n multi-lead ECG signals, and the patient characteristic information and ECG signal type labels associated with some of those multi-lead ECG signals;

the data preprocessing module is used to generate, from the n multi-lead ECG signals, the multi-lead ECG signal data set D₁ representing all ECG signals, with a corresponding sample size of n; and to generate, from the n multi-lead ECG signals and the patient characteristic information and ECG signal type label information associated with them, the associated data set D₂ by removing the multi-lead ECG signals whose patient characteristic information or ECG signal type label information is missing, with a corresponding sample size of m, where m≤n;

the model generation module is used to complete model training based on the multi-lead ECG signals, characteristic information, label information, and the constructed model framework, obtaining a trained ECG self-supervised model;

This type identification system is based on the ECG self-supervised model. Through self-supervised learning it makes use of ECG signal data that the prior art cannot exploit, and it needs only a small amount of multimodal data to obtain probability information for the various ECG signal types (for example, ECG signal types associated with various cardiovascular diseases). It enables early screening for cardiovascular diseases that the prior art cannot identify, such as adult congenital heart disease, valvular disease, coronary heart disease, cardiomyopathy, and pulmonary heart disease.

As shown in Figure 5, the model generation module comprises a sample library, a model training engine, and a model library. The sample library stores the multi-lead ECG signal data set D₁ and the associated data set D₂ generated by the data acquisition module, data preprocessing module, and data set division module; the model training engine performs model training on the multi-lead ECG signal data set and associated data set stored in the sample library; the model library stores the trained ECG self-supervised model;

The service computing module comprises a service trigger engine and a model computing engine. The service trigger engine receives ECG signal type identification requests and sends them to the data acquisition module. For each received request, the data acquisition module automatically collects the data required for model prediction, including the multi-lead ECG signal and characteristic information, and sends it to the model computing engine. The model computing engine calls the trained ECG self-supervised model, obtains from the multi-lead ECG signal and characteristic information the probability information corresponding to each of the set ECG signal types, and stores the result.

Example 2

As shown in Figure 6, the type identification system for multi-lead ECG signals in this embodiment is essentially the same as in Example 1, except that it further comprises a front-end interaction module and a dynamic monitoring module. The front-end interaction module comprises an identification result presentation submodule and a label storage submodule. The identification result presentation submodule displays the probability information corresponding to the various ECG signal types obtained by the service computing module; the label storage submodule determines the final ECG signal type label from that probability information. In particular, the ECG signal type labels generated while the model is in use are dynamically written back to the sample library of the model generation module, so that new data accumulates continuously and facilitates subsequent updating, optimization, and iteration of the model;

The dynamic monitoring module comprises a service monitoring and evaluation submodule and a service update trigger engine. The service monitoring and evaluation submodule evaluates the model's identification performance in real time from the automatically accumulated ECG signal type label information generated during use; the service update trigger engine automatically triggers an update of the model and service when the model performance does not meet the preset requirement, achieving dynamic optimization and updating of the model.

The front-end interaction module provides clinicians with visualized type identification results to assist diagnosis. At the same time, the ECG signal type labels generated while the model is in use can be written back to the sample library of the model generation module dynamically and in real time, so that new data accumulates continuously and facilitates subsequent updating, optimization, and iteration of the model;

The dynamic monitoring module evaluates the model's identification performance in real time from the probability information and label information of the ECG signal types. When the model performance does not meet the preset requirement, it automatically triggers an update of the model and service, achieving dynamic optimization and updating of the model. The preset requirement is an AUC of 0.9 or above, an F1-score of 0.75 or above, or an accuracy of 80% or above. In this embodiment, the requirement is an AUC of 0.9 or above.

As shown in Figure 7, the type identification system for multi-lead ECG signals further comprises an ECG interpretation module with a built-in ECG interpretation model, which can interpret an electrocardiogram and identify cases whose category is arrhythmia.

The ECG interpretation model is prior art. The arrhythmias include but are not limited to sinus arrhythmia, premature atrial contraction, premature ventricular contraction, atrioventricular block, and atrial fibrillation.

The intelligent ECG auxiliary system comprises the type identification system for multi-lead ECG signals described above and a knowledge base that stores handling suggestions. After the type identification system produces a type identification result, the knowledge base is queried and the handling suggestions that meet the preset condition are output. The preset condition is that the handling suggestion matches the ECG type identification result.
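The knowledge-base lookup can be pictured as a simple match between identified types and stored suggestions. The sketch below is purely illustrative: the function name, the 0.5 probability threshold used to decide which types count as identified, and the placeholder type names and suggestion texts are all assumptions, not content of the application.

```python
from typing import Dict


def matching_suggestions(
    probabilities: Dict[str, float],
    knowledge_base: Dict[str, str],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Return the stored suggestions whose key matches an identified ECG signal type."""
    # Assumption: a type counts as identified when its probability reaches the threshold.
    identified = [name for name, p in probabilities.items() if p >= threshold]
    return {name: knowledge_base[name] for name in identified if name in knowledge_base}


# Placeholder data (hypothetical type names and suggestion texts).
knowledge_base = {
    "type_a": "placeholder suggestion for type_a",
    "type_b": "placeholder suggestion for type_b",
}
result = matching_suggestions({"type_a": 0.91, "type_b": 0.12}, knowledge_base)
```

Only the entry for the type whose probability reaches the threshold is returned, which corresponds to the preset condition that a suggestion must match the identification result.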

As shown in Figure 8, an electronic device comprises: a processor;

A non-transitory computer-readable storage medium stores instructions which, when executed by a processor, cause the processor to perform the type identification method for multi-lead ECG signals described above. The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

The flowcharts and block diagrams in the drawings of the present application show the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to the various embodiments disclosed in the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Those skilled in the art will understand that the features recited in the various embodiments and/or claims of the present application can be combined in many ways, even if such combinations are not explicitly recited in the present application. In particular, the features recited in the various embodiments and/or claims of the present application can be combined in many ways without departing from the spirit and teaching of the present application, and all such combinations fall within the scope disclosed by the present application.

Specific embodiments are used herein to explain the principles and implementations of the present application. The description of the above embodiments is intended only to help in understanding the method of the present application and its core idea, and is not intended to limit the present application. Those skilled in the art may make changes to the specific implementations and the scope of application in accordance with the ideas, spirit, and principles of the present application, and any modification, equivalent replacement, or improvement so made shall fall within the scope of protection of the present application.

Claims

1. A type identification method for multi-lead electrocardiosignals, characterized by comprising the following steps:

data acquisition, namely acquiring n multi-lead electrocardiosignals, and patient characteristic information and an electrocardiosignal type label which are associated with part of the multi-lead electrocardiosignals in the n multi-lead electrocardiosignals;

data preprocessing: generating, based on the n multi-lead electrocardiosignals, a multi-lead electrocardiosignal data set D₁ representing all electrocardiosignals, with a corresponding sample size of n; and generating, based on the partial multi-lead electrocardiosignals and the patient characteristic information and electrocardiosignal type labels associated with them, an associated data set D₂ with a corresponding sample size of m, where m ≤ n;

data set division: dividing the multi-lead electrocardiosignal data set D₁ into a multi-lead electrocardiosignal training set D_{1,train} and a multi-lead electrocardiosignal validation set D_{1,vali}, and dividing the associated data set D₂ into an associated data training set D_{2,train}, an associated data validation set D_{2,vali}, and an associated data test set D_{2,test};

constructing an electrocardiographic self-supervised model framework, the model framework being based on Transformer modules and comprising a cutter, a dual-classification masker, an encoder, a decoder, and a classifier;

model pre-training: initializing the model parameters, then inputting the multi-lead electrocardiosignal training set D_{1,train}, the multi-lead electrocardiosignal validation set D_{1,vali}, the associated data training set D_{2,train}, and the associated data validation set D_{2,vali} into the model framework, and performing self-supervised learning and penalized self-supervised learning to obtain a pre-trained electrocardiographic self-supervised model;

model fine-tuning: fine-tuning the pre-trained electrocardiographic self-supervised model based on the associated data training set D_{2,train} and the associated data validation set D_{2,vali} to complete the training of the model;

model testing: testing the fine-tuned electrocardiographic self-supervised model on the associated data test set D_{2,test} and evaluating the model effect; if the model evaluation result does not meet the preset requirement, adjusting the model parameters and repeating the model pre-training and the model fine-tuning until the model evaluation result meets the preset requirement;

In the application stage, the acquired multi-lead electrocardiosignals and the characteristic information of the patient are input into a trained model to obtain probability information corresponding to various set electrocardiosignal types.

2. The method of claim 1, wherein the patient characteristic information includes age, gender, and electrocardiographic abnormalities; the electrocardiosignal type label indicates whether the multimodal data matched to the partial multi-lead electrocardiosignals shows a cardiovascular disease; the multimodal data is CT, ultrasound, angiography, or magnetic resonance data acquired for the same patient; and the cardiovascular disease includes at least one of adult congenital heart disease, valvular disease, coronary heart disease, cardiomyopathy, and pulmonary vascular disease.

3. The method according to claim 2, characterized in that the model pre-training comprises the steps of:

a. randomly initializing model parameters;

b. performing self-supervised learning, based on the cutter, dual-classification masker, encoder, and decoder of the model, on the multi-lead electrocardiosignal training set D_{1,train} and the multi-lead electrocardiosignal validation set D_{1,vali};

c. performing penalized self-supervised learning, based on the cutter, dual-classification masker, encoder, decoder, and classifier of the model, on the associated data training set D_{2,train} and the associated data validation set D_{2,vali}.

4. The method of claim 3, wherein the self-supervised learning based on the cutter, dual-classification masker, encoder, and decoder of the model comprises the following steps:

forward propagation: passing the electrocardiosignals in D_{1,train} sequentially through the cutter and the dual-classification masker to obtain a self-training vector group and a transformed self-estimation vector group; concatenating the self-training vector group with one classification vector and passing the result sequentially through the encoder and the decoder, which output a group of prediction vectors as estimates of the transformed self-estimation vector group; wherein the classification vector is a preset learnable classification vector;

parameter updating: taking a self-supervised loss function reflecting the error between the prediction vectors and the transformed self-estimation vector group as the objective function, and updating all learnable parameters in the encoder and the decoder on D_{1,train} using an optimizer;

selecting, on the validation set D_{1,vali}, the optimal first hyperparameter combination such that the self-supervised loss function, relative to the total number of all learnable parameters in the decoder, is minimal.

5. The method of claim 4, wherein the transformation comprises at least one of downsampling for dimensionality reduction, element-wise exponentiation, within-vector normalization, and classification according to a threshold; the self-supervised loss function is one of the l₁ loss, the l₂ loss, and the cross-entropy loss function; and the first hyperparameter combination comprises the hidden dimension and the number of attention heads of the Transformer sub-block of the decoder in the model.

6. The method of claim 5, wherein said penalized self-supervised learning comprises the steps of:

forward propagation, D _2,train The electrocardiosignal in the heart is sequentially transmitted through a cutter and a double-classification masking device to obtain a self-training vector group and a transformed self-estimated vector group, and the self-training vector group and a classification vector are spliced and then transmitted through an encoderOutputting a coded self-training vector group and a coded classification vector, wherein one classification vector is a preset learnable classification vector; the encoded self-training vector group and the encoded classification vector enter branch one, the encoded classification vector and the D _2,train The patient characteristic information in the model is entered into a branch II;

branch one: the decoder processes the input encoded self-training vector group to obtain prediction vectors, which serve as estimates of the transformed self-estimated vector group;

branch two: the encoded classification vector and the patient characteristic information in D_{2,train} are input into a classifier for processing to obtain the predicted probability of the electrocardiosignal type;

parameter updating: taking a penalty loss function as the objective function, all learnable parameters in the encoder, the decoder and the classifier are updated on D_{2,train} using an optimizer, wherein the penalty loss function = (self-supervised loss function between the prediction vectors and the transformed self-estimated vector group) + λ · crossEntropy, where λ is hyperparameter ten and crossEntropy denotes the cross-entropy loss between the predicted probability of the electrocardiosignal type and the electrocardiosignal type label;

on the verification set D_{2,vali}, the optimal hyperparameter λ is selected so as to maximize the selected metric for electrocardiosignal type identification, wherein the selected metric is one of AUC, F_β-score, and accuracy.
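The penalty loss of claim 6 combines the self-supervised loss of branch one with the cross-entropy loss of branch two. The sketch below shows that combination for a single sample and a simple selection of λ on a validation metric. The function names, the integer class-index label format, and the `validation_metric` callable (standing in for "train with this λ, then score on D_{2,vali}") are hypothetical.

```python
import math


def cross_entropy(probs, label_index, eps=1e-12):
    """Cross-entropy between a predicted probability vector and an integer class label."""
    return -math.log(max(probs[label_index], eps))


def penalty_loss(self_supervised_loss, probs, label_index, lam):
    """Penalty loss = self-supervised loss + lambda * cross-entropy."""
    return self_supervised_loss + lam * cross_entropy(probs, label_index)


def select_lambda(candidates, validation_metric):
    """Return the candidate lambda with the largest validation metric (e.g. AUC)."""
    return max(candidates, key=validation_metric)
```

With λ = 0 the objective reduces to the plain self-supervised loss, and larger λ gives the labelled classification task more weight during pre-training.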

7. The method of claim 6, wherein the electrocardio self-supervision model comprises:

a cutter for cutting each input electrocardiosignal into d_v mutually exclusive sub-matrices, each with K rows and d_patch columns, and vectorizing the sub-matrices to obtain an electrocardiosignal full-vector group {x_1, …, x_{d_v}} containing d_v elements, where d_patch is hyperparameter one and K is the number of electrocardiogram leads;

a double-classification masker for receiving the electrocardiosignal full-vector group {x_1, …, x_{d_v}} and randomly drawing T + T' vectors from it with equal probability and without replacement, where T + T' ≤ d_v; the first T vectors form the self-training vector group and the last T' vectors form the self-estimated vector group, T and T' being hyperparameter three and hyperparameter four respectively; the masker outputs the self-training vector group and the self-estimated vector group, and the self-estimated vector group is then transformed to obtain the transformed self-estimated vector group;

an encoder consisting of a projection layer, a position embedding layer, and L sequentially connected Transformer sub-blocks, each with hidden dimension d_encoder and h_encoder attention heads, where L is hyperparameter five, d_encoder is hyperparameter six, and h_encoder is hyperparameter seven; in the pre-training stage, the input of the encoder is the self-training vector group and a classification vector, and the output is the encoded self-training vector group and the encoded classification vector; in the fine-tuning and testing stages, the input of the encoder is the electrocardiosignal full-vector group and a classification vector, and the output is the encoded electrocardiosignal full-vector group and the encoded classification vector; wherein the classification vector is a manually added learnable classification vector;

a decoder consisting of a restoring layer, a position embedding layer, one Transformer sub-block with hidden dimension d_decoder and h_decoder attention heads, and one fully connected layer, where d_decoder is hyperparameter eight and h_decoder is hyperparameter nine; the decoder is used only in the pre-training stage, takes the encoded self-training vector group and the encoded classification vector as input, and decodes them to output the prediction vectors;

a classifier consisting of a fully connected layer and an activation layer, which takes the encoded classification vector and the patient characteristic information as input and outputs the predicted probability of the electrocardiosignal type.
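The cutter and the double-classification masker of claim 7 can be sketched as follows, assuming the electrocardiosignal is stored as a list of K rows (one per lead) whose column count is already divisible by d_patch. The row-by-row flattening order and the decision to return the drawn indices (which a position embedding layer would need) are assumptions; the names `cutter` and `masker` simply mirror the claim.

```python
import random


def cutter(signal, d_patch):
    """Cut a K x N signal into N // d_patch non-overlapping K x d_patch
    sub-matrices and flatten each one row by row (flattening order is assumed)."""
    n_cols = len(signal[0])
    if n_cols % d_patch != 0:
        raise ValueError("number of columns must be divisible by d_patch")
    d_v = n_cols // d_patch
    vectors = []
    for p in range(d_v):
        start = p * d_patch
        patch = [row[start:start + d_patch] for row in signal]
        vectors.append([value for row in patch for value in row])
    return vectors


def masker(vectors, T, T_prime, rng=random):
    """Draw T + T' vectors uniformly without replacement; the first T form the
    self-training group and the last T' form the self-estimated group."""
    if T + T_prime > len(vectors):
        raise ValueError("T + T' must not exceed d_v")
    drawn = rng.sample(range(len(vectors)), T + T_prime)
    train_idx, est_idx = drawn[:T], drawn[T:]
    train_group = [vectors[i] for i in train_idx]
    est_group = [vectors[i] for i in est_idx]
    return train_group, est_group, train_idx, est_idx
```

For a 12-lead signal with 5000 columns and d_patch = 50, the cutter would yield d_v = 100 vectors of 600 elements each; these numbers are illustrative only.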

8. The method according to any of claims 4-7, wherein the data preprocessing comprises the steps of:

a. filtering and denoising the electrocardiosignal;

b. standardizing the filtered electrocardiosignals so that the data range lies between -1 and 1;

c. padding the standardized electrocardiosignal with columns of zeros so that the number of columns after padding is divisible by hyperparameter one, to obtain the electrocardiosignals X_i, i = 1, …, n;

d. performing min-max normalization on the numerical variables in the patient characteristic information and 0-1 encoding on the categorical variables in the patient characteristic information, to obtain the patient characteristic information z_j, j = 1, …, m, associated with the electrocardiosignals;

e. obtaining a multi-lead electrocardiosignal data set D_1 and an associated data set D_2, wherein D_1 = {X_i | i = 1, …, n} represents all electrocardiosignals, and D_2 = {(X_j, z_j, y_j) | j = 1, …, m} represents the associated data set, where X_j denotes a multi-lead electrocardiosignal, z_j denotes the patient characteristic information associated with the multi-lead electrocardiosignal, and y_j denotes the electrocardiosignal type label associated with the multi-lead electrocardiosignal.
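Steps b to d of claim 8 can be sketched as below. Filtering (step a) is omitted because the claim does not name a filter. Scaling by the maximum absolute value is only one assumed way to reach the range -1 to 1, and "0-1 encoding" is interpreted here as one-hot encoding; all function names are hypothetical.

```python
def scale_to_unit_range(signal):
    """Step b (assumed form): divide by the largest absolute value so data lies in [-1, 1]."""
    peak = max(abs(v) for row in signal for v in row)
    if peak == 0:
        return [list(row) for row in signal]
    return [[v / peak for v in row] for row in signal]


def pad_columns(signal, d_patch):
    """Step c: append zero columns until the column count is divisible by d_patch."""
    n_cols = len(signal[0])
    extra = (-n_cols) % d_patch
    return [list(row) + [0.0] * extra for row in signal]


def min_max(values):
    """Step d, numerical variable: min-max normalization across patients."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def zero_one_encode(values):
    """Step d, categorical variable: one indicator column per category (sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]
```

Padding after scaling keeps the zero columns at exactly 0, so they do not change the range established in step b.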

9. The method of claim 8, wherein the model fine tuning comprises the steps of:

forward propagation: the electrocardiosignals in D_{2,train} are input into the cutter for processing to obtain the electrocardiosignal full-vector group; the full-vector group is concatenated with the classification vector and input into the encoder for processing to obtain the encoded electrocardiosignal full-vector group and the encoded classification vector; the encoded classification vector and the patient characteristic information in D_{2,train} are input into the classifier for processing to obtain the predicted probability of the electrocardiosignal type;

parameter updating: taking as the objective function the cross-entropy loss between the predicted probability of the electrocardiosignal type and the electrocardiosignal type labels in D_{2,train}, all learnable parameters in the encoder and the classifier are updated on D_{2,train} using the optimizer;

on the associated data verification set D_{2,vali}, the optimal second hyperparameter combination is selected so as to maximize the selected metric for electrocardiosignal type identification; the second hyperparameter combination includes the number of columns d_patch used by the cutter when cutting the electrocardiosignal, the number d_v of sub-matrices obtained after the cutter cuts the electrocardiosignal, the number of vectors T in the self-training vector group, the number of vectors T' in the self-estimated vector group, the number L of Transformer sub-blocks included in the encoder, the hidden dimension d_encoder of the Transformer sub-blocks in the encoder, and the number of attention heads h_encoder of the Transformer sub-blocks in the encoder.

10. The method of claim 9, wherein the model test comprises the steps of:

on the associated data test set D_{2,test}, the final model performance is evaluated using the selected metric; if the model evaluation result meets the preset requirement, the model is allowed to be used; if it does not, the model parameters are adjusted and the model pre-training and fine-tuning steps are repeated until the model evaluation result meets the preset requirement.
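Two of the metrics named in the claims, accuracy and the F_β-score, together with the acceptance check of claim 10, can be sketched for the binary case as follows. The binary label format, the function names, and the `required` threshold are assumptions; the claims do not specify the preset requirement.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted type equals the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_beta(y_true, y_pred, beta=1.0, positive=1):
    """F_beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1 + b2) * tp / denom


def model_accepted(metric_value, required):
    """Claim 10 check: the model may be used only if the test metric meets the requirement."""
    return metric_value >= required
```

When `model_accepted` returns False, the claim calls for adjusting the model parameters and repeating pre-training and fine-tuning before evaluating again.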

11. A type recognition system for multi-lead electrocardiosignals, characterized by comprising a data acquisition module, a data preprocessing module, a data set dividing module, a model generation module and a service computing module, wherein:

the data acquisition module is used for acquiring training data, the training data comprising n multi-lead electrocardiosignals, together with the patient characteristic information and electrocardiosignal type labels associated with some of the n multi-lead electrocardiosignals;

the data preprocessing module is used for generating, based on the n multi-lead electrocardiosignals, a multi-lead electrocardiosignal data set D_1 representing all electrocardiosignals, with a corresponding sample size of n; and, based on the n multi-lead electrocardiosignals and the patient characteristic information and electrocardiosignal type labels associated with them, removing the multi-lead electrocardiosignals whose patient characteristic information or electrocardiosignal type label is missing, to generate an associated data set D_2 with a corresponding sample size of m, where m ≤ n;

the data set dividing module is used for dividing the multi-lead electrocardiosignal data set D_1 into a multi-lead electrocardiosignal training set D_{1,train} and a multi-lead electrocardiosignal verification set D_{1,vali}, and dividing the associated data set D_2 into an associated data training set D_{2,train}, an associated data verification set D_{2,vali} and an associated data test set D_{2,test};

The model generation module is used for completing model training based on the multi-lead electrocardiosignal, the characteristic information, the label information and the built model frame, and obtaining and storing a trained electrocardio self-supervision model;

the service calculation module is used for receiving the electrocardiosignal type identification request and calling the trained electrocardiosignal self-supervision model to obtain probability information corresponding to the set electrocardiosignal types.
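The data flow through the preprocessing and data set dividing modules of claim 11 can be sketched as follows. Representing a missing feature or label as `None`, the split fractions, the fixed random seed, and the function names are all assumptions made for illustration; the claim does not specify how the division is performed.

```python
import random


def build_datasets(signals, features, labels):
    """D1 holds every signal; D2 keeps only records whose patient features
    and type label are both present (missing values are assumed to be None)."""
    d1 = list(signals)
    d2 = [(x, z, y) for x, z, y in zip(signals, features, labels)
          if z is not None and y is not None]
    return d1, d2


def split(items, fractions, seed=0):
    """Shuffle the items and split them by the given fractions;
    the last part takes whatever remains."""
    items = list(items)
    random.Random(seed).shuffle(items)
    parts, start = [], 0
    for frac in fractions[:-1]:
        end = start + int(round(frac * len(items)))
        parts.append(items[start:end])
        start = end
    parts.append(items[start:])
    return parts
```

Under these assumptions, `split(d1, [0.8, 0.2])` would give D_{1,train} and D_{1,vali}, and `split(d2, [0.6, 0.2, 0.2])` would give D_{2,train}, D_{2,vali} and D_{2,test}.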

12. The system of claim 11, wherein the model generation module comprises a sample library, a model training engine, and a model library; the sample library is used for receiving the multi-lead electrocardiosignal data set D_1 and the associated data set D_2 from the data set dividing module and storing them; the model training engine is used for completing model training based on the multi-lead electrocardiosignal data set and the associated data set stored in the sample library; the model library is used for storing the trained electrocardio self-supervision model;

the service computing module comprises a service triggering engine and a model computing engine; the service triggering engine is used for receiving the type identification request of the electrocardiosignal and the data carried by the request, and sending the request to the model calculation engine, wherein the data carried by the request comprises the multi-lead electrocardiosignal and the characteristic information of the patient; the model calculation engine is used for calling a trained electrocardio self-supervision model, obtaining probability information corresponding to various set electrocardio signal types based on multi-lead electrocardio signals carried by the request and the characteristic information of the patient, and completing result storage.

13. The type recognition system for multi-lead electrocardiosignals of claim 12, further comprising a front-end interaction module and a dynamic monitoring module, wherein:

The front-end interaction module comprises an identification result presentation sub-module and a label storage sub-module; the recognition result presentation submodule is used for carrying out visual prompt based on probability information corresponding to various electrocardiosignal types obtained by the service calculation module; the label storage sub-module is used for automatically acquiring a final electrocardiosignal type label generated in the application process and completing storage;

the dynamic monitoring module comprises a service monitoring and evaluation sub-module and a service update trigger engine; the service monitoring and evaluation sub-module is used for evaluating the model identification effect in real time based on the type labels automatically accumulated during application; and the service update trigger engine is used for automatically triggering an update of the model and the service when the model effect does not meet the preset requirement, thereby realizing dynamic optimization and updating of the model.

14. The system of claim 13, further comprising an electrocardiograph interpretation module, wherein the electrocardiograph interpretation module is embedded with an electrocardiograph interpretation model for interpreting an electrocardiograph and recognizing that the electrocardiograph signal is of an arrhythmia type.

15. An intelligent electrocardio-assisted system, which is characterized by comprising the type recognition system of the multi-lead electrocardiosignal as claimed in any one of claims 11 to 14 and a knowledge base, wherein the knowledge base stores processing suggestions, and when the type recognition system of the multi-lead electrocardiosignal gives a type recognition result, the knowledge base is called to output the processing suggestions meeting preset conditions in the knowledge base.

16. An electronic device, comprising: a processor;

a memory storing a program configured to implement the type recognition method of a multi-lead electrocardiographic signal according to any one of claims 1-10 when executed by the processor.

17. A non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the type recognition method of a multi-lead electrocardiographic signal according to any one of claims 1-10.