Disclosure of Invention
The invention aims to provide a report generation method for multi-section ultrasound data of congenital heart disease which is built on basic clinical requirements and improves the efficiency of generating reports from ultrasound images.
The technical scheme of the invention provides a report generation method for multi-section ultrasound data of congenital heart disease, characterized by comprising the following steps:
step 1, collating and preprocessing the training data;
step 2, completing ultrasound image feature extraction with an ultrasound image feature extractor.
In the feature extractor of the ultrasound image, a residual structure is adopted to transfer shallow-layer texture and color information; 4 convolution modules are adopted, each module containing 2 convolution layers, 2 batch normalization layers and 2 activation functions;
step 3, setting up a pathology label graph: various subject-predicate-object combinations are automatically extracted from the reports by a language parser and manually screened to summarize 25 pathology labels, each label comprising a positive observation and a negative observation of the pathology; after the pathology label graph is extracted, it is used as additional label data to guide the learning of the feature extractor;
step 4, extracting information by adopting a multi-frame ultrasonic image attention mechanism;
and step 5, establishing a multi-frame ultrasound image report generation model: performing topic division and pathology label extraction on the reports in the data set to obtain 5 topic sentences and 25 pathology labels, fusing the input multi-view ultrasound images with an attention mechanism, and constructing an initial fully-connected input graph and a fully-connected adjacency matrix over the 5 topic sentences.
Further, in step 2, the picture is first resized in an image preprocessing operation to 224 × 224, suitable for the input network. It then passes through a 7 × 7 convolutional layer, changing its size to 112 × 112, then through one 3 × 3 max pooling layer with stride 2, changing its size to 56 × 56, and then through 4 convolution modules, each containing two 3 × 3 convolutional layers; after the two 3 × 3 convolutional layers, a batch normalization layer and a ReLU activation layer keep the features of each channel identically distributed.
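The size changes described above (224 → 112 → 56) can be checked with the standard convolution/pooling output-size formula. A minimal sketch, assuming stride 2 with padding 3 for the 7 × 7 convolution and padding 1 for the 3 × 3 pool (the padding values are not stated in the text):

```python
def conv_out_size(n, kernel, stride, padding):
    """Standard conv/pool output-size formula:
    floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# 224x224 input -> 7x7 conv, stride 2, padding 3 (assumed) -> 112x112
after_conv7 = conv_out_size(224, kernel=7, stride=2, padding=3)

# 112x112 -> 3x3 max pool, stride 2, padding 1 (assumed) -> 56x56
after_pool = conv_out_size(after_conv7, kernel=3, stride=2, padding=1)
```

With these (assumed) padding values the arithmetic reproduces exactly the sizes stated in the step.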
Further, in step 3, a 25-node pathology label graph structure is constructed to model the relationships between pathologies.
The invention has the beneficial effects that: the recognition efficiency of ultrasound images is improved by means of artificial intelligence. The model structure is built for multi-section report generation from congenital heart disease ultrasound and is established on basic clinical requirements; because of the requirement on network speed, a very complex network structure is not chosen, and the model built to this standard still reaches the accuracy standard required clinically.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the accompanying figures 1-4.
In order to achieve the purpose of the invention, the classification method based on multiple ultrasound sections of congenital heart disease comprises the following aspects:
Step 1, collating and preprocessing the training data.
The model training data comprise 310 cases: 61 cases of section data from normal subjects, 104 cases from patients with congenital atrial septal defect, and 145 cases from patients with congenital ventricular septal defect. The data were provided by the Wuhan Asia Heart Hospital and classified by professional doctors of its ultrasound department, which guarantees the accuracy of the section-data classification. The training data are stored in DICOM format in the order shown in Table 1; since the number of frames differs between sections, the training data must be preprocessed.
TABLE 1. Classification names of each echocardiogram section
Step 2, completing ultrasound image feature extraction with an ultrasound image feature extractor.
In the feature extractor of the ultrasound image, this embodiment adopts a residual structure to transfer shallow-layer information such as texture and color while avoiding the vanishing-gradient problem.
This embodiment employs 4 convolution modules, with 2 convolution layers, 2 batch normalization layers and 2 activation functions inside each module. In total, each ultrasound image only needs to pass through an 18-layer network structure, which is suitable for efficient ultrasound image feature extraction.
Based on the ResNet18 network, an ultrasound image feature extractor is designed; the preliminary model structure is shown in FIG. 2. In the design of the model, this embodiment considers that the shortcut connections of the residual structure also preserve shallow features in the image, so the residual design is adopted for the convolution modules in the network.
To limit the total number of layers, this embodiment uses only 4 convolution modules. Each picture of each section's data is input into the network shown in FIG. 3.
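The residual design of each convolution module can be sketched in a few lines. This is an illustrative numpy sketch, not the patented implementation: `conv_branch` stands in for the conv → BN → ReLU → conv → BN stack, and the toy branch used below is an arbitrary shape-preserving function.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, conv_branch):
    """One residual module: the convolutional branch output is added
    element-wise to the block input (the shortcut connection), then
    passed through the second ReLU activation."""
    return relu(x + conv_branch(x))

# Toy input in NCHW layout: batch 1, 64 channels, 56x56 feature map.
x = np.random.randn(1, 64, 56, 56)
out = residual_block(x, lambda t: 0.1 * t)  # toy shape-preserving branch
```

The shortcut addition is what lets shallow texture and color information reach deeper layers and keeps gradients from vanishing.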
First, the picture is resized in an image preprocessing operation to 224 × 224, suitable for the input network. It then passes through a 7 × 7 convolutional layer, changing its size to 112 × 112, then through one 3 × 3 max pooling layer with stride 2, changing its size to 56 × 56, and then through 4 convolution modules, each containing two 3 × 3 convolutional layers; after the two 3 × 3 convolutional layers, a Batch Normalization layer (BN layer) and a ReLU activation layer keep the features of each channel identically distributed. Before the output of each convolution module, the input features are added to the convolved features and the sum is output after a second ReLU activation layer, which avoids the vanishing-gradient problem. This structure follows the work of He et al. [18]. After the input image has passed through the 4 convolution modules, this embodiment classifies the resulting features with a softmax layer. The softmax function, also called the normalized exponential function, normalizes a set of numbers after exponentiation, as shown in formula (1):
softmax(z)_j = exp(z_j) / Σ_k exp(z_k)    (1)
That is, for each class, a weight is computed exponentially, giving the probability that the feature belongs to the j-th class. Because of the properties of the exponential function, normalization suppresses the low-probability classes and boosts the high-probability ones, which is why softmax is widely used in multi-class problems. After the softmax function, a 1 × 10 vector is obtained in which each position i represents the probability that the single-frame picture belongs to the i-th classification; the largest value in the vector determines the classification of the single-frame picture. For the classification of the pathology labels, this embodiment adds an extra fully-connected output branch to the feature extraction network that predicts a 1 × 25 vector, where each position i corresponds to the output of the i-th pathology label; a sigmoid function is then applied so that each position i represents the probability that the picture contains the i-th pathology label.
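The two output heads just described can be sketched as follows. This is a minimal numpy illustration with random logits standing in for the extractor's features; the 1 × 10 and 1 × 25 sizes come from the text, and the 0.5 decision threshold for the sigmoid head is an assumption.

```python
import numpy as np

def softmax(z):
    """Normalized exponential over the class axis (formula (1))."""
    e = np.exp(z - z.max())           # subtract max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random logits stand in for the extractor's two output branches.
logits_class = np.random.randn(10)    # 1x10 slice-classification head
logits_label = np.random.randn(25)    # 1x25 pathology-label head

p_class = softmax(logits_class)           # probabilities over 10 sections
predicted_slice = int(np.argmax(p_class)) # largest value picks the section
p_label = sigmoid(logits_label)           # independent label probabilities
present = p_label > 0.5                   # labels predicted positive (assumed threshold)
```

Softmax forces the 10 section probabilities to compete and sum to 1, while sigmoid scores each of the 25 pathology labels independently, since several pathologies can co-occur in one image.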
Step 3, setting up a pathology label graph: various subject-predicate-object combinations are automatically extracted from the reports by a language parser and manually screened to summarize 25 pathology labels, each label comprising a positive observation and a negative observation of the pathology. After the pathology label graph is extracted, it is used as additional label data to guide the learning of the feature extractor.
In training the ultrasound image feature extractor, this embodiment can train the classification of the images according to their view angle and whether they contain obvious heart disease features. However, the angle alone, together with the presence or absence of pathological features, does not give the feature extractor sufficient guidance: images with the same final angle that both contain ASD or VSD features would end up with intra-class differences too small for versatile automatic report generation. Therefore, this embodiment requires an additional image prior to assist the learning of the feature extractor.
This embodiment innovatively introduces a pathology label graph, based on the consideration that medical reports must accurately describe various pathological characteristics, and that the accuracy of the pathology description matters far more than the generation of pathology-independent words.
During the training of the ultrasound image feature extractor, this embodiment requires the extractor to accurately predict the section of the ultrasound image; for images with obvious lesions, it additionally requires the extractor to predict the type of congenital heart disease from the image. With the assistance of the pathology label graph, the parameters are frozen after the feature extractor is trained, providing accurate prior information about the ultrasound images for the subsequent report generation part.
Therefore, this embodiment creatively adds a pathology label graph structure to the network; the structure is shown in FIG. 4.
Step 3.1, the sentences of the whole report data set are analyzed with a language parser, the subject-predicate-object structures are extracted, and the extracted triples are grouped according to the topic they describe.
Step 3.2, the subject-predicate-object structures of each group are divided into two description directions, corresponding to the normal and abnormal conditions of the pathology, and a graph neural network at the end of the feature extractor constructs a 25-node pathology label graph structure to model the relationships between pathologies.
Considering that pathologies do not occur independently of each other, this embodiment must account for their correlations in actual prediction; it therefore uses a graph neural network at the end of the feature extractor to construct the 25-node pathology label graph structure for modeling the relationships between pathologies.
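The 25-node label graph and one propagation step over it can be sketched as below. This is an assumption-laden illustration, not the patented network: the adjacency matrix is random here (in practice it would encode label co-occurrence), the 16-dimensional node features are arbitrary, and the symmetric normalization is the common graph-convolution convention.

```python
import numpy as np

N_LABELS = 25

# Hypothetical symmetric adjacency over the 25 pathology labels:
# an edge marks two labels as related (random here for illustration).
rng = np.random.default_rng(0)
A = (rng.random((N_LABELS, N_LABELS)) > 0.7).astype(float)
A = np.maximum(A, A.T)                    # make it symmetric
np.fill_diagonal(A, 1.0)                  # add self-loops

# Symmetric normalization D^{-1/2} A D^{-1/2}
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

def gcn_layer(H, W):
    """One graph-convolution step: propagate node features over the
    normalized label graph, project them, and apply ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)

H0 = rng.standard_normal((N_LABELS, 16))  # 16-dim node features (assumed)
W0 = rng.standard_normal((16, 16))
H1 = gcn_layer(H0, W0)                    # updated 25x16 node features
```

Each node's updated feature is a weighted mix of its neighbors' features, which is how correlated pathologies influence each other's predictions.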
Step 4, extracting information by adopting a multi-frame ultrasonic image attention mechanism;
Because ultrasound image data form an image sequence with great redundancy, a key problem is how to extract the important information from the redundant information and generate a report. To extract the important information and reduce redundancy, this embodiment designs a multi-frame ultrasound image attention mechanism.
For the 20 ultrasound images, this embodiment first performs feature extraction with a pre-trained feature extractor to obtain features of dimension B × 20 × D, and then squeezes dimension D through a first fully-connected layer to obtain features of dimension B × 20 × D/r, where r is a chosen reduction ratio, here 4. After ReLU activation, a second fully-connected layer restores the features to size B × 20 × D. Finally, a sigmoid activation maps the weights into [0, 1]. The output weights are multiplied element-wise with the original features to obtain the weighted features. To retain the original feature information, the weighted features and the original features are then added element-wise.
Step 5, establishing a multi-frame ultrasound image report generation model: performing topic division and pathology label extraction on the reports in the data set to obtain 5 topic sentences and 25 pathology labels, fusing the input multi-view ultrasound images with an attention mechanism, and constructing an initial fully-connected input graph and a fully-connected adjacency matrix over the 5 topic sentences.
Because medical reports vary in length and have a flexible description format, this embodiment performs topic division and pathology label extraction on the reports in the data set, obtaining 5 topic sentences and 25 pathology labels. For the 20 input multi-view ultrasound images, this embodiment first fuses them using the attention mechanism and then constructs an initial fully-connected input graph and a fully-connected adjacency matrix over the 5 topic sentences. The overall model uses a graph convolution network and an LSTM recurrent neural network, as shown in FIG. 3. The report is generated gradually as the network iterates over time: each iteration step generates one word, then one graph convolution models the relationships among the nodes of the 5 topic sentences, and the next iteration begins. Through this continued iteration, the 5 topic reports are finally generated and combined into the final report for the input.
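The alternation of per-topic decoding and cross-topic graph convolution can be sketched as a toy loop. This is a structural illustration only: a `tanh` update with random weights stands in for the LSTM cell, placeholder tokens stand in for generated words, and the 10-step length and 32-dim hidden size are arbitrary.

```python
import numpy as np

N_TOPICS, H = 5, 32                          # 5 topic sentences, hidden size (assumed)
rng = np.random.default_rng(2)

# Fully-connected adjacency over the 5 topic nodes (with self-loops),
# row-normalized so each node averages over all topics.
A = np.ones((N_TOPICS, N_TOPICS)) / N_TOPICS

W_step = rng.standard_normal((H, H)) * 0.1   # stand-in for the LSTM cell
W_gcn = rng.standard_normal((H, H)) * 0.1    # graph-convolution weight

states = rng.standard_normal((N_TOPICS, H))  # initial topic-node features
words = [[] for _ in range(N_TOPICS)]

for step in range(10):                       # toy number of decoding steps
    states = np.tanh(states @ W_step)        # per-topic recurrent update
    for k in range(N_TOPICS):
        words[k].append(f"w{step}")          # placeholder token per topic
    states = np.tanh(A @ states @ W_gcn)     # graph conv mixes the 5 topics

report = [" ".join(ws) for ws in words]      # 5 topic sentences, then combined
```

The key design choice mirrored here is that each word-generation step is followed by a graph convolution, so every topic sentence is conditioned on the evolving state of the other four before emitting its next word.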
This embodiment designs the structure of a multi-section report generation model based on cardiac ultrasound. The model is established on basic clinical requirements; because of the requirement on network speed, a very complex network structure is not chosen, and the model built to this standard still reaches the accuracy standard required clinically.
In the ultrasound image feature extraction model, this embodiment adopts a residual structure to transfer shallow-layer information such as texture and color while avoiding the vanishing-gradient problem. Because of the speed limitation, it adopts 4 convolution modules, with 2 convolution layers, 2 batch normalization layers and 2 activation functions in each module, rather than a deeper, more complex network. In total, each ultrasound image only needs to pass through an 18-layer network structure, which suits the extraction of clinical section features where the speed requirements are high.
This embodiment innovatively introduces a pathology label graph, based on the consideration that medical reports must accurately describe various pathological characteristics, and that the accuracy of the pathology description matters far more than the generation of pathology-independent words. Therefore, this embodiment automatically extracts various subject-predicate-object combinations from the reports with the language parser and manually screens them to summarize 25 kinds of pathology labels, each label comprising two different observations, positive and negative, of the pathology. After extracting the pathology label graph, the embodiment uses it as additional label data to guide the learning of the feature extractor.