Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the group consisting of A, B, and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flow chart of a neural network training method according to an embodiment of the present disclosure. As shown in Fig. 1, the method includes:
in step S11, a sample image is input into a first segmentation network to obtain a first segmentation result of the sample image, wherein the first segmentation result includes segmentation regions of a plurality of classes and is used as prediction annotation information of the sample image, and the sample image includes an annotated first sample image and an unannotated second sample image;
in step S12, the sample image is input into a second segmentation network to obtain a second segmentation result of the sample image, wherein the second segmentation result includes segmentation regions of a plurality of classes;
in step S13, the first and second segmentation networks are trained based on at least the first and second segmentation results.
According to the neural network training method of the embodiment of the present disclosure, the first segmentation result obtained by the first segmentation network can be used as prediction annotation information, so that when the number of sample images with annotation information is insufficient, the first segmentation network and the second segmentation network can still be trained, which improves the training effect and the precision of the neural networks.
In one possible implementation, the neural network training method may be performed by an electronic device such as a terminal device or a server. The terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The method may be implemented by a processor calling computer-readable instructions stored in a memory.
In one possible implementation, the first segmentation network and the second segmentation network may each be a convolutional neural network, which extracts features of the sample image using convolution processing or the like and obtains the first segmentation result and the second segmentation result. The first and second segmentation networks may also be other neural networks, such as recurrent neural networks, and the present disclosure does not limit the types of the first and second segmentation networks.
In one possible implementation, the sample image may be a medical image, such as a Computed Tomography (CT) image or a Magnetic Resonance Imaging (MRI) image. The sample image may also be another type of image, such as a portrait image, a street view image, or a remote sensing image, and the present disclosure does not limit the type of the sample image.
In one possible implementation, the sample image is a medical image, and annotating the medical image requires a professional such as a doctor. For example, annotating a lesion region in a CT image requires a professional such as a doctor to accurately determine the lesion region and thereby avoid annotation errors. As another example, segmenting and annotating multiple regions in a medical image requires a professional such as a doctor to accurately distinguish multiple organs or regions and thereby avoid annotation errors. Medical images are therefore difficult and costly to annotate.
In an example, the sample image comprises a medical image, and the classes of the segmented regions comprise at least one of a left ventricle region, a left myocardial wall region, a right ventricle region, a left atrium region, a right atrium region, an aorta region, and a pulmonary artery region. Because medical images are difficult and costly to annotate, the number of sample images with annotation information is small.
In one possible implementation, the first segmentation network and the second segmentation network can be trained together to compensate for the shortage of sample images with annotation information. In step S11, the sample image may be input into the first segmentation network to obtain a first segmentation result, and in step S12, the sample image may be input into the second segmentation network to obtain a second segmentation result. The first segmentation result may be used as prediction annotation information, that is, as the reference segmentation result that guides the training of the second segmentation network. In step S13, the network loss of the first segmentation network and the second segmentation network is then determined according to the difference between the first segmentation result and the second segmentation result, and the first segmentation network and the second segmentation network are trained.
In one possible implementation, the sample image may be preprocessed before it is used for training. For example, the sample image is a CT image of a heart region. Because the resolutions of CT images of different patients differ significantly, and the resolutions of the same sample image in the x (length), y (width), and z (height) directions also differ, the image needs to be resampled so that its resolution is the same in all directions and the input requirement of the neural network is met. To make network training stable, the resampled sample image may be normalized. The present disclosure does not limit the manner of preprocessing.
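The resampling and normalization described above can be illustrated with a minimal sketch. It assumes nearest-neighbour resampling to an isotropic voxel spacing and zero-mean, unit-variance normalization; neither choice is specified by the disclosure, and the function names `resample_isotropic` and `normalize` are hypothetical.

```python
from statistics import mean, pstdev

def resample_axis_indices(n_voxels, spacing, target_spacing):
    """Source indices along one axis after nearest-neighbour resampling."""
    new_n = max(1, round(n_voxels * spacing / target_spacing))
    # Map each new voxel back to the original voxel that contains it.
    return [min(n_voxels - 1, int(i * target_spacing / spacing)) for i in range(new_n)]

def resample_isotropic(volume, spacing, target_spacing=1.0):
    """volume[z][y][x] with spacing (sz, sy, sx); returns a volume whose
    voxel spacing is target_spacing in every direction (assumed scheme)."""
    zs = resample_axis_indices(len(volume), spacing[0], target_spacing)
    ys = resample_axis_indices(len(volume[0]), spacing[1], target_spacing)
    xs = resample_axis_indices(len(volume[0][0]), spacing[2], target_spacing)
    return [[[volume[z][y][x] for x in xs] for y in ys] for z in zs]

def normalize(volume):
    """Zero-mean, unit-variance intensity normalization (assumed scheme)."""
    flat = [v for plane in volume for row in plane for v in row]
    mu = mean(flat)
    sigma = pstdev(flat) or 1.0  # avoid division by zero for constant volumes
    return [[[(v - mu) / sigma for v in row] for row in plane] for plane in volume]

# A 2x2x2 volume whose slices are 2 mm apart but whose pixels are 1 mm apart.
volume = [[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]]
resampled = resample_isotropic(volume, spacing=(2.0, 1.0, 1.0), target_spacing=1.0)
normalized = normalize(resampled)
```

In practice an interpolating resampler from an imaging library would likely be preferred; nearest-neighbour indexing is used here only to keep the sketch dependency-free.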
The sample images can include a first sample image having annotation information and a second sample image having no annotation information. In an example, the sample image may be a medical image such as a CT image. The first sample image has annotation information, that is, the first sample image may have annotations for segmented regions of a plurality of classes. For example, the classes of the segmented regions include at least one of a left ventricle region, a left myocardial wall region, a right ventricle region, a left atrium region, a right atrium region, an aorta region, and a pulmonary artery region, and the first sample image may include a segmentation annotation (e.g., a segmentation mask) for these regions. The second sample image does not have such annotation information.
In one possible implementation, in the case that the sample image is the first sample image with the annotation information, the training may be guided by the annotation information and the first segmentation result output by the first segmentation network.
In one possible implementation, step S13 may include: determining a first loss based on the prediction annotation information of the first sample image and the second segmentation result; determining a second loss according to the annotation information of the first sample image and the second segmentation result; and training the first segmentation network and the second segmentation network according to the first loss and the second loss.
In one possible implementation, factors such as the positions and shapes of regions in a sample image may cause category errors. For example, on the one hand, the chambers of the heart are tightly connected and image quality is limited, so the contrast at the boundaries between chambers is low; on the other hand, hearts differ considerably between patients. Training with such samples may cause samples of the same category to be assigned to multiple classes, so misclassification is likely to occur and the segmentation accuracy is reduced.
In one possible implementation, to address the above problem, the first and second segmentation networks may be trained by determining a classification loss of features of each segmented region in the plurality of first sample images. Training the first segmentation network and the second segmentation network based at least on the first segmentation result and the second segmentation result, further comprising: determining a classification loss based on features of the plurality of classes of segmented regions of the first segmentation result of the first sample image or features of the plurality of classes of segmented regions of the second segmentation result of the first sample image; training the second segmentation network based on the classification loss.
In an example, the feature (e.g., a high-dimensional feature) of each region of the first sample image may be obtained through the first segmentation network and the second segmentation network, for example, the first segmentation result of the first segmentation network and the second segmentation result of the second segmentation network may include the feature of each region, and the class-center feature of each class of features is determined, for example, the class-center feature is obtained by clustering or the like, and the present disclosure does not limit the way of obtaining the class-center feature.
In one possible implementation, the classification loss may be determined according to the feature of each segmented region in the first sample image and the class-center feature of the class to which the segmented region belongs. For example, the feature of each segmented region and the class-center feature of its class may each be represented by a high-dimensional vector, a feature distance between the two (e.g., Euclidean distance or cosine distance; the present disclosure does not limit the feature distance) may be determined, and the classification loss may be determined according to the feature distance, e.g., calculated with a center loss function. Training the second segmentation network with the classification loss further reduces the intra-class feature distance of segmented regions of the same class (compared with the intra-class feature distance obtained without a center loss function). That is, the classes become more distinct and the features of segmented regions of the same class are more concentrated in the feature space, which helps determine the class of each segmented region. Likewise, the first segmentation network may also be trained based on the classification loss.
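As an illustration of the classification loss described above, the following sketch computes class-center features and a center-loss-style value. It assumes that each class center is the mean of the features of that class (the disclosure mentions clustering or the like without fixing a method) and that the feature distance is the squared Euclidean distance; the names `class_centers` and `center_loss` are hypothetical.

```python
def class_centers(features, labels):
    """Class-center feature per class, taken here as the mean feature
    vector of that class (an assumption made for this sketch)."""
    sums, counts = {}, {}
    for feature, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feature))
        for i, value in enumerate(feature):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def center_loss(features, labels, centers):
    """Half the mean squared Euclidean distance between each region
    feature and the center of the class it belongs to."""
    total = 0.0
    for feature, label in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(feature, centers[label]))
    return 0.5 * total / len(features)

# Two left-ventricle region features and one aorta region feature.
features = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
labels = ["left_ventricle", "left_ventricle", "aorta"]
centers = class_centers(features, labels)
loss = center_loss(features, labels, centers)
```

The loss falls as features of the same class move toward their class center, which corresponds to the reduced intra-class feature distance described above.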
In one possible implementation, after determining the category of each region, the first loss may be determined according to the prediction labeling information (i.e., the first segmentation result determined by the first segmentation network) and the second segmentation result of the sample image. In an example, for a sample image, the first segmentation network may determine a first segmentation result for each segmented region in the sample image, e.g., may determine where each segmented region is located. The second segmentation network may determine a second segmentation result for each segmented region in the sample image, e.g., may determine where each segmented region is located. Further, the first loss may be determined according to a difference between positions of the same class of the divided regions in the first and second division results. The present disclosure does not limit the manner in which the first loss is determined.
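Because the disclosure leaves the form of the first loss open, the following is only one possible sketch. It assumes that each network outputs, for every pixel, a probability per class, and it measures the disagreement between the two outputs as a mean squared difference; the name `consistency_loss` is hypothetical.

```python
def consistency_loss(first_probs, second_probs):
    """Mean squared difference between the per-pixel class probabilities
    of the first and second segmentation results (assumed loss form).
    Both arguments are lists indexed as probs[pixel][class]."""
    total, count = 0.0, 0
    for p_first, p_second in zip(first_probs, second_probs):
        for a, b in zip(p_first, p_second):
            total += (a - b) ** 2
            count += 1
    return total / count

# Two pixels, two classes: the networks agree on pixel 0 and disagree on pixel 1.
first_result = [[1.0, 0.0], [1.0, 0.0]]
second_result = [[1.0, 0.0], [0.0, 1.0]]
first_loss = consistency_loss(first_result, second_result)
```

The value is zero when both networks assign every pixel to the same class with the same confidence and grows as the positions of same-class regions diverge.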
In one possible implementation, the second loss may be determined according to the annotation information of the first sample image and the second segmentation result. In an example, for a first sample image, the second segmentation network may determine a second segmentation result for each segmented region in the first sample image, e.g., may determine where each segmented region is located. The annotation information marks the position of each segmented region. The second loss may then be determined according to the difference between the positions of segmented regions of the same class in the annotation information and in the second segmentation result. The present disclosure does not limit the manner in which the second loss is determined.
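One common way to measure the positional difference between an annotation and a predicted segmentation is an overlap measure. The sketch below assumes a soft Dice loss averaged over classes; this is an illustrative choice, not one prescribed by the disclosure, and the name `soft_dice_loss` is hypothetical.

```python
def soft_dice_loss(probs, target_ids, num_classes, eps=1e-6):
    """1 minus the mean Dice overlap over classes (assumed loss form).
    probs[pixel][class] are predicted probabilities of the second
    segmentation result; target_ids[pixel] is the annotated class id."""
    scores = []
    for c in range(num_classes):
        intersection = sum(p[c] for p, t in zip(probs, target_ids) if t == c)
        predicted = sum(p[c] for p in probs)
        annotated = sum(1 for t in target_ids if t == c)
        scores.append((2.0 * intersection + eps) / (predicted + annotated + eps))
    return 1.0 - sum(scores) / num_classes

annotation = [0, 1]                                   # annotated class id per pixel
perfect = soft_dice_loss([[1.0, 0.0], [0.0, 1.0]], annotation, num_classes=2)
wrong = soft_dice_loss([[0.0, 1.0], [1.0, 0.0]], annotation, num_classes=2)
```

A prediction that matches the annotation gives a loss near zero, and a prediction with no overlap gives a loss near one.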
In one possible implementation, the first segmentation network and the second segmentation network may be trained based on the first loss and the second loss. This step may include: performing weighted summation on the first loss and the second loss to obtain an integrated network loss, wherein the weight of the first loss is positively correlated with the number of training iterations; and training the first segmentation network and the second segmentation network based on the integrated network loss.
In a possible implementation manner, the first loss and the second loss may be subjected to weighted summation to obtain an integrated network loss, where in the integrated network loss, a weight of each loss may change with an iteration cycle, and in an example, since the number of first sample images having the labeling information is small, the weight of the first loss may be increased and the weight of the second loss may be decreased as the precision of the first segmentation network is improved in the training. That is, the first segmentation network and the second segmentation network are trained using the predicted annotation information determined by the first segmentation network instead of the annotation information step by step, so that when the number of the first sample images is insufficient, the first segmentation network and the second segmentation network can continue to be trained through the second sample images without the annotation information (i.e., the network loss can be determined through the predicted annotation information and the second segmentation result).
In an example, in the weighted summation, the classification losses may also be summed, that is, the classification loss term is included in the integrated network loss, so that the features of the homogeneous segmentation regions are more concentrated in the feature space.
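The weighted summation can be sketched as follows. The disclosure only requires that the weight of the first loss grow with the number of iterations while the weight of the second loss shrinks; the linear ramp, the cap of 0.9, and the names `first_loss_weight` and `integrated_loss` are assumptions made for illustration.

```python
def first_loss_weight(iteration, ramp_iterations, max_weight=0.9):
    """Weight of the first loss: rises linearly from 0 to max_weight over
    ramp_iterations, then stays constant (assumed schedule)."""
    progress = min(max(iteration / ramp_iterations, 0.0), 1.0)
    return max_weight * progress

def integrated_loss(first_loss, second_loss, iteration, ramp_iterations,
                    classification_loss=0.0, classification_weight=0.0):
    """Weighted sum of the losses; the second-loss weight falls as the
    first-loss weight rises (assumed complementary weighting)."""
    w1 = first_loss_weight(iteration, ramp_iterations)
    w2 = 1.0 - w1
    return (w1 * first_loss + w2 * second_loss
            + classification_weight * classification_loss)

early = integrated_loss(2.0, 4.0, iteration=0, ramp_iterations=100)    # second loss only
late = integrated_loss(2.0, 4.0, iteration=100, ramp_iterations=100)   # mostly first loss
```

Capping the first-loss weight below 1 keeps a small contribution from the annotated samples throughout training; a schedule that lets the first loss fully replace the second loss would also be consistent with the description.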
In one possible implementation, the first segmentation network and the second segmentation network may be trained according to the integrated network loss determined in the above manner. That is, first sample images may be input over multiple iterations, the integrated network loss may be obtained, and the network parameters of the first segmentation network and the second segmentation network may be adjusted according to the integrated network loss. Because the prediction annotation information obtained by the first segmentation network is used to guide training, the precision of the first segmentation network can be higher than that of the second segmentation network; in other words, the neural network with lower precision is trained by taking the output of the neural network with higher precision as the prediction annotation information.
In one possible implementation, the network structures of the first and second segmentation networks may be the same, and the accuracy of the first segmentation network may be made relatively higher through a different training mode. Training the first segmentation network and the second segmentation network according to the integrated network loss includes: performing gradient descent processing on the integrated network loss and updating the network parameters of the second segmentation network; and performing gradient descent and moving average processing on the integrated network loss and updating the network parameters of the first segmentation network.
In one possible implementation, when the network parameters of the second segmentation network are updated, gradient descent processing may be performed on the network parameters according to the integrated network loss. When the network parameters of the first segmentation network are updated, gradient descent and moving average processing may be performed according to the integrated network loss. For example, moving average processing may be performed on the integrated network loss of the previous training period and the integrated network loss of the current training period, and gradient descent processing may be performed using the network loss after moving average processing. In this way, the learning curve of the first segmentation network can be smooth, its learning rate is higher, and the accuracy of the first segmentation network can be improved quickly.
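The two update rules can be contrasted with a small sketch. The disclosure does not spell out how the moving-averaged loss enters the update of the first segmentation network; this sketch assumes that the smoothing is applied as an exponential moving average over the gradients of the integrated network loss from successive training periods. The function names, the `decay` parameter, and the flat parameter lists are hypothetical.

```python
def gradient_descent_step(params, grads, learning_rate):
    """Plain gradient descent, as used here for the second segmentation network."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

def smoothed_gradient_step(params, grads, previous_average, learning_rate, decay=0.9):
    """Gradient descent on a moving average of the previous and current
    training periods (assumed reading), as used here for the first
    segmentation network. Returns the new parameters and the updated average."""
    average = [decay * a + (1.0 - decay) * g for a, g in zip(previous_average, grads)]
    new_params = [p - learning_rate * a for p, a in zip(params, average)]
    return new_params, average

# One update of each network from the same gradient of the integrated loss.
second_params = gradient_descent_step([1.0], [2.0], learning_rate=0.5)
first_params, running_average = smoothed_gradient_step(
    [1.0], [2.0], previous_average=[0.0], learning_rate=0.5, decay=0.5)
```

Averaging across periods damps step-to-step fluctuations, which corresponds to the smoother learning curve described above.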
In one possible implementation, the first segmentation network may also be pre-trained, for example using the annotated first sample image, so that the first segmentation network has a higher accuracy than the second segmentation network. The first segmentation result output by the first segmentation network is then used as prediction annotation information to guide the training of the second segmentation network.
In one possible implementation, through multiple iterations, the precision of the first segmentation network may reach a relatively high level, and the prediction annotation information obtained by the first segmentation network, whose precision is higher than that of the second segmentation network, may be used to guide training. In this way, the precision of the second segmentation network is continuously improved, and the precision of the first segmentation network is also continuously improved through the first loss, which is determined from the prediction annotation information and the second segmentation result of the second segmentation network. The weight of the first loss can be gradually increased over the iterations and the weight of the second loss reduced, that is, the first loss gradually replaces the second loss, so that the dependence on the annotation information of the sample image is reduced. When sample images with annotation information are lacking, sample images without annotation information can be used to continue training the first segmentation network and the second segmentation network, and the precision of both neural networks continues to increase.
By the method, the precision of the first segmentation network can be improved quickly, the prediction annotation information determined by the first segmentation network is used for replacing the annotation information of the sample images step by step, and when the number of the sample images with the annotation information is insufficient, the first segmentation network and the second segmentation network can be trained continuously, so that the dependence on the annotation information is reduced, and the precision of the first segmentation network and the precision of the second segmentation network are improved.
In one possible implementation, the precision of the first segmentation network and the second segmentation network gradually improves through the above training process, and as training progresses, the prediction annotation information determined by the first segmentation network may be used to train the first segmentation network and the second segmentation network. That is, when a second sample image having no annotation information is input, the first segmentation network and the second segmentation network may be trained based on the prediction annotation information of the second sample image (i.e., the segmentation result obtained by the first segmentation network), the second segmentation result of the second sample image (i.e., the segmentation result obtained by the second segmentation network), and a classification loss. The training further includes: determining a third loss according to the prediction annotation information of the second sample image and the second segmentation result of the second sample image; and training the first segmentation network and the second segmentation network according to the classification loss of the second sample image (the second classification loss) and the third loss.
In one possible implementation, as described above, in the second sample image, there may be a phenomenon in which the classification of the region is misclassified, resulting in a problem in that the segmentation accuracy is reduced.
In one possible implementation, the first and second segmentation networks may be trained by determining a second classification loss for each segmented region in the plurality of second sample images. In an example, the class-center feature of each class of segmented regions in the plurality of first sample images may be determined; for example, the first segmentation result of the first segmentation network or the second segmentation result of the second segmentation network may include the features of each segmented region. The second classification loss may then be determined based on the feature of each segmented region in the second sample image and the class-center feature of the class to which it belongs, in the same manner as the classification loss described above. Training the first segmentation network and the second segmentation network with this classification loss reduces the intra-class feature distance of segmented regions of the same class. That is, the classes become more distinct and the features of segmented regions of the same class are further concentrated in the feature space, which helps determine the class of each segmented region.
In one possible implementation, in the absence of annotation information, the predictive annotation information determined by the first segmentation network may be used to guide the training of the second segmentation network. The third loss may be determined from the prediction annotation information (i.e., the first segmentation result determined by the first segmentation network) of the second sample image and the second segmentation result of the second sample image. In an example, for a second sample image, the first segmentation network may determine a first segmentation result for each segmented region in the second sample image, e.g., may determine where each segmented region is located. The second segmentation network may determine a second segmentation result for each segmented region in the second sample image, e.g., may determine where each segmented region is located. Further, the third loss may be determined according to a difference between positions of the same class of the divided regions in the first and second division results. The present disclosure does not limit the manner in which the third loss is determined.
In a possible implementation manner, the network parameters of the first segmentation network may be updated by performing gradient descent and moving average processing on the first segmentation network according to the third loss, and the network parameters of the second segmentation network may be updated by performing gradient descent processing on the second segmentation network according to the third loss, so that the output result of the first segmentation network with relatively high precision is used as the prediction labeling information to guide training, and the precision of the first segmentation network and the precision of the second segmentation network are continuously improved. After the training, the difference between the first segmentation result output by the first segmentation network and the second segmentation result output by the second segmentation network is reduced, that is, the precision of the first segmentation network and the precision of the second segmentation network are both improved along with the training, and the precision difference is reduced along with the training.
In an example, the classification loss may also be summed with a third loss, and the summed network losses are used to train the first and second segmentation networks such that features of homogeneous segmented regions are more concentrated in the feature space.
By the method, the output result of the first segmentation network with relatively high precision is used as the prediction annotation information to guide training, so that the dependence on the annotation information of the sample image can be reduced, and the training can be continued under the condition of lacking the sample image with the annotation information, so that the precision of the first segmentation network and the second segmentation network is improved.
In one possible implementation, the first segmentation network and the second segmentation network may be used to perform segmentation processing on an input image; for example, a segmentation mask corresponding to the input image may be obtained, and the segmentation mask may indicate the position and contour of each segmented region. In an example, the input image is a CT image of a heart region, and the first segmentation network or the second segmentation network may obtain a segmentation result for at least one of a left ventricle region, a left myocardial wall region, a right ventricle region, a left atrium region, a right atrium region, an aorta region, and a pulmonary artery region of the heart; for example, a segmentation mask of the region may be obtained to indicate its position and contour.
In one possible implementation, the contour of the segmented region may be further optimized to reduce the probability of the segmented region being malformed, such that the shape and contour of the segmented region are closer to the true contour, e.g., the shape of the right ventricle region in the heart is closer to the true right ventricle region.
In one possible implementation, the malformation of the shape of the segmented region can be reduced by a shape constraint network, i.e., the magnitude of the malformation is constrained to a small extent so that the contour of the segmented region is closer to the true contour. For example, the segmentation result (e.g., segmentation mask) of the input image may be input into a shape constraint network, and the shape of the segmented region may be optimized by reducing the deformity through the shape constraint network.
In one possible implementation, the shape constraint network may be trained before it is used to optimize the shape of the segmented region. The method further includes: training a shape constraint network according to at least the first segmentation result and the second segmentation result, wherein the shape constraint network is used for constraining the shape of the segmented region.
In one possible implementation, the shape constraint network may be trained using a first sample image with annotation information. Training the shape constraint network based at least on the first segmentation result and the second segmentation result, comprising: inputting the first segmentation result or the second segmentation result into the shape constraint network to obtain a sample segmentation result; determining a fourth loss of the shape constraint network based on the annotation information of the first sample image and the sample segmentation result; training the shape constraint network based on the fourth loss.
In an example, the second segmentation result (e.g., the predicted segmentation mask) output by the second segmentation network may be input into the shape constraint network, the sample segmentation result is obtained, and a fourth loss of the shape constraint network, such as a cross-entropy loss, is determined by using a difference between the annotation information (e.g., the real segmentation mask) of the first sample image and the sample segmentation result, and the determination manner of the fourth loss is not limited by the present disclosure. In an example, the shape constraint network may also be trained using the first segmentation results output by the first segmentation network, and the present disclosure does not limit the use of the first segmentation results and the second segmentation results.
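The cross-entropy loss mentioned above as an example of the fourth loss can be sketched as follows. The sketch assumes that the sample segmentation result gives a probability per class for every pixel and that the annotation gives one class id per pixel; the name `cross_entropy` and the clamping constant `eps` are illustrative.

```python
import math

def cross_entropy(probs, target_ids, eps=1e-12):
    """Mean negative log-probability that the shape constraint network
    assigns to the annotated class of each pixel.
    probs[pixel][class] is the sample segmentation result;
    target_ids[pixel] is the class id from the annotation mask."""
    total = 0.0
    for pixel_probs, target in zip(probs, target_ids):
        total -= math.log(max(pixel_probs[target], eps))  # clamp to avoid log(0)
    return total / len(target_ids)

# Pixel 0 is uncertain between two classes, pixel 1 is confidently correct.
sample_result = [[0.5, 0.5], [1.0, 0.0]]
annotation_mask = [0, 0]
fourth_loss = cross_entropy(sample_result, annotation_mask)
```

The loss is zero only when every pixel is assigned to its annotated class with probability one.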
In one possible implementation, the shape constraint network may be trained through a fourth loss until the training condition is satisfied or the first sample image with the labeling information has been completely input. The training condition may be an accuracy condition, for example, if the accuracy of the sample segmentation result meets a preset accuracy requirement, the training is completed. Alternatively, the training condition may be an iteration number condition, for example, if the iteration number has reached the number requirement, the training is completed. The present disclosure does not limit the training conditions.
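The stopping conditions above can be combined in a simple training loop. In this sketch, `train_step` is a hypothetical callback that performs one update of the shape constraint network on a sample and returns the resulting accuracy; the threshold values are placeholders, not values from the disclosure.

```python
def train_shape_constraint_network(samples, train_step,
                                   target_accuracy=0.95, max_iterations=1000):
    """Runs train_step over the annotated samples until one condition holds:
    the accuracy requirement is met, the iteration limit is reached, or all
    samples have been input. Returns (reason, iterations_performed)."""
    iterations = 0
    for sample in samples:
        if iterations >= max_iterations:
            return "iteration_limit", iterations
        accuracy = train_step(sample)
        iterations += 1
        if accuracy >= target_accuracy:
            return "accuracy_reached", iterations
    return "samples_exhausted", iterations

# Stand-in training step whose accuracy improves with each call.
accuracies = iter([0.50, 0.96, 0.99])
outcome = train_shape_constraint_network(
    ["sample_a", "sample_b", "sample_c"], lambda sample: next(accuracies))
```

Here the loop stops after the second sample because the stand-in accuracy exceeds the target.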
By the method, the shape constraint network can be trained, so that the shape constraint network constrains the amplitude of the deformity in a smaller range, the outline of the segmented region is closer to the real outline, and the segmentation precision is improved.
In a possible implementation manner, after the training of the neural network is completed, the neural network may be tested, for example, the neural network may be tested in a data set different from the sample set to which the sample image belongs, for example, the precision of the neural network may be tested, if the precision of the neural network meets the precision requirement, the neural network may be used in image segmentation, and if the precision does not meet the precision requirement, the training of the neural network may be continued until the test is passed. The accuracy requirements may include accuracy of segmentation of samples in other data sets, and the present disclosure does not limit the testing method.
In a possible implementation manner, the present disclosure provides an image processing method, which may perform segmentation processing on an image to be processed by using the trained neural network, so as to obtain a segmentation result.
Fig. 2 illustrates a flow diagram of an image processing method according to an embodiment of the present disclosure, which may include, as illustrated in fig. 2:
in step S21, inputting the image to be processed into the first segmentation network or the second segmentation network trained according to the neural network training method to obtain a third segmentation result;
in step S22, the third segmentation result is input into the trained shape constraint network, and a target segmentation result is obtained.
In one possible implementation, the trained first segmentation network or second segmentation network may be used to perform segmentation processing on the image to be processed. In an example, the accuracy of both the first and second segmentation networks may meet the accuracy requirement, so any one neural network may be used for the segmentation process. For example, the image to be processed may be input into the first segmentation network, and a third segmentation result, for example, a segmentation result of a left heart cavity region, a left myocardial wall region, a right ventricle region, a left atrium region, a right atrium region, an aorta region, and a pulmonary artery region in the image to be processed (e.g., cardiac CT image) may be obtained.
In a possible implementation, the third segmentation result may be input into the trained shape constraint network to obtain a target segmentation result in which the shape of each region has been optimized, thereby reducing the degree of shape distortion and improving the accuracy of segmentation.
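Steps S21 and S22 amount to a two-stage pipeline, which the following sketch illustrates. The function `segment_image` and the two toy stand-ins are hypothetical: a threshold replaces the trained segmentation network, and a rule that fills one-pixel holes in a one-dimensional mask replaces the trained shape constraint network. Neither stand-in is the network described in the disclosure.

```python
def segment_image(image, segmentation_network, shape_constraint_network):
    """Two-stage inference corresponding to steps S21 and S22."""
    # Step S21: the trained first or second segmentation network
    # produces the third segmentation result.
    third_result = segmentation_network(image)
    # Step S22: the trained shape constraint network refines region shapes
    # and yields the target segmentation result.
    return shape_constraint_network(third_result)


# Toy stand-in (assumption): threshold each value of a 1D "image".
def toy_segmenter(image):
    return [1 if value > 0.5 else 0 for value in image]


# Toy stand-in (assumption): fill a pixel whose two neighbours are foreground.
def toy_shape_constraint(mask):
    refined = list(mask)
    for i in range(1, len(mask) - 1):
        if mask[i - 1] == 1 and mask[i + 1] == 1:
            refined[i] = 1
    return refined


result = segment_image([0.9, 0.8, 0.2, 0.7, 0.1],
                       toy_segmenter, toy_shape_constraint)
```

Passing the two networks as callables keeps the sketch independent of any particular framework; either trained segmentation network can be supplied as the first stage.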
According to the neural network training method of the embodiments of the present disclosure, the precision of the first segmentation network can be improved quickly, and the prediction annotation information determined by the first segmentation network gradually replaces the annotation information of the sample images. When the number of sample images with annotation information is insufficient, the first segmentation network and the second segmentation network can continue to be trained, which reduces the dependence on annotation information and improves the precision of both networks. In addition, the shape constraint network can be trained to constrain the degree of shape distortion to a smaller range, so that the contour of each segmented region is closer to the real contour and the segmentation precision is improved.
Fig. 3 shows a schematic diagram of a neural network training method according to an embodiment of the present disclosure, and as shown in fig. 3, the sample image is a CT image of a heart region, and the sample image may be preprocessed, for example, resampled, normalized, and the like, so that the sample image meets the input requirement of the neural network. The first segmentation network may obtain a first segmentation result of the sample image and the second segmentation network may obtain a second segmentation result of the sample image. The segmentation result may be a segmentation mask for a plurality of segmented regions, for example, a segmentation mask for a left heart chamber region, a left myocardial wall region, a right ventricle region, a left atrium region, a right atrium region, an aorta region, and a pulmonary artery region.
In one possible implementation, in order to improve the segmentation accuracy and reduce the phenomenon of misclassification, the feature distance between the feature of each segmented region and the class center feature of the belonging class can be determined, and the classification loss can be determined according to the feature distance, for example, the classification loss can be calculated according to a center loss function.
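As a minimal sketch of a classification loss based on feature distance, the following uses the commonly cited form of the center loss: half the squared Euclidean distance between each region feature and the center feature of its class, averaged over regions. The disclosure names only "a center loss function", so the factor of one half, the averaging, and the names `center_loss`, `features`, `labels`, and `class_centers` are assumptions.

```python
def center_loss(features, labels, class_centers):
    """Mean of half the squared distance from each feature to its class center.

    features: list of feature vectors, one per segmented region.
    labels: class index of each region.
    class_centers: mapping from class index to class center feature.
    The exact normalisation is an assumption.
    """
    total = 0.0
    for feature, label in zip(features, labels):
        center = class_centers[label]
        # Squared Euclidean feature distance to the center of the region's class.
        total += 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))
    return total / len(features)


loss = center_loss(
    features=[[1.0, 2.0], [3.0, 3.0]],
    labels=[0, 1],
    class_centers={0: [1.0, 1.0], 1: [3.0, 3.0]},
)
```

A region whose feature coincides with its class center contributes nothing to the loss, so minimising it pulls the features of each class together and makes misclassification less likely.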
In one possible implementation, in the case that the sample image is a first sample image having annotation information, the second loss may be determined according to the annotation information of the first sample image and the second segmentation result of the first sample image. Further, the first loss may be determined based on the first segmentation result and the second segmentation result. The first loss, the second loss, and the classification loss may then be weighted and summed to determine an integrated network loss. When the integrated network loss is computed, the weight of each loss may change with the iteration period. Because the number of first sample images with annotation information is small, and the precision of the first segmentation network improves as the number of iterations increases during training, the weight of the first loss is increased and the weight of the second loss is reduced. That is, the first segmentation network and the second segmentation network are trained by gradually replacing the annotation information with the prediction annotation information determined by the first segmentation network, so that when the number of first sample images is insufficient, the first segmentation network and the second segmentation network can continue to be trained using the second sample images without annotation information.
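The weighting described above can be sketched as follows. The disclosure states only that the weight of the first loss grows and the weight of the second loss shrinks as iterations proceed. The linear ramp, the fixed classification weight, and the names `loss_weights`, `integrated_loss`, and `ramp_iterations` are assumptions chosen for illustration.

```python
def loss_weights(iteration, ramp_iterations, max_first_weight=1.0):
    """Return (first_weight, second_weight) for the given iteration.

    A linear ramp is assumed; the disclosure does not fix the schedule.
    """
    progress = min(max(iteration / ramp_iterations, 0.0), 1.0)
    first_weight = max_first_weight * progress   # grows with iterations
    second_weight = 1.0 - progress               # shrinks with iterations
    return first_weight, second_weight


def integrated_loss(first_loss, second_loss, classification_loss, iteration,
                    ramp_iterations=100, classification_weight=0.1):
    """Weighted sum of the first, second, and classification losses."""
    w1, w2 = loss_weights(iteration, ramp_iterations)
    return (w1 * first_loss
            + w2 * second_loss
            + classification_weight * classification_loss)
```

Early in training the annotation-based second loss dominates; once the ramp completes, training is guided by the prediction annotation information alone, which is what allows unannotated second sample images to be used.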
In one possible implementation, when the network parameters of the second segmentation network are updated, gradient descent processing may be performed on the network parameters according to the integrated network loss. When the network parameters of the first segmentation network are updated, gradient descent and moving average processing may be performed according to the integrated network loss, so that the learning curve of the first segmentation network is smoother, learning is faster, and the accuracy of the first segmentation network improves quickly. The weight of the first loss may be gradually increased as the precision of the first segmentation network improves during training, so that the first segmentation result is used to guide the training and the dependence on the annotation information of the sample images is reduced. When sample images with annotation information are lacking, the first segmentation network and the second segmentation network can continue to be trained using sample images without annotation information, and the precision of the two neural networks continues to improve.
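The two update rules can be contrasted in a small sketch. The disclosure does not specify how the moving average is combined with gradient descent; the version below is one possible reading, in which the first segmentation network keeps an exponential moving average of its gradient-updated parameters. The function names, the `decay` parameter, and the flat parameter lists are assumptions.

```python
def gradient_step(params, grads, learning_rate):
    """Plain gradient descent, as used for the second segmentation network."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def moving_average_step(params, grads, learning_rate, decay):
    """Gradient descent followed by an exponential moving average.

    One possible reading of the update for the first segmentation network;
    the disclosure does not fix the exact form.
    """
    candidate = gradient_step(params, grads, learning_rate)
    # Blend the previous parameters with the gradient-updated ones.
    return [decay * p + (1.0 - decay) * c for p, c in zip(params, candidate)]


second_params = gradient_step([1.0, 2.0], [0.5, -1.0], learning_rate=0.1)
first_params = moving_average_step([1.0], [1.0], learning_rate=0.5, decay=0.5)
```

With `decay` close to 1 the averaged parameters change slowly, which is consistent with the smoother learning curve described for the first segmentation network.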
In a possible implementation manner, after all the first sample images with the labeling information are input, the training can be continued by using the second sample images without the labeling information, that is, under the condition that the precision of the first segmentation network is higher, the training can be guided by using the first segmentation result of the first segmentation network, so as to continue to improve the precision of the first segmentation network and the second segmentation network.
In one possible implementation, the shape constraint network may be trained to reduce shape distortion of the segmented regions. The second segmentation result output by the second segmentation network, or the first segmentation result output by the first segmentation network, may be input into the shape constraint network to obtain the sample segmentation result. A fourth loss of the shape constraint network is determined from the difference between the annotation information of the sample image and the sample segmentation result, and the shape constraint network is trained according to the fourth loss so that it reduces shape distortion of the segmented regions.
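The disclosure describes the fourth loss only as a difference between the annotation information and the sample segmentation result. As one illustrative choice, and not the disclosed formula, a soft Dice loss over flattened masks could measure that difference. The name `dice_loss` and the smoothing term `eps` are assumptions.

```python
def dice_loss(predicted, annotated, eps=1e-6):
    """Soft Dice loss between a predicted mask and an annotated mask.

    Both masks are flat lists of values in [0, 1]. Dice is an assumed
    stand-in for the unspecified "difference" used as the fourth loss.
    """
    intersection = sum(p * a for p, a in zip(predicted, annotated))
    total = sum(predicted) + sum(annotated)
    # eps avoids division by zero when both masks are empty.
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


perfect = dice_loss([1, 0, 1], [1, 0, 1])
disjoint = dice_loss([1, 0], [0, 1])
```

The loss is zero when the sample segmentation result matches the annotation exactly and approaches one when the two masks do not overlap.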
In one possible implementation manner, after the neural network training is completed, the neural network may be used to perform segmentation processing on an image to be processed (e.g., a CT image of a heart region), and the shape of each segmented region in the segmentation result is optimized through a shape constraint network, so as to obtain a target segmentation result. For example, the shape-optimized segmentation results of the left heart cavity region, the left myocardial wall region, the right ventricle region, the left atrium region, the right atrium region, the aorta region, and the pulmonary artery region can be obtained.
In a possible implementation manner, the neural network training method can train the neural network by using a small number of samples with labels under the condition that the labeling difficulty of the samples (such as medical images) is high or the labeling cost is high, and enables the neural network to have higher precision. The method can be used for identification and segmentation of medical images, for example, left heart cavity region, left myocardial wall region, right ventricle region, left atrium region, right atrium region, aorta region, pulmonary artery region can be segmented in cardiac CT images to assist doctors in diagnosis of structural diseases of heart, such as ventricular aneurysms, valvular diseases, heart expansion, thickening, etc. The application field of the neural network training method is not limited by the disclosure.
Fig. 4 shows a block diagram of a neural network training device according to an embodiment of the present disclosure. As shown in fig. 4, the device comprises: a first segmentation module 11, configured to input a sample image into a first segmentation network and obtain a first segmentation result of the sample image, where the first segmentation result includes segmentation areas of multiple categories and is used as prediction annotation information of the sample image, and where the sample image includes a first sample image with annotation information and a second sample image without annotation information; a second segmentation module 12, configured to input the sample image into a second segmentation network and obtain a second segmentation result of the sample image, where the second segmentation result includes segmentation areas of multiple categories; and a training module 13, configured to train the first segmentation network and the second segmentation network according to at least the first segmentation result and the second segmentation result.
In one possible implementation, the training module is further configured to: determine a first loss based on the prediction annotation information of the sample image and the second segmentation result; determine a second loss according to the annotation information of the first sample image and the second segmentation result; and train the first segmentation network and the second segmentation network according to the first loss and the second loss.
In one possible implementation, the training module is further configured to: perform weighted summation on the first loss and the second loss to obtain an integrated network loss, wherein the weight of the first loss is positively correlated with the number of training iteration cycles; and train the first segmentation network and the second segmentation network based on the integrated network loss.
In one possible implementation, the training module is further configured to: perform gradient descent processing according to the integrated network loss and update the network parameters of the second segmentation network; and perform gradient descent and moving average processing according to the integrated network loss and update the network parameters of the first segmentation network.
In one possible implementation, the training module is further configured to: determine a third loss according to the prediction annotation information of the second sample image and a second segmentation result of the second sample image; and train the first segmentation network and the second segmentation network according to the third loss.
In one possible implementation, the training module is further configured to: determining a classification loss based on features of the plurality of classes of segmented regions of the first segmentation result of the first sample image or features of the plurality of classes of segmented regions of the second segmentation result of the first sample image; and training the second segmentation network based on the classification loss.
In one possible implementation, the apparatus further includes: a constraint training module, configured to train a shape constraint network according to the first segmentation result or the second segmentation result, where the shape constraint network is used to constrain the shape of the segmentation regions.
In one possible implementation, the constraint training module is further configured to: inputting the first segmentation result or the second segmentation result into the shape constraint network to obtain a sample segmentation result; determining a fourth loss of the shape constraint network based on the annotation information of the first sample image and the sample segmentation result; training the shape constraint network based on the fourth loss.
In one possible implementation, the sample image comprises a medical image, and the classes of the segmented regions comprise at least one of a left ventricular region, a left myocardial wall region, a right ventricular region, a left atrial region, a right atrial region, an aorta region, and a pulmonary artery region.
Fig. 5 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus includes: a third segmentation module 21, configured to input the image to be processed into the first segmentation network or the second segmentation network trained according to the neural network training method described above, so as to obtain a third segmentation result; and a constraint module 22, configured to input the third segmentation result into the trained shape constraint network, so as to obtain a target segmentation result.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from their principles and logic; for reasons of space, such combinations are not described in detail in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their function and possible inherent logic.
In addition, the present disclosure also provides a neural network training device, an electronic device, a computer-readable storage medium, and a program, each of which can be used to implement any one of the neural network training methods provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the methods section, which are not repeated here.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The embodiments of the present disclosure also provide a computer program product comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the neural network training method provided in any of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed, cause a computer to perform the operations of the neural network training method provided in any one of the embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to fig. 6, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as a display and keypad of the electronic device 800. The sensor component 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 7, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), is personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.