Background
Since Roentgen discovered X-rays in 1895, imaging has developed over more than a hundred years from its original analog form to today's digital form. The application of medical imaging in modern medicine continues to expand, and imaging has become an indispensable component of modern medical diagnosis.
In current medical image diagnosis, a doctor diagnoses a disease by observing one or a set of two-dimensional images based on experience. With the development of computer technology, computer image processing techniques are increasingly used to analyze and process medical images, greatly improving the accuracy and reliability of diagnosis.
Deep learning has become an important development in the field of computer vision: it can automatically abstract intermediate- and high-level image features from raw data (images). Studies have shown that this approach is very effective for medical diagnosis. Medical image processing institutions around the world have rapidly entered this field and applied deep learning methods to many areas of medical image analysis.
X-ray imaging is an important medical imaging technique. The traditional plain film is the earliest image form, in which the front and back tissue structures of the human body are stacked and displayed on one film. In 1971-1972, Hounsfield of the UK invented CT (computed tomography), which uses a precisely collimated X-ray beam and a highly sensitive detector to perform cross-sectional scans one by one around a part of the human body, and forms complete cross-sectional images through computer processing. This imaging technology has become an indispensable diagnostic imaging means in modern medicine.
The difference in density between diseased tissue and normal tissue results in a difference in the rate at which they absorb X-rays, thereby creating a differentiated presentation of different tissues on a medical image (plain film, CT, etc.), and a physician diagnoses a disease by distinguishing these differences. To quantitatively measure the absorption of X-rays by tissue, Hounsfield defined a new scale, the "CT value"; in recognition of his contribution, the unit of the CT value was later named after him as "HU" (the Hounsfield unit). The CT values of human tissues range from -1024 HU to 3071 HU. However, the human eye cannot distinguish such minute gray-scale differences and can generally distinguish only about 16 gray levels. On the basis of modern digital imaging, in order to better display tissue structure details and distinguish tissues with small density differences, doctors generally adjust the contrast and brightness of images according to diagnostic needs, namely the windowing display technology of medical images.
In windowing display, the image data inside a window are linearly mapped to the maximum display range of the monitor, while image data above the upper limit or below the lower limit of the window are set to the highest or lowest display value, respectively. By dynamically adjusting the window width (the range of image data to be displayed) and the window level (the center value of the image data to be displayed), more information in the image can be observed.
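To make the transform concrete, here is a minimal Python/NumPy sketch of the linear windowing described above; the function name, the 8-bit display range, and the example values are illustrative assumptions, not part of the original disclosure.

```python
import numpy as np

def apply_window(ct_hu: np.ndarray, window_level: float, window_width: float) -> np.ndarray:
    """Linearly map CT values inside the window onto the full 8-bit
    display range; values below or above the window are clamped to the
    lowest (0) or highest (255) display value."""
    lower = window_level - window_width / 2.0
    upper = window_level + window_width / 2.0
    scaled = (ct_hu.astype(np.float64) - lower) / (upper - lower) * 255.0
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Example with a typical lung window (level -700 HU, width 1500 HU):
# display = apply_window(ct_slice_hu, window_level=-700.0, window_width=1500.0)
```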
Often, the CT values of diseased tissue and normal tissue differ by only a few (3-5) HU, or even less. Assuming a window width of 160 is chosen, 160/16 = 10 HU, i.e., when the CT values of two tissues differ by less than 10 HU, the human eye cannot resolve the difference, which makes diagnosis difficult. To display the subtle differences in the CT values of tissues at a level the human eye can distinguish, and to give the images perceptible black-and-white gradients, windowing display techniques are used.
Windowing display technology occupies an extremely important position in clinical diagnosis and is an important method for diagnosing diseases from images. If the window level and window width are not adjusted properly, the tissue structures and their adjacent relations cannot be displayed, and a lesion may even be obscured and missed.
When a doctor uses an image to diagnose a disease, the doctor first observes the whole image, then selects some local regions of the image to attend to, and devotes more attention to those local regions to acquire more detailed information. As the doctor's focus shifts from the global image to local regions of interest, the window level and window width that suited global observation are no longer well suited to observing the local region, so the doctor continuously adjusts the window level and window width to obtain the optimal display effect. The empirical window width for lung image display is generally 1300 HU to 1700 HU, with a window level between -600 HU and -800 HU. On the basis of this basic window, if the shape, lobulation, pleural indentation, spiculation, and the like of pulmonary vessels and pulmonary nodules need to be observed closely, the window level and window width must be adjusted again. The data used for deep learning, however, is usually a CT image rendered at one particular window level and window width.
Taking the doctor's diagnostic process as an example: on the basis of global observation, the doctor continuously adjusts the window level and window width over the region of interest to obtain the optimal display of the observed tissue. If a bitmap image formed under a single window level and window width is used as the only input to a neural network, a large amount of detailed disease information is lost.
At present, deep learning on medical images generally uses images at a single window level and window width, so some key disease characteristics can be lost. That is, using a bitmap at a single window level and window width as the input for deep learning loses a large amount of detailed disease information, leading to inaccuracies and even errors in diagnosis.
For the problem in the related art that an image recognition module trained through deep learning has low accuracy in medical image recognition, no effective solution has yet been proposed.
Disclosure of Invention
The present application mainly aims to provide an image recognition module training method and apparatus, to solve the problem that an image recognition module trained through deep learning in the related art has low accuracy in medical image recognition.
In order to achieve the above object, in a first aspect, the present application provides an image recognition module training method, including:
performing feature extraction on the specified medical image based on a convolutional neural network to generate a first image feature;
processing the specified medical image based on an attention mechanism to obtain a second image feature and an extraction area;
acquiring an extraction area image corresponding to the extraction area;
performing feature extraction on the extracted region image based on a convolutional neural network to obtain a third image feature of the extracted region image;
and fusing the first image feature, the second image feature, and the third image feature to obtain a fused image feature, so as to complete the training of an image recognition module based on the fused image feature.
Optionally, the method further comprises:
windowing the medical sample image based on the designated window level and the designated window width to generate the specified medical image.
Optionally, acquiring the extraction area image corresponding to the extraction area includes:
calculating an extraction window level and an extraction window width corresponding to the extraction area;
windowing the medical sample image based on the extraction window level and the extraction window width to generate an extraction area image.
Optionally, the method further comprises:
judging whether the ratio of the extraction window width to the specified gray scale in the extraction area image is less than 1;
and when the ratio of the extraction window width to the specified gray scale in the extraction area image is less than 1, executing the step of carrying out feature extraction on the extraction area image based on the convolutional neural network.
Optionally, the method further comprises:
when the ratio of the extraction window width to a specified gray scale in the extraction area image is not less than 1, the step of processing the specified medical image based on the attention mechanism is executed again.
In a second aspect, the present application further provides an image recognition module training apparatus, including:
the first extraction module is used for extracting the characteristics of the specified medical image based on the convolutional neural network to generate first image characteristics;
the attention mechanism module is used for processing the specified medical image based on an attention mechanism to obtain a second image feature and an extraction area;
the acquisition module is used for acquiring an extraction area image corresponding to the extraction area;
the second extraction module is used for extracting the features of the extraction area image based on a convolutional neural network to obtain third image features of the extraction area image;
and the fusion module is used for fusing the first image feature, the second image feature, and the third image feature to obtain a fused image feature, so that the training of the image recognition module is completed based on the fused image feature.
Optionally, the apparatus further comprises:
and the windowing module is used for windowing the medical sample image based on the designated window level and the designated window width to generate the specified medical image.
Optionally, the obtaining module is configured to:
calculating an extraction window level and an extraction window width corresponding to the extraction area;
windowing the medical sample image based on the extraction window level and the extraction window width to generate an extraction area image.
In a third aspect, the present application further provides a computer device, including: a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to execute a computer program stored in the memory;
the computer program is used for executing the image recognition module training method.
In a fourth aspect, the present application also provides a computer-readable storage medium storing computer code that, when executed, performs the above-described image recognition module training method.
In the image recognition module training method provided by the present application, feature extraction is performed on a specified medical image based on a convolutional neural network to generate a first image feature; the specified medical image is processed based on an attention mechanism to obtain a second image feature and an extraction area; an extraction area image corresponding to the extraction area is acquired; feature extraction is performed on the extraction area image based on a convolutional neural network to obtain a third image feature of the extraction area image; and the first image feature, the second image feature, and the third image feature are fused to obtain a fused image feature, so that training of an image recognition module is completed based on the fused image feature. In this way, the first image feature of the specified medical image is extracted through the convolutional neural network, the extraction area is re-windowed with the aid of the attention mechanism, and the second and third image features are then obtained. The training method can therefore window the region of interest (the extraction area determined by the attention mechanism) again on the basis of learning from the global image, so that multiple window levels and window widths serve as inputs to the neural network. This improves the accuracy with which the image recognition module trained through deep learning recognizes medical images, and solves the technical problem in the related art that such a module has low accuracy in medical image recognition. Meanwhile, the method redistributes the learning capacity of the network by concentrating it on more important subtasks, thereby reducing the difficulty of the original task and making the network easier to train.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It should be understood that data so used may be interchanged under appropriate circumstances, such that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments in the present application, and the features of those embodiments, may be combined with each other without conflict. The present application will be described in detail below with reference to the drawings and in conjunction with the embodiments.
In one aspect, an embodiment of the present application provides an image recognition module training method. Fig. 1 is a schematic flowchart of the image recognition module training method provided in an embodiment of the present application; as shown in fig. 1, the method includes the following steps 110 to 150:
and 110, performing feature extraction on the specified medical image based on the convolutional neural network to generate a first image feature.
The specified medical image may be a CT image, a windowed CT image, or another type of medical image. A CT image expresses tissue density as CT values, which are not absolute values but relative values referenced to water, measured in Hounsfield units (HU): the CT value of water is 0 HU, that of air is about -1000 HU, and that of dense bone can exceed +1000 HU, with the full scale spanning roughly 4096 levels from -1024 HU to +3071 HU.
Specifically, feature extraction is performed on the specified medical image through a CNN (Convolutional Neural Network), and the first image feature is output.
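As an illustration of this step, the following PyTorch sketch shows a small CNN feature extractor; the application does not fix a backbone, so the architecture, layer sizes, and names here are assumptions.

```python
import torch
import torch.nn as nn

class GlobalFeatureExtractor(nn.Module):
    """Maps a single-channel windowed CT slice to a feature vector
    (the 'first image feature'). The layout is illustrative only."""
    def __init__(self, feature_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling
        )
        self.fc = nn.Linear(128, feature_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.backbone(x).flatten(1)  # (B, 128)
        return self.fc(h)                # (B, feature_dim)
```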
Step 120: process the specified medical image based on an attention mechanism to obtain a second image feature and an extraction area.
The attention mechanism simulates human visual attention: it learns a weight distribution over image features and applies it to the original features, giving different features different influence on subsequent tasks such as image classification and image recognition. This makes the machine learning process pay more attention to key features and ignore unimportant ones, improving task efficiency.
Specifically, the specified medical image is processed through the attention mechanism to obtain a second image feature and an extraction area, where the extraction area is the region-of-interest coordinates generated by the attention mechanism, represented as (x, y, tx, ty): (x, y) are the coordinates of the center point, and tx, ty are the length and width of the region. The attention mechanism further allows re-windowing according to the gray-scale condition of the image, to obtain a feature image (the extraction area image) that better represents the tissue.
It should be noted that, when the specified medical image is processed based on the attention mechanism, a plurality of extraction areas may be obtained, and step 130 is performed for each of them.
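A sketch of such an attention head is given below; since the application does not specify the attention design, this version (a squeeze-and-excitation-style channel gate operating on a CNN feature map of the specified image, plus a small regressor for the (x, y, tx, ty) coordinates) is only one plausible realization.

```python
import torch
import torch.nn as nn

class AttentionROI(nn.Module):
    """Re-weights a CNN feature map of the specified image (yielding the
    'second image feature') and regresses a region of interest
    (x, y, tx, ty) in normalized [0, 1] image coordinates."""
    def __init__(self, channels: int = 128, feature_dim: int = 256):
        super().__init__()
        # per-channel attention weights in (0, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels), nn.Sigmoid(),
        )
        self.to_feature = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, feature_dim),
        )
        self.to_roi = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 4), nn.Sigmoid(),  # (x, y, tx, ty)
        )

    def forward(self, feat_map: torch.Tensor):
        w = self.gate(feat_map).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        attended = feat_map * w                              # weighted features
        return self.to_feature(attended), self.to_roi(attended)
```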
Step 130: acquire an extraction area image corresponding to the extraction area.
Specifically, the window level and window width of the extraction area may be calculated from the extraction area (i.e., the region-of-interest coordinates), and windowing is then performed again with the calculated values to obtain the extraction area image. In this way, the application generates, in a progressively focusing manner, an extraction area image with a better training effect than the specified medical image, from which feature extraction yields a third image feature different from the first.
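The application does not state how the extraction window level and width are derived from the extraction area; one plausible choice, sketched below under that assumption, is to take the midpoint and span of the HU values inside the region (apply_window refers to the windowing sketch in the Background section).

```python
import numpy as np

def extraction_window(ct_hu: np.ndarray, x: int, y: int, tx: int, ty: int):
    """Derive an extraction window level/width from the HU values inside
    the extraction area (x, y = center in pixels; tx, ty = width and
    height in pixels, after denormalizing the ROI coordinates). Taking
    the midpoint and span of the region's HU range is an assumed
    heuristic, not the patent's formula."""
    half_w, half_h = tx // 2, ty // 2
    roi = ct_hu[max(y - half_h, 0):y + half_h, max(x - half_w, 0):x + half_w]
    lo, hi = float(roi.min()), float(roi.max())
    window_level = (hi + lo) / 2.0
    window_width = max(hi - lo, 1.0)  # guard against a zero-width window
    return window_level, window_width

# The extraction area image is then obtained by re-windowing the
# *original* medical sample, e.g. with the apply_window sketch above:
# level, width = extraction_window(ct_slice_hu, x, y, tx, ty)
# roi_image = apply_window(ct_slice_hu, level, width)
```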
Step 140: perform feature extraction on the extraction area image based on a convolutional neural network to obtain a third image feature of the extraction area image.
Specifically, feature extraction is performed on the extraction area image based on the convolutional neural network to obtain the third image feature.
Step 150: fuse the first image feature, the second image feature, and the third image feature to obtain a fused image feature, so as to complete the training of the image recognition module based on the fused image feature.
Specifically, the first, second, and third image features are fused for image recognition module training, and the fused feature can be passed through a fully connected layer for diagnosis of a disease condition.
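As a sketch of this step, the following PyTorch head concatenates the three feature vectors and classifies through fully connected layers; concatenation is assumed here as the fusion operator, since the application does not prescribe one.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Concatenates the three image features into the fused image
    feature and classifies with a fully connected head."""
    def __init__(self, feature_dim: int = 256, num_classes: int = 2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(3 * feature_dim, feature_dim), nn.ReLU(),
            nn.Linear(feature_dim, num_classes),
        )

    def forward(self, f1: torch.Tensor, f2: torch.Tensor, f3: torch.Tensor):
        fused = torch.cat([f1, f2, f3], dim=1)  # fused image feature
        return self.head(fused)                 # class logits for diagnosis
```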
In this way, the first image feature of the specified medical image is extracted through the convolutional neural network, the extraction area is re-windowed with the aid of the attention mechanism, and the second and third image features are then obtained, so that the training method can window the region of interest (the extraction area determined by the attention mechanism) again on the basis of learning from the global image, with multiple window levels and window widths serving as inputs to the neural network; this improves the accuracy with which the image recognition module trained through deep learning recognizes medical images. Meanwhile, the method redistributes the learning capacity of the network by concentrating it on more important subtasks, thereby reducing the difficulty of the original task and making the network easier to train.
Optionally, fig. 2 is a schematic flowchart of another image recognition module training method provided in an embodiment of the present application; as shown in fig. 2, the method further includes the following step 100:
and 100, windowing the medical sample image based on the designated window level and the designated window width to generate the designated medical image.
Specifically, the designated window level and the designated window width are set according to experience; that is, they are used to perform the initial windowing of the medical sample image to obtain the specified medical image.
Optionally, in step 130, acquiring the extraction area image corresponding to the extraction area includes:
calculating an extraction window level and an extraction window width corresponding to the extraction area;
windowing the medical sample image based on the extraction window level and the extraction window width to generate an extraction area image.
Specifically, the extraction window level and extraction window width corresponding to the extraction area are calculated, and windowing is performed on the original medical sample image according to them to obtain the extraction area image.
It should be noted that the extraction area image may also be generated from the specified medical image, and the manner of obtaining the extraction area image is not limited to the windowing technique; it may be set by those skilled in the art as needed.
In this embodiment, fig. 3 is a system framework diagram for implementing the image recognition module training method provided by an embodiment of the present application. As shown in fig. 3, the numbered elements denote the following:
(1) The original CT image.
(2) The image after an initial window level and window width, set according to the normal CT values of human tissues, have been applied (the specified medical image).
(3) Feature extraction performed on (2) through the CNN.
(4) Selection of locally important attention areas, where the attention mechanism is applied.
(5) The new image feature obtained by the attention mechanism from the input features.
(6) The region-of-interest coordinates (the extraction area) generated by the attention mechanism, from which the window level and window width of the region are calculated; denoted (x, y, tx, ty), where (x, y) are the coordinates of the center point and tx, ty are the length and width of the region.
(7) The image after re-windowing with the new window level and window width (the extraction area image).
(8) Feature extraction performed on (7) through the CNN.
(9) Fusion of the three extracted image features.
(10) The fully connected layer used for diagnosis of the disease.
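Tying the numbered elements together, a minimal forward-pass sketch might look as follows; the components are injected as callables so the sketch stays agnostic to their concrete definitions (see the sketches above), and all names are illustrative.

```python
import torch

def forward_pass(ct_hu: torch.Tensor, specified_image: torch.Tensor,
                 cnn1, attention, rewindow, cnn2, fusion_head):
    """One pass through the pipeline of fig. 3, elements (2)-(10)."""
    f1 = cnn1(specified_image)            # (3) first image feature
    f2, roi = attention(specified_image)  # (4)-(6) second feature + ROI
    roi_image = rewindow(ct_hu, roi)      # (7) re-windowed extraction area
    f3 = cnn2(roi_image)                  # (8) third image feature
    return fusion_head(f1, f2, f3)        # (9)-(10) fusion + FC for diagnosis
```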
Optionally, the image recognition module training method provided in the embodiment of the present application further includes the following steps:
judging whether the ratio of the extraction window width to the specified gray scale in the extraction area image is less than 1;
and when the ratio of the extraction window width to the specified gray scale in the extraction area image is less than 1, executing the step of carrying out feature extraction on the extraction area image based on the convolutional neural network.
Since 8-bit storage can represent 256 gray levels, the specified gray scale can be set to 256.
Specifically, after the extraction area image is obtained, it is judged whether the ratio of the extraction window width to the specified gray scale is less than 1. When this ratio is less than 1, the extraction area image can be understood to meet the training requirement, and step 140 is performed.
Optionally, the method further comprises:
when the ratio of the extraction window width to a specified gray scale in the extraction area image is not less than 1, the step of processing the specified medical image based on the attention mechanism is executed again.
Specifically, when the ratio of the extraction window width to the specified gray scale in the extraction area image is not less than 1, the extraction area image can be understood not to meet the training requirement. Step 120 is then executed again: the extraction area is obtained anew and the extraction window level and extraction window width are recalculated, until the ratio of the extraction window width to the specified gray scale is less than 1, after which step 140 is performed.
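A minimal sketch of this acceptance check and retry loop, under the assumption of 256 gray levels, might look like this (extraction_window and apply_window refer to the earlier sketches; all names are illustrative):

```python
def roi_window_is_acceptable(window_width: float, gray_levels: int = 256) -> bool:
    """Accept the extraction area when its window width spans fewer HU
    steps than the number of displayable gray levels (ratio < 1), so
    every HU value in the window maps to a distinct gray level."""
    return window_width / gray_levels < 1

# Control flow of the retry loop (steps 120-140):
# while True:
#     feature2, roi = attention(specified_image)           # step 120
#     level, width = extraction_window(ct_slice_hu, *roi)  # step 130
#     if roi_window_is_acceptable(width):
#         roi_image = apply_window(ct_slice_hu, level, width)
#         break  # proceed to step 140 (feature extraction on roi_image)
```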
In the image recognition module training method provided by the present application, feature extraction is performed on a specified medical image based on a convolutional neural network to generate a first image feature; the specified medical image is processed based on an attention mechanism to obtain a second image feature and an extraction area; an extraction area image corresponding to the extraction area is acquired; feature extraction is performed on the extraction area image based on a convolutional neural network to obtain a third image feature of the extraction area image; and the first image feature, the second image feature, and the third image feature are fused to obtain a fused image feature, so that training of an image recognition module is completed based on the fused image feature. In this way, the first image feature of the specified medical image is extracted through the convolutional neural network, the extraction area is re-windowed with the aid of the attention mechanism, and the second and third image features are then obtained. The training method can therefore window the region of interest (the extraction area determined by the attention mechanism) again on the basis of learning from the global image, so that multiple window levels and window widths serve as inputs to the neural network. This improves the accuracy with which the image recognition module trained through deep learning recognizes medical images, and solves the technical problem in the related art that such a module has low accuracy in medical image recognition. Meanwhile, the method redistributes the learning capacity of the network by concentrating it on more important subtasks, thereby reducing the difficulty of the original task and making the network easier to train.
Based on the same technical concept, the present application further provides an image recognition module training apparatus. Fig. 4 is a schematic structural diagram of the image recognition module training apparatus provided in an embodiment of the present application; as shown in fig. 4, the apparatus includes:
the first extraction module 10 is configured to perform feature extraction on a specified medical image based on a convolutional neural network to generate a first image feature;
an attention mechanism module 20, configured to process the specified medical image based on an attention mechanism, so as to obtain a second image feature and an extraction area;
an obtaining module 30, configured to obtain an extraction area image corresponding to the extraction area;
the second extraction module 40 is configured to perform feature extraction on the extracted region image based on a convolutional neural network to obtain a third image feature of the extracted region image;
and the fusion module 50 is configured to fuse the first image feature, the second image feature, and the third image feature to obtain a fused image feature, so as to complete training of the image recognition module based on the fused image feature.
Optionally, the apparatus further comprises:
and the windowing module is used for windowing the medical sample image based on the designated window level and the designated window width to generate the specified medical image.
Optionally, the obtaining module 30 is configured to:
calculating an extraction window level and an extraction window width corresponding to the extraction area;
windowing the medical sample image based on the extraction window level and the extraction window width to generate an extraction area image.
Optionally, the apparatus further comprises:
the judging module is used for judging whether the ratio of the extraction window width to the specified gray scale in the extraction area image is less than 1, and for invoking the second extraction module 40 when the ratio is less than 1.
Optionally, in the apparatus, when the ratio of the extraction window width to the specified gray scale in the extraction area image is not less than 1, the attention mechanism module 20 is invoked again.
In the image recognition module training apparatus provided by the present application, the first extraction module 10 performs feature extraction on a specified medical image based on a convolutional neural network to generate a first image feature; the attention mechanism module 20 processes the specified medical image based on an attention mechanism to obtain a second image feature and an extraction area; the obtaining module 30 acquires an extraction area image corresponding to the extraction area; the second extraction module 40 performs feature extraction on the extraction area image based on a convolutional neural network to obtain a third image feature of the extraction area image; and the fusion module 50 fuses the first image feature, the second image feature, and the third image feature to obtain a fused image feature, so that training of the image recognition module is completed based on the fused image feature. In this way, the first image feature of the specified medical image is extracted through the convolutional neural network, the extraction area is re-windowed with the aid of the attention mechanism, and the second and third image features are then obtained, so that the apparatus can window the region of interest (the extraction area determined by the attention mechanism) again on the basis of learning from the global image, with multiple window levels and window widths serving as inputs to the neural network. This improves the accuracy with which the image recognition module trained through deep learning recognizes medical images, and solves the technical problem in the related art that such a module has low accuracy in medical image recognition. Meanwhile, the apparatus redistributes the learning capacity of the network by concentrating it on more important subtasks, thereby reducing the difficulty of the original task and making the network easier to train.
Based on the same technical concept, an embodiment of the present application further provides a computer device, including: a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to execute a computer program stored in the memory;
the computer program is used for executing the image recognition module training method.
Based on the same technical concept, embodiments of the present application also provide a computer-readable storage medium storing computer code, and when the computer code is executed, the above-mentioned training method for the image recognition module is executed.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the computer-readable storage medium described above may refer to the corresponding process in the foregoing method embodiments, and is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device, fabricated separately as individual integrated circuit modules, or fabricated by combining multiple modules or steps into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The computer program referred to in the present application may be stored in a computer-readable storage medium, which may include: any physical device capable of carrying computer program code, a virtual device, a flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and other software distribution media.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.