CN117036703A - A two-stage 3D image segmentation method based on dimensionality decomposition attention - Google Patents
- Publication number
- CN117036703A (application number CN202310994348.6A)
- Authority
- CN
- China
- Prior art keywords
- channel
- feature
- dimension
- attention
- dimensional
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The application provides a two-stage three-dimensional image segmentation method based on dimension-decomposition attention, which effectively addresses the drop in segmentation performance caused by the target region having intensity similar to that of adjacent tissue. In the first stage, a three-dimensional U-shaped network locates the region of interest, reducing interference from irrelevant tissue in the image and providing the input for the second stage. In the second stage, dimension-decomposition spatial attention and dimension-decomposition channel attention are added to the three-dimensional U-shaped network to achieve accurate segmentation of the target. Both attention modules decompose the features output by the encoder into three one-dimensional directional features along the height, width, and depth axes and generate attention weights; the two attention weights are multiplied with the encoder output features to form the input to the decoder, producing the final segmentation result. The application can effectively exploit all the information in a three-dimensional image, enhance the weight of the region of interest, and improve segmentation performance.
Description
Technical Field
The application relates to the field of computer vision, and in particular to a two-stage three-dimensional image segmentation method and system based on dimension-decomposition attention.
Background
With the development of deep learning, convolutional neural networks have been widely applied to image segmentation tasks thanks to their ability to automatically learn deep, more discriminative features from samples. Researchers have proposed a series of two-dimensional convolutional segmentation models that exhibit excellent performance. However, three-dimensional images carry more information than two-dimensional images, which makes them more valuable in real-world applications. Researchers have therefore proposed three-dimensional convolutional models dedicated to three-dimensional images, which achieve better results than two-dimensional models.
Because the target region in a three-dimensional image often has intensity similar to that of adjacent tissue, both two-dimensional and three-dimensional models are prone to mis-segmentation, which seriously affects segmentation performance. Moreover, the target region typically occupies only a small portion of a three-dimensional image, which means that most of the data is irrelevant; this redundant data places an excessive burden on the computer during training. The application therefore introduces a two-stage strategy: the first stage extracts the region of interest and removes irrelevant areas from the image, greatly reducing the computation required in the second stage.
To alleviate the performance degradation caused by the similarity in intensity between the target region and the background, researchers have introduced attention methods originally developed in natural language processing. These methods let a model concentrate on important information while ignoring unimportant information, improving segmentation performance. With the rapid development of deep learning, a series of attention methods have been proposed with good results. However, most of these methods are designed for two-dimensional images and are difficult to use directly in three-dimensional models, since they do not take the third dimension into account. The application therefore proposes a dimension-decomposition attention module for three-dimensional image segmentation that improves segmentation accuracy.
To realize accurate and efficient three-dimensional image segmentation, the application provides a two-stage U-Net method based on dimension-decomposition attention. First, a 3D U-Net is used as a coarse segmentation network to roughly locate and extract the region of interest. In the second stage, the region of interest extracted in the first stage is segmented, with dimension-decomposition spatial attention and dimension-decomposition channel attention added to improve the final segmentation result.
Disclosure of Invention
The application aims to provide a two-stage three-dimensional image segmentation method based on dimension-decomposition attention, which uses all the information in a three-dimensional image to enhance the weight of the region of interest and solves the problem of reduced segmentation performance caused by the target region having intensity similar to that of adjacent tissue.
In order to achieve the above object, the present application provides a two-stage three-dimensional image segmentation method based on dimension decomposition attention, which is characterized in that a two-stage strategy and dimension decomposition attention are used to segment a three-dimensional image; the method comprises the following steps:
step S1: preprocessing and coarse segmentation are carried out on an input image so as to obtain a region of interest through cropping;
step S2: downsampling the obtained region of interest to obtain depth features, and sending the depth features into the dimension decomposition space attention to obtain space weights;
step S3: sending the depth features into a dimension decomposition channel attention to obtain channel weights;
step S4: and multiplying the depth feature with the space weight and the channel weight to obtain a final feature, and sending the final feature into an up-sampling network to obtain a segmentation result.
Preferably, the preprocessing and coarse segmentation of the input image to obtain the region of interest by cropping comprises:
cropping and resizing the input image to a uniform size to remove the influence of image edge information;
sending the preprocessed images of uniform size into a coarse segmentation network to obtain a coarse segmentation result; and performing an erosion operation and largest-connected-region localization on the coarse segmentation result, and cropping the original image and the ground-truth label with a fixed-size rectangular block to obtain the region of interest.
Preferably, after downsampling the obtained region of interest to obtain depth features, the method further comprises feeding the obtained depth features into a dimension decomposition spatial attention module to generate spatial features.
Preferably, said feeding the depth features into a dimension decomposition spatial attention to obtain spatial weights comprises:
the dimension-decomposition spatial attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D) respectively, where H, W and D are the height, width and depth of the three-dimensional image;
the three one-dimensional directional features are fused in the order H, W, D to generate the spatial weight, as follows: first, the height-direction feature O_H and the width-direction feature O_W are unified to size (H, W, 1) by an expansion operation, and the height-direction and width-direction features are then fused by a three-dimensional convolution with kernel size 3×3×3 to obtain the lateral intermediate feature f_1 = δ_1(F_1([t(O_H), t(O_W)])), where [,] denotes concatenation along the channel dimension, t denotes the expansion operation, and δ_1 is the nonlinear activation function ReLU; then the lateral intermediate feature f_1 and the depth-direction feature O_D are unified to size (H, W, D) by the expansion operation and fused by a three-dimensional convolution with kernel size 3×3×3, and the spatial weight F_s = δ_2(F_2([t(f_1), t(O_D)])) is obtained through the activation function Sigmoid, where δ_2 is the nonlinear activation function Sigmoid.
Preferably, said feeding the depth features into the dimension-decomposition channel attention to obtain channel weights comprises: the dimension-decomposition channel attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D) respectively, where H, W and D are the height, width and depth of the three-dimensional image;
a channel feature f_c is generated from the input depth feature through pooling and convolution operations: f_c = δ_3(F_3(avg(X))), where X is the input depth feature, avg denotes the global average pooling operation, F_3 is a three-dimensional convolution with kernel size 1×1×1, δ_3 is the nonlinear activation function ReLU, and f_c ∈ R^{3c×1×1×1} is the resulting set of dimension-channel features;
the obtained set of dimension-channel features is split into three intermediate features, which are concatenated along the channel dimension with the one-dimensional directional features O_H, O_W and O_D respectively and fed into a three-dimensional convolution with kernel size 1×1×1 for fusion, yielding channel features for the H, W and D dimensions; the feature in the H dimension, f_H ∈ R^{c×1×1×1}, is obtained from f_H = δ_H(F_H([O_H, s(f_c)])), where [,] denotes concatenation along the channel dimension, s is the splitting operation, F_H is a 1×1×1 convolution, and δ_H is the nonlinear activation function ReLU;
the features in the W and D dimensions are obtained from f_W = δ_W(F_W([O_W, s(f_c)])) and f_D = δ_D(F_D([O_D, s(f_c)]));
after the three dimension-channel features are concatenated along the channel dimension, the final channel weight F_c is generated by F_c = δ_4(F_4([f_H, f_W, f_D])), where F_4 is a three-dimensional convolution with kernel size 1×1×1 and δ_4 is the nonlinear activation function Sigmoid.
Preferably, the multiplying the depth feature with the spatial weight and the channel weight to obtain a final feature, and sending the final feature to the upsampling network to obtain a segmentation result includes:
the depth feature F, the spatial weight F_s and the channel weight F_c are fused according to F_f = F × F_s × F_c to obtain the final feature, which is fed into the upsampling network to obtain the segmentation result.
According to another aspect of the present application, there is provided a two-stage three-dimensional image segmentation system based on dimension resolution attention, characterized in that a three-dimensional image is segmented using a two-stage strategy and dimension resolution attention; the system comprises:
the cropping device is used for preprocessing and coarsely segmenting the input image so as to obtain a region of interest through cropping;
the sampling device is used for downsampling the obtained region of interest to obtain depth features, and sending the depth features into the dimension decomposition space attention to obtain space weight;
obtaining means for sending the depth features into a dimension decomposition channel attention to obtain channel weights;
and the segmentation device is used for multiplying the depth characteristic with the space weight and the channel weight to obtain a final characteristic, and sending the final characteristic into the up-sampling network to obtain a segmentation result.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a block diagram of a two-stage method based on dimension resolution attention provided by an embodiment of the present disclosure;
FIG. 2 is a diagram of a dimension decomposition spatial attention structure provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a diagram of a dimension decomposition channel attention structure provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a flow chart of a two-stage three-dimensional image segmentation method based on dimension decomposition attention provided by an embodiment of the present disclosure; and
fig. 5 is a schematic structural diagram of a two-stage three-dimensional image segmentation system based on dimension decomposition attention according to an embodiment of the present disclosure.
Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings; the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort are intended to fall within the scope of the application.
The application aims to provide a two-stage three-dimensional image segmentation method based on dimension-decomposition attention, which uses all the information in a three-dimensional image to enhance the weight of the region of interest and solves the problem of reduced segmentation performance caused by the target region having intensity similar to that of adjacent tissue.
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description.
Fig. 1 is a structure diagram of the two-stage method based on dimension-decomposition attention provided by an embodiment of the present disclosure, Fig. 2 is a structure diagram of the dimension-decomposition spatial attention provided by an embodiment of the present disclosure, and Fig. 3 is a structure diagram of the dimension-decomposition channel attention provided by an embodiment of the present disclosure. Referring to Figs. 1, 2 and 3, the two-stage three-dimensional image segmentation method based on dimension-decomposition attention provided by the application specifically comprises the following steps:
step S1: the input image is first trimmed to a uniform size after clipping the edges to remove the effect of the image edge information. And then sending the preprocessed images with uniform sizes into a rough segmentation network to obtain rough segmentation results. And finally, performing corrosion operation and maximum communication region positioning operation on the rough segmentation result, and cutting the original image and the real label by using a rectangular block with a fixed size to obtain the region of interest.
Step S2: first, the region of interest obtained in step S1 is fed into a downsampling network to obtain depth features. The obtained depth features are then fed into the dimension-decomposition spatial attention module to generate spatial weights. The dimension-decomposition spatial attention consists of a dimension decomposition part and a dimension-decomposition spatial attention generation part.
Following conventional spatial attention methods, the dimension decomposition part first performs a pooling operation along the channel direction to generate a spatial feature, compressing the channel information for the subsequent decomposition of dimensional features. The generated spatial feature is then decomposed along the three dimensions of the 3D image using three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D), yielding the height-direction feature O_H, the width-direction feature O_W and the depth-direction feature O_D. The output O_h of the height-direction feature O_H at height h is given by equation (1); similarly, the output O_w of the width-direction feature O_W at width w and the output O_d of the depth-direction feature O_D at depth d are given by equations (2) and (3), respectively.
The dimension-decomposition spatial attention generation part first expands the height-direction feature O_H and the width-direction feature O_W to the same size and concatenates them, then feeds them into a three-dimensional convolution F_1 with kernel size 3×3×3 for feature fusion, giving equation (4).
f_1 = δ_1(F_1([t(O_H), t(O_W)]))    (4)
where [,] denotes concatenation along the channel dimension, t denotes the expansion operation, δ_1 is the nonlinear activation function ReLU, and f_1 ∈ R^{1×H×W×1} is a lateral intermediate feature that fuses the spatial information of the height and width directions. The resulting lateral intermediate feature f_1 and the depth-direction feature O_D are expanded to the same size and concatenated, then likewise fed into a three-dimensional convolution F_2 with kernel size 3×3×3 for feature fusion, and the spatial weight is output as shown in equation (5).
F_s = δ_2(F_2([t(f_1), t(O_D)]))    (5)
where δ_2 is the nonlinear activation function Sigmoid.
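The decomposition and the fusion of equations (4)–(5) can be sketched in PyTorch as follows. Average pooling for the channel compression and the directional features is an assumption where the application leaves the pooling type implicit; the expansion t(·) is implemented with broadcasting.

```python
import torch
import torch.nn as nn

class DimDecompSpatialAttention(nn.Module):
    """Sketch of the dimension-decomposition spatial attention.
    Channel pooling -> three 1-D directional features -> fuse H and W,
    then fuse with D (equations (4)-(5)); shapes are one plausible reading."""
    def __init__(self):
        super().__init__()
        self.conv_hw = nn.Conv3d(2, 1, kernel_size=3, padding=1)   # F_1
        self.conv_hwd = nn.Conv3d(2, 1, kernel_size=3, padding=1)  # F_2
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                        # x: (B, C, H, W, D)
        B, _, H, W, D = x.shape
        s = x.mean(dim=1, keepdim=True)          # channel pooling -> spatial feature
        o_h = s.mean(dim=(3, 4), keepdim=True)   # O_H: (B, 1, H, 1, 1)
        o_w = s.mean(dim=(2, 4), keepdim=True)   # O_W: (B, 1, 1, W, 1)
        o_d = s.mean(dim=(2, 3), keepdim=True)   # O_D: (B, 1, 1, 1, D)
        # equation (4): t(.) expands to a common (H, W, 1) grid, then 3x3x3 conv
        f1 = self.relu(self.conv_hw(torch.cat(
            [o_h.expand(B, 1, H, W, 1), o_w.expand(B, 1, H, W, 1)], dim=1)))
        # equation (5): expand to (H, W, D), fuse with O_D, Sigmoid -> F_s
        f_s = torch.sigmoid(self.conv_hwd(torch.cat(
            [f1.expand(B, 1, H, W, D), o_d.expand(B, 1, H, W, D)], dim=1)))
        return f_s                               # (B, 1, H, W, D)
```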
Step S3: the depth features are simultaneously fed into the dimension-decomposition channel attention to obtain channel weights. The dimension-decomposition channel attention comprises two parts: dimension decomposition and dimension-decomposition channel attention generation, where the dimension decomposition part is the same as the process described in step S2.
The dimension-decomposition channel attention generation part first generates a channel feature by global average pooling, compressing the spatial information for the subsequent generation of dimension-channel features.
After the channel feature is generated, the set of dimension-channel features is obtained through equation (6).
f_c = δ_3(F_3(avg(X)))    (6)
where X is the input depth feature, avg denotes the global average pooling operation, F_3 is a three-dimensional convolution with kernel size 1×1×1, δ_3 is the nonlinear activation function ReLU, and f_c ∈ R^{3c×1×1×1} is the resulting set of dimension-channel features. The obtained set of dimension-channel features is split, concatenated along the channel dimension with the generated directional features, and fed into a three-dimensional convolution with kernel size 1×1×1 for fusion; the output f_H ∈ R^{c×1×1×1} in the H dimension is given by equation (7).
f_H = δ_H(F_H([O_H, s(f_c)]))    (7)
where [,] denotes concatenation along the channel dimension, s is the splitting operation, F_H is a three-dimensional convolution with kernel size 1×1×1, and δ_H is the nonlinear activation function ReLU. The outputs in the W and D dimensions are given analogously by equations (8) and (9).
f_W = δ_W(F_W([O_W, s(f_c)]))    (8)
f_D = δ_D(F_D([O_D, s(f_c)]))    (9)
After the three dimension-channel features are concatenated along the channel dimension, the final channel weight is generated through equation (10).
F_c = δ_4(F_4([f_H, f_W, f_D]))    (10)
where F_4 is a three-dimensional convolution with kernel size 1×1×1 and δ_4 is the nonlinear activation function Sigmoid.
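Equations (6)–(10) can be sketched as below. The application does not fully specify the tensor shapes of the directional branches, so this sketch makes one plausible choice (an assumption, not the application's definitive design): each directional feature keeps its channel dimension, is fused with its split of f_c by a 1×1×1 convolution, and is then averaged down to R^{C×1×1×1}.

```python
import torch
import torch.nn as nn

class DimDecompChannelAttention(nn.Module):
    """Sketch of the dimension-decomposition channel attention (eqs (6)-(10)).
    Branch shapes are an interpretation, not specified by the application."""
    def __init__(self, channels):
        super().__init__()
        self.conv_c = nn.Conv3d(channels, 3 * channels, kernel_size=1)    # F_3
        self.conv_h = nn.Conv3d(2 * channels, channels, kernel_size=1)    # F_H
        self.conv_w = nn.Conv3d(2 * channels, channels, kernel_size=1)    # F_W
        self.conv_d = nn.Conv3d(2 * channels, channels, kernel_size=1)    # F_D
        self.conv_out = nn.Conv3d(3 * channels, channels, kernel_size=1)  # F_4
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                          # x: (B, C, H, W, D)
        # directional features: average over the other two spatial axes
        o_h = x.mean(dim=(3, 4), keepdim=True)     # (B, C, H, 1, 1)
        o_w = x.mean(dim=(2, 4), keepdim=True)     # (B, C, 1, W, 1)
        o_d = x.mean(dim=(2, 3), keepdim=True)     # (B, C, 1, 1, D)
        # equation (6): f_c = ReLU(F_3(avg(X))), then split into three parts
        f_c = self.relu(self.conv_c(x.mean(dim=(2, 3, 4), keepdim=True)))
        s_h, s_w, s_d = torch.chunk(f_c, 3, dim=1)  # each (B, C, 1, 1, 1)
        # equations (7)-(9): fuse each directional feature with its split,
        # then reduce the remaining spatial axis (assumed reduction: mean)
        f_h = self.relu(self.conv_h(torch.cat([o_h, s_h.expand_as(o_h)], 1))).mean(2, keepdim=True)
        f_w = self.relu(self.conv_w(torch.cat([o_w, s_w.expand_as(o_w)], 1))).mean(3, keepdim=True)
        f_d = self.relu(self.conv_d(torch.cat([o_d, s_d.expand_as(o_d)], 1))).mean(4, keepdim=True)
        # equation (10): F_c = Sigmoid(F_4([f_H, f_W, f_D]))
        return torch.sigmoid(self.conv_out(torch.cat([f_h, f_w, f_d], dim=1)))
```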
Step S4: the depth feature F, the spatial weight F_s obtained in step S2 and the channel weight F_c obtained in step S3 are fused according to F_f = F × F_s × F_c to generate the final feature, which is fed into the upsampling network to obtain the final segmentation result.
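The fusion in step S4 is a broadcast elementwise product; a minimal sketch with illustrative (assumed) tensor sizes:

```python
import torch

# Assumed shapes: F is the encoder feature, F_s a single-channel spatial map,
# F_c a per-channel scalar weight; broadcasting aligns the singleton axes.
F   = torch.randn(2, 8, 16, 16, 16)  # depth feature (B, C, H, W, D)
F_s = torch.rand(2, 1, 16, 16, 16)   # spatial weight, shared across channels
F_c = torch.rand(2, 8, 1, 1, 1)      # channel weight, shared across voxels
F_f = F * F_s * F_c                  # F_f = F x F_s x F_c
```

The fused feature F_f keeps the shape of F and is passed to the decoder (upsampling network).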
Fig. 4 is a flowchart of a two-stage three-dimensional image segmentation method based on dimension decomposition attention according to an embodiment of the present disclosure. As shown in fig. 4, the two-stage three-dimensional image segmentation method based on dimension decomposition attention includes:
step 401, preprocessing and rough segmentation are performed on the input image to obtain a region of interest through cropping, for example, including: cutting and readjusting the input image to a uniform size, and removing the influence of the image edge information; sending the preprocessed images with uniform sizes into a rough segmentation network to obtain rough segmentation results; and performing corrosion operation and maximum communication region positioning operation on the rough segmentation result, and cutting the original image and the real label by using a rectangular block with a fixed size to obtain the region of interest.
Step 402, downsampling the obtained region of interest to obtain depth features, and sending the depth features into the dimension decomposition space attention to obtain space weights. After downsampling the obtained region of interest to obtain depth features, the method further comprises the step of sending the obtained depth features to a dimension decomposition spatial attention module to generate spatial features.
In one embodiment, feeding the depth features into the dimension-decomposition spatial attention to obtain spatial weights comprises: the dimension-decomposition spatial attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D) respectively, where H, W and D are the height, width and depth of the three-dimensional image.
The three one-dimensional directional features are fused in the order H, W, D to generate the spatial weight, as follows: first, the height-direction feature O_H and the width-direction feature O_W are unified to size (H, W, 1) by an expansion operation, and then fused by a three-dimensional convolution with kernel size 3×3×3 to obtain the lateral intermediate feature f_1 = δ_1(F_1([t(O_H), t(O_W)])), where [,] denotes concatenation along the channel dimension, t denotes the expansion operation, and δ_1 is the nonlinear activation function ReLU; then the lateral intermediate feature f_1 and the depth-direction feature O_D are unified to size (H, W, D) by the expansion operation and fused by a three-dimensional convolution with kernel size 3×3×3, and the spatial weight F_s = δ_2(F_2([t(f_1), t(O_D)])) is obtained through the activation function Sigmoid, where δ_2 is the nonlinear activation function Sigmoid.
Step 403: the depth features are fed into the dimension-decomposition channel attention to obtain channel weights, for example comprising: the dimension-decomposition channel attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D) respectively, where H, W and D are the height, width and depth of the three-dimensional image.
A channel feature f_c is generated from the input depth feature through pooling and convolution operations: f_c = δ_3(F_3(avg(X))), where X is the input depth feature, avg denotes the global average pooling operation, F_3 is a three-dimensional convolution with kernel size 1×1×1, δ_3 is the nonlinear activation function ReLU, and f_c ∈ R^{3c×1×1×1} is the resulting set of dimension-channel features.
The obtained set of dimension-channel features is split into three intermediate features, which are concatenated along the channel dimension with the one-dimensional directional features O_H, O_W and O_D respectively and fed into a three-dimensional convolution with kernel size 1×1×1 for fusion, yielding channel features for the H, W and D dimensions; the feature in the H dimension, f_H ∈ R^{c×1×1×1}, is obtained from f_H = δ_H(F_H([O_H, s(f_c)])), where [,] denotes concatenation along the channel dimension, s is the splitting operation, F_H is a 1×1×1 convolution, and δ_H is the nonlinear activation function ReLU.
The features in the W and D dimensions are obtained from f_W = δ_W(F_W([O_W, s(f_c)])) and f_D = δ_D(F_D([O_D, s(f_c)])).
After the three dimension-channel features are concatenated along the channel dimension, the final channel weight F_c is generated by F_c = δ_4(F_4([f_H, f_W, f_D])), where F_4 is a three-dimensional convolution with kernel size 1×1×1 and δ_4 is the nonlinear activation function Sigmoid.
Step 404: the depth feature is multiplied with the spatial weight and the channel weight to obtain the final feature, which is fed into the upsampling network to obtain the segmentation result, for example comprising:
the depth feature F, the spatial weight F_s and the channel weight F_c are fused according to F_f = F × F_s × F_c to obtain the final feature, which is fed into the upsampling network to obtain the segmentation result.
Fig. 5 is a schematic structural diagram of a two-stage three-dimensional image segmentation system based on dimension decomposition attention according to an embodiment of the present disclosure. The system comprises:
a cropping device 501, configured to preprocess and coarsely segment an input image so as to obtain a region of interest by cropping;
a sampling device 502, configured to downsample the obtained region of interest to obtain a depth feature, and to send the depth feature into the dimension-decomposition spatial attention to obtain a spatial weight;
an obtaining device 503, configured to send the depth feature into the dimension-decomposition channel attention to obtain a channel weight;
a segmentation device 504, configured to multiply the depth feature by the spatial weight and the channel weight to obtain a final feature, and to send the final feature into the up-sampling network to obtain a segmentation result.
The application provides a two-stage three-dimensional image segmentation method based on lightweight dimension-decomposition attention, which addresses the drop in segmentation performance caused by the similar intensities of the target region and adjacent tissues. The method offers a small computational cost, high accuracy and strong robustness.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
Claims (7)
1. A two-stage three-dimensional image segmentation method based on dimension-decomposition attention, characterized in that a two-stage strategy and dimension-decomposition attention are used to segment a three-dimensional image; the method comprises the following steps:
step S1: preprocessing and coarsely segmenting an input image so as to obtain a region of interest by cropping;
step S2: downsampling the obtained region of interest to obtain a depth feature, and sending the depth feature into the dimension-decomposition spatial attention to obtain a spatial weight;
step S3: sending the depth feature into the dimension-decomposition channel attention to obtain a channel weight;
step S4: multiplying the depth feature by the spatial weight and the channel weight to obtain a final feature, and sending the final feature into an up-sampling network to obtain a segmentation result.
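Taken together, steps S1 to S4 form a simple data flow: crop, encode, weight, fuse, decode. The sketch below wires hypothetical stand-in stages together (a fixed crop, strided pooling, mean-based attention weights, nearest-neighbour upsampling) purely to show the plumbing; none of these function bodies is the patented network:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def coarse_stage(image):                  # S1: rough segmentation -> ROI crop (stand-in)
    return image[:, 4:12, 4:12, 4:12]

def encoder(roi):                         # S2, first half: downsampling (stand-in)
    return roi[:, ::2, ::2, ::2]

def spatial_attention(feat):              # S2, second half: one weight per voxel
    return sigmoid(feat.mean(axis=0, keepdims=True))

def channel_attention(feat):              # S3: one weight per channel
    return sigmoid(feat.mean(axis=(1, 2, 3), keepdims=True))

def decoder(feat):                        # S4: upsample to a binary segmentation map
    up = np.repeat(np.repeat(np.repeat(feat, 2, 1), 2, 2), 2, 3)
    return (up > 0).astype(np.uint8)

image = rng.standard_normal((1, 16, 16, 16))
feat = encoder(coarse_stage(image))
fused = feat * spatial_attention(feat) * channel_attention(feat)  # F_f = F x F_s x F_c
seg = decoder(fused)
```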
2. The two-stage three-dimensional image segmentation method according to claim 1, wherein the preprocessing and coarse segmentation of the input image to obtain the region of interest by cropping comprises:
cropping and resizing the input image to a uniform size, and removing the influence of image-edge information;
sending the preprocessed images of uniform size into a coarse segmentation network to obtain a coarse segmentation result; performing an erosion operation and a largest-connected-region localization operation on the coarse segmentation result, and cropping the original image and the ground-truth label with a rectangular block of fixed size to obtain the region of interest.
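The erosion ("corrosion") operation, largest-connected-region ("maximum communication region") localization and fixed-size crop of this claim can be sketched in plain NumPy and Python. The 6-neighbourhood structuring element and 6-connectivity used below are generic stand-ins, since the claim fixes neither:

```python
import numpy as np
from collections import deque

def erode(mask):
    """6-neighbourhood binary erosion: keep a voxel only if all face neighbours are set."""
    p = np.pad(mask.astype(bool), 1)
    return (p[1:-1, 1:-1, 1:-1]
            & p[:-2, 1:-1, 1:-1] & p[2:, 1:-1, 1:-1]
            & p[1:-1, :-2, 1:-1] & p[1:-1, 2:, 1:-1]
            & p[1:-1, 1:-1, :-2] & p[1:-1, 1:-1, 2:])

def largest_region(mask):
    """BFS flood fill; return a boolean mask of the largest 6-connected component."""
    labels = np.zeros(mask.shape, dtype=int)
    best_label, best_size, cur = 0, 0, 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        cur += 1
        labels[seed] = cur
        queue, size = deque([seed]), 0
        while queue:
            x, y, z = queue.popleft()
            size += 1
            for dx, dy, dz in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
                n = (x + dx, y + dy, z + dz)
                if (all(0 <= n[i] < mask.shape[i] for i in range(3))
                        and mask[n] and not labels[n]):
                    labels[n] = cur
                    queue.append(n)
        if size > best_size:
            best_label, best_size = cur, size
    return labels == best_label

def fixed_crop(volume, mask, size):
    """Crop a fixed-size rectangular block centred on the mask centroid, clamped to bounds."""
    centre = [int(round(ax.mean())) for ax in np.nonzero(mask)]
    starts = [min(max(c0 - s // 2, 0), dim - s)
              for c0, s, dim in zip(centre, size, volume.shape)]
    return volume[tuple(slice(st, st + s) for st, s in zip(starts, size))]

# Toy coarse-segmentation result: one large blob plus one spurious voxel
coarse = np.zeros((16, 16, 16), dtype=bool)
coarse[2:9, 2:9, 2:9] = True       # large component
coarse[13, 13, 13] = True          # noise; erosion removes isolated voxels
roi_mask = largest_region(erode(coarse))
roi = fixed_crop(np.arange(16**3, dtype=float).reshape(16, 16, 16), roi_mask, (8, 8, 8))
```

In practice a library routine (e.g. morphological erosion and component labelling from an image-processing package) would replace the hand-rolled loops; the logic shown is the same.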
3. The two-stage three-dimensional image segmentation method according to claim 1, further comprising, after downsampling the obtained region of interest to obtain the depth feature, feeding the obtained depth feature into the dimension-decomposition spatial attention module to generate spatial features.
4. The two-stage three-dimensional image segmentation method according to claim 3, wherein said sending the depth feature into the dimension-decomposition spatial attention to obtain the spatial weight comprises:
the dimension-decomposition spatial attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D), respectively, where H, W and D are the height, width and depth of the three-dimensional image;
the three obtained one-dimensional directional features are fused in the order H, W, D to generate the spatial weight, which comprises: first, the height-direction feature O_H and the width-direction feature O_W are unified to size (H, W, 1) by an expansion operation, and the features of the height and width directions are then fused by a three-dimensional convolution with a 3×3×3 kernel to obtain the lateral intermediate feature f_1, as shown in the formula f_1 = δ_1(F_1([t(O_H), t(O_W)])), where [·,·] denotes concatenation along the channel, t denotes the expansion operation, and δ_1 is the nonlinear activation function ReLU; then the lateral intermediate feature f_1 and the depth-direction feature O_D are unified to size (H, W, D) by the expansion operation and fused by a three-dimensional convolution with a 3×3×3 kernel, and the spatial weight F_s is obtained through the activation function Sigmoid, where F_s = δ_2(F_2([t(f_1), t(O_D)])) and δ_2 is the nonlinear activation function Sigmoid.
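A NumPy sketch of this spatial-attention branch is given below. It is illustrative only: the two 3×3×3 convolutions of the claim are replaced by 1×1×1 channel-mixing maps with hypothetical random weights to keep the sketch short, and the spatial weight is assumed single-channel:

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda v: np.maximum(v, 0.0)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

c, H, W, D = 3, 5, 6, 7
X = rng.standard_normal((c, H, W, D))          # pooled spatial feature

# Decomposition into one-dimensional directional features
O_H = X.mean(axis=(2, 3), keepdims=True)       # (c, H, 1, 1)
O_W = X.mean(axis=(1, 3), keepdims=True)       # (c, 1, W, 1)
O_D = X.mean(axis=(1, 2), keepdims=True)       # (c, 1, 1, D)

def t(x, shape):
    # expansion operation: broadcast a directional feature to a common size
    return np.broadcast_to(x, shape)

def mix(weights, x):
    # 1x1x1 channel-mixing stand-in for the claim's 3x3x3 convolution (illustrative)
    return np.einsum('oc,chwd->ohwd', weights, x)

# f_1 = delta_1(F_1([t(O_H), t(O_W)])): fuse the height and width directions
F1 = rng.standard_normal((c, 2 * c))
f1 = relu(mix(F1, np.concatenate([t(O_H, (c, H, W, 1)), t(O_W, (c, H, W, 1))])))

# F_s = delta_2(F_2([t(f_1), t(O_D)])): fuse in the depth direction, then Sigmoid
F2 = rng.standard_normal((1, 2 * c))
F_s = sigmoid(mix(F2, np.concatenate([t(f1, (c, H, W, D)), t(O_D, (c, H, W, D))])))
```

Decomposing the 3D attention map into three 1D directional features is what keeps the computation lightweight: the pooled features grow linearly in H, W and D rather than with their product.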
5. The two-stage three-dimensional image segmentation method according to claim 1, wherein the sending of the depth feature into the dimension-decomposition channel attention to obtain the channel weight comprises: the dimension-decomposition channel attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D), respectively, where H, W and D are the height, width and depth of the three-dimensional image;
a channel feature f_c is generated from the input depth feature through pooling and convolution operations, where f_c = δ_3(F_3(avg(X))), X is the input depth feature, avg denotes the global average pooling operation, F_3 is a three-dimensional convolution with a 1×1×1 kernel, δ_3 is the nonlinear activation function ReLU, and f_c ∈ R^(3c×1×1×1), yielding the dimension-channel feature set;
the obtained dimension-channel feature set is split into three intermediate features, each of which is concatenated along the channel dimension with the corresponding one-dimensional directional feature O_H, O_W or O_D and fed into a three-dimensional convolution with a 1×1×1 kernel for fusion, yielding channel features for the H, W and D dimensions, where the feature in the H dimension, f_H ∈ R^(c×1×1×1), is obtained from the formula f_H = δ_H(F_H([O_H, s(f_c)])), [·,·] denotes concatenation along the channel dimension, s is the splitting operation, F_H is a 1×1×1 convolution, and δ_H is the nonlinear activation function ReLU;
the features of the W and D dimensions are obtained from the formulas f_W = δ_W(F_W([O_W, s(f_c)])) and f_D = δ_D(F_D([O_D, s(f_c)]));
after the obtained channel features are concatenated along the channel dimension, the final channel weight F_c is generated by the formula F_c = δ_4(F_4([f_H, f_W, f_D])), where F_4 is a three-dimensional convolution with a 1×1×1 kernel and δ_4 is the nonlinear activation function Sigmoid.
6. The two-stage three-dimensional image segmentation method according to claim 1, wherein multiplying the depth feature by the spatial weight and the channel weight to obtain the final feature and sending the final feature into an up-sampling network to obtain the segmentation result comprises:
the depth feature F, the spatial weight F_s and the channel weight F_c are fused according to the formula F_f = F × F_s × F_c to obtain the final feature, which is sent into the up-sampling network to obtain the segmentation result.
7. A two-stage three-dimensional image segmentation system based on dimension-decomposition attention, characterized in that a three-dimensional image is segmented using a two-stage strategy and dimension-decomposition attention; the system comprises:
a cropping device, configured to preprocess and coarsely segment an input image so as to obtain a region of interest by cropping;
a sampling device, configured to downsample the obtained region of interest to obtain a depth feature, and to send the depth feature into the dimension-decomposition spatial attention to obtain a spatial weight;
an obtaining device, configured to send the depth feature into the dimension-decomposition channel attention to obtain a channel weight;
a segmentation device, configured to multiply the depth feature by the spatial weight and the channel weight to obtain a final feature, and to send the final feature into the up-sampling network to obtain a segmentation result.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310994348.6A CN117036703A (en) | 2023-08-09 | 2023-08-09 | A two-stage 3D image segmentation method based on dimensionality decomposition attention |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310994348.6A CN117036703A (en) | 2023-08-09 | 2023-08-09 | A two-stage 3D image segmentation method based on dimensionality decomposition attention |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117036703A true CN117036703A (en) | 2023-11-10 |
Family
ID=88633030
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310994348.6A Pending CN117036703A (en) | 2023-08-09 | 2023-08-09 | A two-stage 3D image segmentation method based on dimensionality decomposition attention |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117036703A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112927255A (en) * | 2021-02-22 | 2021-06-08 | 武汉科技大学 | Three-dimensional liver image semantic segmentation method based on context attention strategy |
| WO2021147257A1 (en) * | 2020-01-20 | 2021-07-29 | 上海商汤智能科技有限公司 | Network training method and apparatus, image processing method and apparatus, and electronic device and storage medium |
| CN114998307A (en) * | 2022-07-06 | 2022-09-02 | 重庆大学 | Two-stage full 3D abdominal organ segmentation method and system based on dual-resolution network |
| CN115018864A (en) * | 2022-06-17 | 2022-09-06 | 东南大学 | Three-stage liver tumor image segmentation method based on adaptive preprocessing |
| CN116229067A (en) * | 2023-02-22 | 2023-06-06 | 上海理工大学 | Channel Attention-Based Segmentation Method for Hepatocellular Carcinoma CT Image |
- 2023-08-09: CN application CN202310994348.6A, published as CN117036703A (status: Pending)
Non-Patent Citations (1)
| Title |
|---|
| Zhang Wenjun: "Research on Liver CT Image Segmentation Algorithms Based on Attention Mechanism", China Master's Theses Full-text Database (Information Science and Technology), 15 January 2023 (2023-01-15) * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111563508A (en) | A Semantic Segmentation Method Based on Spatial Information Fusion | |
| CN117953224B (en) | Open vocabulary 3D panorama segmentation method and system | |
| CN113034563B (en) | Self-supervised monocular depth estimation method based on feature sharing | |
| CN114820469A (en) | Defect image sample generation method, system, medium, and device based on generation countermeasure network | |
| CN113096133A (en) | Method for constructing semantic segmentation network based on attention mechanism | |
| CN110992374A (en) | Hair refined segmentation method and system based on deep learning | |
| Sharma et al. | An efficient image super resolution model with dense skip connections between complex filter structures in Generative Adversarial Networks | |
| CN118037898B (en) | Text generation video method based on image guided video editing | |
| CN114240770B (en) | Image processing method, device, server and storage medium | |
| CN118968271A (en) | Indoor 3D scene understanding method and terminal based on multimodal feature embedding | |
| CN118708071A (en) | A multi-modal language generation method and system guided by multi-granularity visual information | |
| CN117455808A (en) | A lightweight image deblurring method and system | |
| CN113255646A (en) | A real-time scene text detection method | |
| CN110580462B (en) | Natural scene text detection method and system based on non-local network | |
| CN117132767A (en) | A small target detection method, device, equipment and readable storage medium | |
| CN116468820A (en) | Method and system for generating scene images from style-guided sketches | |
| CN117036703A (en) | A two-stage 3D image segmentation method based on dimensionality decomposition attention | |
| CN120182590A (en) | Method, processor, device and storage medium for well logging fracture image segmentation | |
| CN120411049A (en) | A micro motor armature defect detection method based on improved YOLOv11n | |
| CN115496912A (en) | Method and device for feature extraction of workpiece contour based on deep learning of virtual samples | |
| CN115908812B (en) | Attention-guided deformable self-attention semantic segmentation method | |
| CN118229725A (en) | Animal matting method, device, computer equipment and storage medium | |
| CN120339695B (en) | Object detection method and system based on LCSD-YOLO network | |
| CN119027822B (en) | Boundary-guided method for farmland extraction from agricultural remote sensing images | |
| CN120279368B (en) | A remote sensing image super-resolution fusion method based on frequency domain feature prior |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |