CN117036703A - A two-stage 3D image segmentation method based on dimensionality decomposition attention - Google Patents
- Publication number
- CN117036703A (application number CN202310994348.6A)
- Authority
- CN
- China
- Prior art keywords
- channel
- feature
- dimension
- attention
- dimensional
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The application provides a two-stage three-dimensional image segmentation method based on dimension-decomposition attention, which effectively addresses the drop in segmentation performance caused by the target region having intensity similar to that of adjacent tissue. In the first stage, a three-dimensional U-shaped network locates the region of interest, reducing interference from irrelevant tissue in the image and providing the input for the second stage. In the second stage, dimension-decomposition spatial attention and dimension-decomposition channel attention are added to the three-dimensional U-shaped network to achieve accurate segmentation of the target. Both attention modules decompose the features output by the encoder into three one-dimensional directional features along the height, width, and depth axes and generate attention weights; the two attention weights are multiplied with the encoder output features to form the input to the decoder, producing the final segmentation result. The application can effectively exploit all the information in a three-dimensional image, enhance the weight of the region of interest, and improve segmentation performance.
Description
Technical Field
The application relates to the field of computer vision, and in particular to a two-stage three-dimensional image segmentation method and system based on dimension-decomposition attention.
Background
With the development of deep learning, convolutional neural networks have been widely applied to image segmentation tasks thanks to their ability to automatically learn deep, more discriminative features from samples. Researchers have proposed a series of two-dimensional convolutional segmentation models that exhibit excellent performance. However, three-dimensional images carry more information than two-dimensional images, which makes them more valuable in real-world applications. Researchers have therefore proposed three-dimensional convolutional models dedicated to three-dimensional images, which achieve better results than two-dimensional models.
Because the target region in a three-dimensional image often has intensity similar to that of adjacent tissue, both two-dimensional and three-dimensional models are prone to mis-segmentation, which seriously affects segmentation performance. Moreover, the target region typically occupies only a small portion of a three-dimensional image, which means that most of the data is irrelevant; this redundant data places an excessive burden on the computer during training. The application therefore introduces a two-stage strategy: the first stage extracts the region of interest and removes irrelevant areas from the image, greatly reducing the computation required in the second stage.
To alleviate the performance degradation caused by the similarity in intensity between the target region and the background, researchers have introduced attention methods originally developed in natural language processing. These methods let a model concentrate on important information while ignoring unimportant information, improving segmentation performance. With the rapid development of deep learning, a series of attention methods have been proposed with good results. However, most of these methods are designed for two-dimensional images and are difficult to use directly in three-dimensional models, since they do not take the third dimension into account. The application therefore proposes a dimension-decomposition attention module for three-dimensional image segmentation that improves segmentation accuracy.
To realize accurate and efficient three-dimensional image segmentation, the application provides a two-stage U-Net method based on dimension-decomposition attention. First, a 3D U-Net is used as a coarse segmentation network to roughly locate and extract the region of interest. In the second stage, the region of interest extracted in the first stage is segmented, with dimension-decomposition spatial attention and dimension-decomposition channel attention added to improve the final segmentation result.
Disclosure of Invention
The application aims to provide a two-stage three-dimensional image segmentation method based on dimension-decomposition attention, which uses all the information in a three-dimensional image to enhance the weight of the region of interest and solves the problem of reduced segmentation performance caused by the target region having intensity similar to that of adjacent tissue.
In order to achieve the above object, the present application provides a two-stage three-dimensional image segmentation method based on dimension decomposition attention, which is characterized in that a two-stage strategy and dimension decomposition attention are used to segment a three-dimensional image; the method comprises the following steps:
step S1: preprocessing and coarse segmentation are carried out on an input image so as to obtain a region of interest through cropping;
step S2: downsampling the obtained region of interest to obtain depth features, and sending the depth features into the dimension decomposition space attention to obtain space weights;
step S3: sending the depth features into a dimension decomposition channel attention to obtain channel weights;
step S4: and multiplying the depth feature with the space weight and the channel weight to obtain a final feature, and sending the final feature into an up-sampling network to obtain a segmentation result.
Preferably, the preprocessing and coarse segmentation of the input image to obtain the region of interest by cropping comprises:
cropping and resizing the input image to a uniform size to remove the influence of image edge information;
sending the preprocessed images of uniform size into a coarse segmentation network to obtain a coarse segmentation result; and performing an erosion operation and largest-connected-region localization on the coarse segmentation result, and cropping the original image and the ground-truth label with a fixed-size rectangular block to obtain the region of interest.
Preferably, after downsampling the obtained region of interest to obtain depth features, the method further comprises feeding the obtained depth features into a dimension decomposition spatial attention module to generate spatial features.
Preferably, said feeding the depth features into a dimension decomposition spatial attention to obtain spatial weights comprises:
the dimension-decomposition spatial attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D) respectively, where H, W and D are the height, width and depth of the three-dimensional image;
the three one-dimensional directional features are fused in the order H, W, D to generate the spatial weight, as follows: first, the height-direction feature O_H and the width-direction feature O_W are unified to size (H, W, 1) by an expansion operation, and the height-direction and width-direction features are then fused by a three-dimensional convolution with kernel size 3×3×3 to obtain the lateral intermediate feature f_1 = δ_1(F_1([t(O_H), t(O_W)])), where [,] denotes concatenation along the channel dimension, t denotes the expansion operation, and δ_1 is the nonlinear activation function ReLU; then the lateral intermediate feature f_1 and the depth-direction feature O_D are unified to size (H, W, D) by the expansion operation and fused by a three-dimensional convolution with kernel size 3×3×3, and the spatial weight F_s = δ_2(F_2([t(f_1), t(O_D)])) is obtained through the activation function Sigmoid, where δ_2 is the nonlinear activation function Sigmoid.
Preferably, said feeding the depth features into the dimension-decomposition channel attention to obtain channel weights comprises: the dimension-decomposition channel attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D) respectively, where H, W and D are the height, width and depth of the three-dimensional image;
a channel feature f_c is generated from the input depth feature through pooling and convolution operations: f_c = δ_3(F_3(avg(X))), where X is the input depth feature, avg denotes the global average pooling operation, F_3 is a three-dimensional convolution with kernel size 1×1×1, δ_3 is the nonlinear activation function ReLU, and f_c ∈ R^{3c×1×1×1} is the resulting set of dimension-channel features;
the obtained set of dimension-channel features is split into three intermediate features, which are concatenated along the channel dimension with the one-dimensional directional features O_H, O_W and O_D respectively and fed into a three-dimensional convolution with kernel size 1×1×1 for fusion, yielding channel features for the H, W and D dimensions; the feature in the H dimension, f_H ∈ R^{c×1×1×1}, is obtained from f_H = δ_H(F_H([O_H, s(f_c)])), where [,] denotes concatenation along the channel dimension, s is the splitting operation, F_H is a 1×1×1 convolution, and δ_H is the nonlinear activation function ReLU;
the features in the W and D dimensions are obtained from f_W = δ_W(F_W([O_W, s(f_c)])) and f_D = δ_D(F_D([O_D, s(f_c)]));
after the three dimension-channel features are concatenated along the channel dimension, the final channel weight F_c is generated by F_c = δ_4(F_4([f_H, f_W, f_D])), where F_4 is a three-dimensional convolution with kernel size 1×1×1 and δ_4 is the nonlinear activation function Sigmoid.
Preferably, the multiplying the depth feature with the spatial weight and the channel weight to obtain a final feature, and sending the final feature to the upsampling network to obtain a segmentation result includes:
the depth feature F, the spatial weight F_s and the channel weight F_c are fused according to F_f = F × F_s × F_c to obtain the final feature, which is fed into the upsampling network to obtain the segmentation result.
According to another aspect of the present application, there is provided a two-stage three-dimensional image segmentation system based on dimension resolution attention, characterized in that a three-dimensional image is segmented using a two-stage strategy and dimension resolution attention; the system comprises:
the cropping device is used for preprocessing and coarsely segmenting the input image so as to obtain a region of interest through cropping;
the sampling device is used for downsampling the obtained region of interest to obtain depth features, and sending the depth features into the dimension decomposition space attention to obtain space weight;
obtaining means for sending the depth features into a dimension decomposition channel attention to obtain channel weights;
and the segmentation device is used for multiplying the depth characteristic with the space weight and the channel weight to obtain a final characteristic, and sending the final characteristic into the up-sampling network to obtain a segmentation result.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a block diagram of a two-stage method based on dimension resolution attention provided by an embodiment of the present disclosure;
FIG. 2 is a diagram of a dimension decomposition spatial attention structure provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a diagram of a dimension decomposition channel attention structure provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a flow chart of a two-stage three-dimensional image segmentation method based on dimension decomposition attention provided by an embodiment of the present disclosure; and
fig. 5 is a schematic structural diagram of a two-stage three-dimensional image segmentation system based on dimension decomposition attention according to an embodiment of the present disclosure.
Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings; the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort are intended to fall within the scope of the application.
The application aims to provide a two-stage three-dimensional image segmentation method based on dimension-decomposition attention, which uses all the information in a three-dimensional image to enhance the weight of the region of interest and solves the problem of reduced segmentation performance caused by the target region having intensity similar to that of adjacent tissue.
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description.
Fig. 1 is a structure diagram of the two-stage method based on dimension-decomposition attention provided by an embodiment of the present disclosure, Fig. 2 is a structure diagram of the dimension-decomposition spatial attention provided by an embodiment of the present disclosure, and Fig. 3 is a structure diagram of the dimension-decomposition channel attention provided by an embodiment of the present disclosure. Referring to Figs. 1, 2 and 3, the two-stage three-dimensional image segmentation method based on dimension-decomposition attention provided by the application specifically comprises the following steps:
step S1: the input image is first trimmed to a uniform size after clipping the edges to remove the effect of the image edge information. And then sending the preprocessed images with uniform sizes into a rough segmentation network to obtain rough segmentation results. And finally, performing corrosion operation and maximum communication region positioning operation on the rough segmentation result, and cutting the original image and the real label by using a rectangular block with a fixed size to obtain the region of interest.
Step S2: first, the region of interest obtained in step S1 is fed into a downsampling network to obtain depth features. The obtained depth features are then fed into the dimension-decomposition spatial attention module to generate spatial weights. The dimension-decomposition spatial attention consists of a dimension decomposition part and a dimension-decomposition spatial attention generation part.
Following conventional spatial attention methods, the dimension decomposition part first performs a pooling operation along the channel direction to generate a spatial feature, compressing the channel information for the subsequent decomposition of dimensional features. The generated spatial feature is then decomposed along the three dimensions of the 3D image using three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D), yielding the height-direction feature O_H, the width-direction feature O_W and the depth-direction feature O_D. The output O_h of the height-direction feature O_H at height h is given by equation (1); similarly, the output O_w of the width-direction feature O_W at width w and the output O_d of the depth-direction feature O_D at depth d are given by equations (2) and (3), respectively.
The dimension-decomposition spatial attention generation part first expands the height-direction feature O_H and the width-direction feature O_W to the same size and concatenates them, then feeds them into a three-dimensional convolution F_1 with kernel size 3×3×3 for feature fusion, giving equation (4).
f_1 = δ_1(F_1([t(O_H), t(O_W)]))    (4)
where [,] denotes concatenation along the channel dimension, t denotes the expansion operation, δ_1 is the nonlinear activation function ReLU, and f_1 ∈ R^{1×H×W×1} is a lateral intermediate feature that fuses the spatial information of the height and width directions. The resulting lateral intermediate feature f_1 and the depth-direction feature O_D are expanded to the same size and concatenated, then likewise fed into a three-dimensional convolution F_2 with kernel size 3×3×3 for feature fusion, and the spatial weight is output as shown in equation (5).
F_s = δ_2(F_2([t(f_1), t(O_D)]))    (5)
where δ_2 is the nonlinear activation function Sigmoid.
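The decomposition and the fusion of equations (4)–(5) can be sketched in PyTorch as follows. Average pooling for the channel compression and the directional features is an assumption where the application leaves the pooling type implicit; the expansion t(·) is implemented with broadcasting.

```python
import torch
import torch.nn as nn

class DimDecompSpatialAttention(nn.Module):
    """Sketch of the dimension-decomposition spatial attention.
    Channel pooling -> three 1-D directional features -> fuse H and W,
    then fuse with D (equations (4)-(5)); shapes are one plausible reading."""
    def __init__(self):
        super().__init__()
        self.conv_hw = nn.Conv3d(2, 1, kernel_size=3, padding=1)   # F_1
        self.conv_hwd = nn.Conv3d(2, 1, kernel_size=3, padding=1)  # F_2
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                        # x: (B, C, H, W, D)
        B, _, H, W, D = x.shape
        s = x.mean(dim=1, keepdim=True)          # channel pooling -> spatial feature
        o_h = s.mean(dim=(3, 4), keepdim=True)   # O_H: (B, 1, H, 1, 1)
        o_w = s.mean(dim=(2, 4), keepdim=True)   # O_W: (B, 1, 1, W, 1)
        o_d = s.mean(dim=(2, 3), keepdim=True)   # O_D: (B, 1, 1, 1, D)
        # equation (4): t(.) expands to a common (H, W, 1) grid, then 3x3x3 conv
        f1 = self.relu(self.conv_hw(torch.cat(
            [o_h.expand(B, 1, H, W, 1), o_w.expand(B, 1, H, W, 1)], dim=1)))
        # equation (5): expand to (H, W, D), fuse with O_D, Sigmoid -> F_s
        f_s = torch.sigmoid(self.conv_hwd(torch.cat(
            [f1.expand(B, 1, H, W, D), o_d.expand(B, 1, H, W, D)], dim=1)))
        return f_s                               # (B, 1, H, W, D)
```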
Step S3: the depth features are simultaneously fed into the dimension-decomposition channel attention to obtain channel weights. The dimension-decomposition channel attention comprises two parts: dimension decomposition and dimension-decomposition channel attention generation, where the dimension decomposition part is the same as the process described in step S2.
The dimension-decomposition channel attention generation part first generates a channel feature by global average pooling, compressing the spatial information for the subsequent generation of dimension-channel features.
After the channel feature is generated, the set of dimension-channel features is obtained through equation (6).
f_c = δ_3(F_3(avg(X)))    (6)
where X is the input depth feature, avg denotes the global average pooling operation, F_3 is a three-dimensional convolution with kernel size 1×1×1, δ_3 is the nonlinear activation function ReLU, and f_c ∈ R^{3c×1×1×1} is the resulting set of dimension-channel features. The obtained set of dimension-channel features is split, concatenated along the channel dimension with the generated directional features, and fed into a three-dimensional convolution with kernel size 1×1×1 for fusion; the output f_H ∈ R^{c×1×1×1} in the H dimension is given by equation (7).
f_H = δ_H(F_H([O_H, s(f_c)]))    (7)
where [,] denotes concatenation along the channel dimension, s is the splitting operation, F_H is a three-dimensional convolution with kernel size 1×1×1, and δ_H is the nonlinear activation function ReLU. The outputs in the W and D dimensions are given analogously by equations (8) and (9).
f_W = δ_W(F_W([O_W, s(f_c)]))    (8)
f_D = δ_D(F_D([O_D, s(f_c)]))    (9)
After the three dimension-channel features are concatenated along the channel dimension, the final channel weight is generated through equation (10).
F_c = δ_4(F_4([f_H, f_W, f_D]))    (10)
where F_4 is a three-dimensional convolution with kernel size 1×1×1 and δ_4 is the nonlinear activation function Sigmoid.
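Equations (6)–(10) can be sketched as below. The application does not fully specify the tensor shapes of the directional branches, so this sketch makes one plausible choice (an assumption, not the application's definitive design): each directional feature keeps its channel dimension, is fused with its split of f_c by a 1×1×1 convolution, and is then averaged down to R^{C×1×1×1}.

```python
import torch
import torch.nn as nn

class DimDecompChannelAttention(nn.Module):
    """Sketch of the dimension-decomposition channel attention (eqs (6)-(10)).
    Branch shapes are an interpretation, not specified by the application."""
    def __init__(self, channels):
        super().__init__()
        self.conv_c = nn.Conv3d(channels, 3 * channels, kernel_size=1)    # F_3
        self.conv_h = nn.Conv3d(2 * channels, channels, kernel_size=1)    # F_H
        self.conv_w = nn.Conv3d(2 * channels, channels, kernel_size=1)    # F_W
        self.conv_d = nn.Conv3d(2 * channels, channels, kernel_size=1)    # F_D
        self.conv_out = nn.Conv3d(3 * channels, channels, kernel_size=1)  # F_4
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                          # x: (B, C, H, W, D)
        # directional features: average over the other two spatial axes
        o_h = x.mean(dim=(3, 4), keepdim=True)     # (B, C, H, 1, 1)
        o_w = x.mean(dim=(2, 4), keepdim=True)     # (B, C, 1, W, 1)
        o_d = x.mean(dim=(2, 3), keepdim=True)     # (B, C, 1, 1, D)
        # equation (6): f_c = ReLU(F_3(avg(X))), then split into three parts
        f_c = self.relu(self.conv_c(x.mean(dim=(2, 3, 4), keepdim=True)))
        s_h, s_w, s_d = torch.chunk(f_c, 3, dim=1)  # each (B, C, 1, 1, 1)
        # equations (7)-(9): fuse each directional feature with its split,
        # then reduce the remaining spatial axis (assumed reduction: mean)
        f_h = self.relu(self.conv_h(torch.cat([o_h, s_h.expand_as(o_h)], 1))).mean(2, keepdim=True)
        f_w = self.relu(self.conv_w(torch.cat([o_w, s_w.expand_as(o_w)], 1))).mean(3, keepdim=True)
        f_d = self.relu(self.conv_d(torch.cat([o_d, s_d.expand_as(o_d)], 1))).mean(4, keepdim=True)
        # equation (10): F_c = Sigmoid(F_4([f_H, f_W, f_D]))
        return torch.sigmoid(self.conv_out(torch.cat([f_h, f_w, f_d], dim=1)))
```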
Step S4: the depth feature F, the spatial weight F_s obtained in step S2 and the channel weight F_c obtained in step S3 are fused according to F_f = F × F_s × F_c to generate the final feature, which is fed into the upsampling network to obtain the final segmentation result.
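The fusion in step S4 is a broadcast elementwise product; a minimal sketch with illustrative (assumed) tensor sizes:

```python
import torch

# Assumed shapes: F is the encoder feature, F_s a single-channel spatial map,
# F_c a per-channel scalar weight; broadcasting aligns the singleton axes.
F   = torch.randn(2, 8, 16, 16, 16)  # depth feature (B, C, H, W, D)
F_s = torch.rand(2, 1, 16, 16, 16)   # spatial weight, shared across channels
F_c = torch.rand(2, 8, 1, 1, 1)      # channel weight, shared across voxels
F_f = F * F_s * F_c                  # F_f = F x F_s x F_c
```

The fused feature F_f keeps the shape of F and is passed to the decoder (upsampling network).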
Fig. 4 is a flowchart of a two-stage three-dimensional image segmentation method based on dimension decomposition attention according to an embodiment of the present disclosure. As shown in fig. 4, the two-stage three-dimensional image segmentation method based on dimension decomposition attention includes:
step 401, preprocessing and rough segmentation are performed on the input image to obtain a region of interest through cropping, for example, including: cutting and readjusting the input image to a uniform size, and removing the influence of the image edge information; sending the preprocessed images with uniform sizes into a rough segmentation network to obtain rough segmentation results; and performing corrosion operation and maximum communication region positioning operation on the rough segmentation result, and cutting the original image and the real label by using a rectangular block with a fixed size to obtain the region of interest.
Step 402, downsampling the obtained region of interest to obtain depth features, and sending the depth features into the dimension decomposition space attention to obtain space weights. After downsampling the obtained region of interest to obtain depth features, the method further comprises the step of sending the obtained depth features to a dimension decomposition spatial attention module to generate spatial features.
In one embodiment, feeding the depth features into the dimension-decomposition spatial attention to obtain spatial weights comprises: the dimension-decomposition spatial attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D) respectively, where H, W and D are the height, width and depth of the three-dimensional image.
The three one-dimensional directional features are fused in the order H, W, D to generate the spatial weight, as follows: first, the height-direction feature O_H and the width-direction feature O_W are unified to size (H, W, 1) by an expansion operation, and then fused by a three-dimensional convolution with kernel size 3×3×3 to obtain the lateral intermediate feature f_1 = δ_1(F_1([t(O_H), t(O_W)])), where [,] denotes concatenation along the channel dimension, t denotes the expansion operation, and δ_1 is the nonlinear activation function ReLU; then the lateral intermediate feature f_1 and the depth-direction feature O_D are unified to size (H, W, D) by the expansion operation and fused by a three-dimensional convolution with kernel size 3×3×3, and the spatial weight F_s = δ_2(F_2([t(f_1), t(O_D)])) is obtained through the activation function Sigmoid, where δ_2 is the nonlinear activation function Sigmoid.
Step 403: the depth features are fed into the dimension-decomposition channel attention to obtain channel weights, for example comprising: the dimension-decomposition channel attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D) respectively, where H, W and D are the height, width and depth of the three-dimensional image.
A channel feature f_c is generated from the input depth feature through pooling and convolution operations: f_c = δ_3(F_3(avg(X))), where X is the input depth feature, avg denotes the global average pooling operation, F_3 is a three-dimensional convolution with kernel size 1×1×1, δ_3 is the nonlinear activation function ReLU, and f_c ∈ R^{3c×1×1×1} is the resulting set of dimension-channel features.
The obtained set of dimension-channel features is split into three intermediate features, which are concatenated along the channel dimension with the one-dimensional directional features O_H, O_W and O_D respectively and fed into a three-dimensional convolution with kernel size 1×1×1 for fusion, yielding channel features for the H, W and D dimensions; the feature in the H dimension, f_H ∈ R^{c×1×1×1}, is obtained from f_H = δ_H(F_H([O_H, s(f_c)])), where [,] denotes concatenation along the channel dimension, s is the splitting operation, F_H is a 1×1×1 convolution, and δ_H is the nonlinear activation function ReLU.
The features in the W and D dimensions are obtained from f_W = δ_W(F_W([O_W, s(f_c)])) and f_D = δ_D(F_D([O_D, s(f_c)])).
After the three dimension-channel features are concatenated along the channel dimension, the final channel weight F_c is generated by F_c = δ_4(F_4([f_H, f_W, f_D])), where F_4 is a three-dimensional convolution with kernel size 1×1×1 and δ_4 is the nonlinear activation function Sigmoid.
Step 404: the depth feature is multiplied with the spatial weight and the channel weight to obtain the final feature, which is fed into the upsampling network to obtain the segmentation result, for example comprising:
the depth feature F, the spatial weight F_s and the channel weight F_c are fused according to F_f = F × F_s × F_c to obtain the final feature, which is fed into the upsampling network to obtain the segmentation result.
Fig. 5 is a schematic structural diagram of a two-stage three-dimensional image segmentation system based on dimension decomposition attention according to an embodiment of the present disclosure. The system comprises:
a cropping device 501, configured to preprocess and coarsely segment an input image so as to obtain a region of interest by cropping;
a sampling device 502, configured to downsample the obtained region of interest to obtain a depth feature, and to send the depth feature into the dimension-decomposition spatial attention to obtain a spatial weight;
an obtaining device 503, configured to send the depth feature into the dimension-decomposition channel attention to obtain a channel weight;
a segmentation device 504, configured to multiply the depth feature by the spatial weight and the channel weight to obtain a final feature, and to send the final feature into the up-sampling network to obtain a segmentation result.
The application provides a two-stage three-dimensional image segmentation method based on lightweight dimension-decomposition attention, which addresses the drop in segmentation performance caused by the similar intensities of the target region and adjacent tissues. The method offers a small computational cost, high accuracy and strong robustness.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
Claims (7)
1. A two-stage three-dimensional image segmentation method based on dimension-decomposition attention, characterized in that a two-stage strategy and dimension-decomposition attention are used to segment a three-dimensional image; the method comprises the following steps:
step S1: preprocessing and coarsely segmenting an input image so as to obtain a region of interest by cropping;
step S2: downsampling the obtained region of interest to obtain a depth feature, and sending the depth feature into the dimension-decomposition spatial attention to obtain a spatial weight;
step S3: sending the depth feature into the dimension-decomposition channel attention to obtain a channel weight;
step S4: multiplying the depth feature by the spatial weight and the channel weight to obtain a final feature, and sending the final feature into an up-sampling network to obtain a segmentation result.
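Taken together, steps S1 to S4 form a simple data flow: crop, encode, weight, fuse, decode. The sketch below wires hypothetical stand-in stages together (a fixed crop, strided pooling, mean-based attention weights, nearest-neighbour upsampling) purely to show the plumbing; none of these function bodies is the patented network:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def coarse_stage(image):                  # S1: rough segmentation -> ROI crop (stand-in)
    return image[:, 4:12, 4:12, 4:12]

def encoder(roi):                         # S2, first half: downsampling (stand-in)
    return roi[:, ::2, ::2, ::2]

def spatial_attention(feat):              # S2, second half: one weight per voxel
    return sigmoid(feat.mean(axis=0, keepdims=True))

def channel_attention(feat):              # S3: one weight per channel
    return sigmoid(feat.mean(axis=(1, 2, 3), keepdims=True))

def decoder(feat):                        # S4: upsample to a binary segmentation map
    up = np.repeat(np.repeat(np.repeat(feat, 2, 1), 2, 2), 2, 3)
    return (up > 0).astype(np.uint8)

image = rng.standard_normal((1, 16, 16, 16))
feat = encoder(coarse_stage(image))
fused = feat * spatial_attention(feat) * channel_attention(feat)  # F_f = F x F_s x F_c
seg = decoder(fused)
```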
2. The two-stage three-dimensional image segmentation method according to claim 1, wherein the preprocessing and coarse segmentation of the input image to obtain the region of interest by cropping comprises:
cropping and resizing the input image to a uniform size, and removing the influence of image-edge information;
sending the preprocessed images of uniform size into a coarse segmentation network to obtain a coarse segmentation result; performing an erosion operation and a largest-connected-region localization operation on the coarse segmentation result, and cropping the original image and the ground-truth label with a rectangular block of fixed size to obtain the region of interest.
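The erosion ("corrosion") operation, largest-connected-region ("maximum communication region") localization and fixed-size crop of this claim can be sketched in plain NumPy and Python. The 6-neighbourhood structuring element and 6-connectivity used below are generic stand-ins, since the claim fixes neither:

```python
import numpy as np
from collections import deque

def erode(mask):
    """6-neighbourhood binary erosion: keep a voxel only if all face neighbours are set."""
    p = np.pad(mask.astype(bool), 1)
    return (p[1:-1, 1:-1, 1:-1]
            & p[:-2, 1:-1, 1:-1] & p[2:, 1:-1, 1:-1]
            & p[1:-1, :-2, 1:-1] & p[1:-1, 2:, 1:-1]
            & p[1:-1, 1:-1, :-2] & p[1:-1, 1:-1, 2:])

def largest_region(mask):
    """BFS flood fill; return a boolean mask of the largest 6-connected component."""
    labels = np.zeros(mask.shape, dtype=int)
    best_label, best_size, cur = 0, 0, 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        cur += 1
        labels[seed] = cur
        queue, size = deque([seed]), 0
        while queue:
            x, y, z = queue.popleft()
            size += 1
            for dx, dy, dz in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
                n = (x + dx, y + dy, z + dz)
                if (all(0 <= n[i] < mask.shape[i] for i in range(3))
                        and mask[n] and not labels[n]):
                    labels[n] = cur
                    queue.append(n)
        if size > best_size:
            best_label, best_size = cur, size
    return labels == best_label

def fixed_crop(volume, mask, size):
    """Crop a fixed-size rectangular block centred on the mask centroid, clamped to bounds."""
    centre = [int(round(ax.mean())) for ax in np.nonzero(mask)]
    starts = [min(max(c0 - s // 2, 0), dim - s)
              for c0, s, dim in zip(centre, size, volume.shape)]
    return volume[tuple(slice(st, st + s) for st, s in zip(starts, size))]

# Toy coarse-segmentation result: one large blob plus one spurious voxel
coarse = np.zeros((16, 16, 16), dtype=bool)
coarse[2:9, 2:9, 2:9] = True       # large component
coarse[13, 13, 13] = True          # noise; erosion removes isolated voxels
roi_mask = largest_region(erode(coarse))
roi = fixed_crop(np.arange(16**3, dtype=float).reshape(16, 16, 16), roi_mask, (8, 8, 8))
```

In practice a library routine (e.g. morphological erosion and component labelling from an image-processing package) would replace the hand-rolled loops; the logic shown is the same.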
3. The two-stage three-dimensional image segmentation method according to claim 1, further comprising, after downsampling the obtained region of interest to obtain the depth feature, feeding the obtained depth feature into the dimension-decomposition spatial attention module to generate spatial features.
4. The two-stage three-dimensional image segmentation method according to claim 3, wherein said sending the depth feature into the dimension-decomposition spatial attention to obtain the spatial weight comprises:
the dimension-decomposition spatial attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D), respectively, where H, W and D are the height, width and depth of the three-dimensional image;
the three obtained one-dimensional directional features are fused in the order H, W, D to generate the spatial weight, which comprises: first, the height-direction feature O_H and the width-direction feature O_W are unified to size (H, W, 1) by an expansion operation, and the features of the height and width directions are then fused by a three-dimensional convolution with a 3×3×3 kernel to obtain the lateral intermediate feature f_1, as shown in the formula f_1 = δ_1(F_1([t(O_H), t(O_W)])), where [·,·] denotes concatenation along the channel, t denotes the expansion operation, and δ_1 is the nonlinear activation function ReLU; then the lateral intermediate feature f_1 and the depth-direction feature O_D are unified to size (H, W, D) by the expansion operation and fused by a three-dimensional convolution with a 3×3×3 kernel, and the spatial weight F_s is obtained through the activation function Sigmoid, where F_s = δ_2(F_2([t(f_1), t(O_D)])) and δ_2 is the nonlinear activation function Sigmoid.
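A NumPy sketch of this spatial-attention branch is given below. It is illustrative only: the two 3×3×3 convolutions of the claim are replaced by 1×1×1 channel-mixing maps with hypothetical random weights to keep the sketch short, and the spatial weight is assumed single-channel:

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda v: np.maximum(v, 0.0)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

c, H, W, D = 3, 5, 6, 7
X = rng.standard_normal((c, H, W, D))          # pooled spatial feature

# Decomposition into one-dimensional directional features
O_H = X.mean(axis=(2, 3), keepdims=True)       # (c, H, 1, 1)
O_W = X.mean(axis=(1, 3), keepdims=True)       # (c, 1, W, 1)
O_D = X.mean(axis=(1, 2), keepdims=True)       # (c, 1, 1, D)

def t(x, shape):
    # expansion operation: broadcast a directional feature to a common size
    return np.broadcast_to(x, shape)

def mix(weights, x):
    # 1x1x1 channel-mixing stand-in for the claim's 3x3x3 convolution (illustrative)
    return np.einsum('oc,chwd->ohwd', weights, x)

# f_1 = delta_1(F_1([t(O_H), t(O_W)])): fuse the height and width directions
F1 = rng.standard_normal((c, 2 * c))
f1 = relu(mix(F1, np.concatenate([t(O_H, (c, H, W, 1)), t(O_W, (c, H, W, 1))])))

# F_s = delta_2(F_2([t(f_1), t(O_D)])): fuse in the depth direction, then Sigmoid
F2 = rng.standard_normal((1, 2 * c))
F_s = sigmoid(mix(F2, np.concatenate([t(f1, (c, H, W, D)), t(O_D, (c, H, W, D))])))
```

Decomposing the 3D attention map into three 1D directional features is what keeps the computation lightweight: the pooled features grow linearly in H, W and D rather than with their product.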
5. The two-stage three-dimensional image segmentation method according to claim 1, wherein the sending of the depth feature into the dimension-decomposition channel attention to obtain the channel weight comprises: the dimension-decomposition channel attention first generates a spatial feature X from the input depth feature through a pooling operation; the spatial feature X is decomposed into three one-dimensional directional features O_H, O_W and O_D by three pooling kernels of sizes (H, 1, 1), (1, W, 1) and (1, 1, D), respectively, where H, W and D are the height, width and depth of the three-dimensional image;
a channel feature f_c is generated from the input depth feature through pooling and convolution operations, where f_c = δ_3(F_3(avg(X))), X is the input depth feature, avg denotes the global average pooling operation, F_3 is a three-dimensional convolution with a 1×1×1 kernel, δ_3 is the nonlinear activation function ReLU, and f_c ∈ R^(3c×1×1×1), yielding the dimension-channel feature set;
the obtained dimension-channel feature set is split into three intermediate features, each of which is concatenated along the channel dimension with the corresponding one-dimensional directional feature O_H, O_W or O_D and fed into a three-dimensional convolution with a 1×1×1 kernel for fusion, yielding channel features for the H, W and D dimensions, where the feature in the H dimension, f_H ∈ R^(c×1×1×1), is obtained from the formula f_H = δ_H(F_H([O_H, s(f_c)])), [·,·] denotes concatenation along the channel dimension, s is the splitting operation, F_H is a 1×1×1 convolution, and δ_H is the nonlinear activation function ReLU;
the features of the W and D dimensions are obtained from the formulas f_W = δ_W(F_W([O_W, s(f_c)])) and f_D = δ_D(F_D([O_D, s(f_c)]));
after the obtained channel features are concatenated along the channel dimension, the final channel weight F_c is generated by the formula F_c = δ_4(F_4([f_H, f_W, f_D])), where F_4 is a three-dimensional convolution with a 1×1×1 kernel and δ_4 is the nonlinear activation function Sigmoid.
6. The two-stage three-dimensional image segmentation method according to claim 1, wherein multiplying the depth feature by the spatial weight and the channel weight to obtain the final feature and sending the final feature into an up-sampling network to obtain the segmentation result comprises:
the depth feature F, the spatial weight F_s and the channel weight F_c are fused according to the formula F_f = F × F_s × F_c to obtain the final feature, which is sent into the up-sampling network to obtain the segmentation result.
7. A two-stage three-dimensional image segmentation system based on dimension-decomposition attention, characterized in that a three-dimensional image is segmented using a two-stage strategy and dimension-decomposition attention; the system comprises:
a cropping device, configured to preprocess and coarsely segment an input image so as to obtain a region of interest by cropping;
a sampling device, configured to downsample the obtained region of interest to obtain a depth feature, and to send the depth feature into the dimension-decomposition spatial attention to obtain a spatial weight;
an obtaining device, configured to send the depth feature into the dimension-decomposition channel attention to obtain a channel weight;
a segmentation device, configured to multiply the depth feature by the spatial weight and the channel weight to obtain a final feature, and to send the final feature into the up-sampling network to obtain a segmentation result.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310994348.6A CN117036703A (en) | 2023-08-09 | 2023-08-09 | A two-stage 3D image segmentation method based on dimensionality decomposition attention |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310994348.6A CN117036703A (en) | 2023-08-09 | 2023-08-09 | A two-stage 3D image segmentation method based on dimensionality decomposition attention |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117036703A true CN117036703A (en) | 2023-11-10 |
Family
ID=88633030
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310994348.6A Pending CN117036703A (en) | 2023-08-09 | 2023-08-09 | A two-stage 3D image segmentation method based on dimensionality decomposition attention |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117036703A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112927255A (en) * | 2021-02-22 | 2021-06-08 | 武汉科技大学 | Three-dimensional liver image semantic segmentation method based on context attention strategy |
| WO2021147257A1 (en) * | 2020-01-20 | 2021-07-29 | 上海商汤智能科技有限公司 | Network training method and apparatus, image processing method and apparatus, and electronic device and storage medium |
| CN114998307A (en) * | 2022-07-06 | 2022-09-02 | 重庆大学 | Two-stage full 3D abdominal organ segmentation method and system based on dual-resolution network |
| CN115018864A (en) * | 2022-06-17 | 2022-09-06 | 东南大学 | Three-stage liver tumor image segmentation method based on adaptive preprocessing |
| CN116229067A (en) * | 2023-02-22 | 2023-06-06 | 上海理工大学 | Channel Attention-Based Segmentation Method for Hepatocellular Carcinoma CT Image |
- 2023-08-09: CN application CN202310994348.6A, published as CN117036703A (status: Pending)
Non-Patent Citations (1)
| Title |
|---|
| Zhang Wenjun: "Research on Liver CT Image Segmentation Algorithms Based on Attention Mechanism", China Master's Theses Full-text Database (Information Science and Technology), 15 January 2023 (2023-01-15) * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111563508A (en) | A Semantic Segmentation Method Based on Spatial Information Fusion | |
| CN117953224B (en) | Open vocabulary 3D panorama segmentation method and system | |
| CN113034563B (en) | Self-supervised monocular depth estimation method based on feature sharing | |
| CN114820469A (en) | Defect image sample generation method, system, medium, and device based on generation countermeasure network | |
| CN113096133A (en) | Method for constructing semantic segmentation network based on attention mechanism | |
| CN110992374A (en) | Hair refined segmentation method and system based on deep learning | |
| Sharma et al. | An efficient image super resolution model with dense skip connections between complex filter structures in Generative Adversarial Networks | |
| CN118037898B (en) | Text generation video method based on image guided video editing | |
| CN114240770B (en) | Image processing method, device, server and storage medium | |
| CN118968271A (en) | Indoor 3D scene understanding method and terminal based on multimodal feature embedding | |
| CN118708071A (en) | A multi-modal language generation method and system guided by multi-granularity visual information | |
| CN117455808A (en) | A lightweight image deblurring method and system | |
| CN113255646A (en) | A real-time scene text detection method | |
| CN110580462B (en) | Natural scene text detection method and system based on non-local network | |
| CN117132767A (en) | A small target detection method, device, equipment and readable storage medium | |
| CN116468820A (en) | Method and system for generating scene images from style-guided sketches | |
| CN117036703A (en) | A two-stage 3D image segmentation method based on dimensionality decomposition attention | |
| CN120182590A (en) | Method, processor, device and storage medium for well logging fracture image segmentation | |
| CN120411049A (en) | A micro motor armature defect detection method based on improved YOLOv11n | |
| CN115496912A (en) | Method and device for feature extraction of workpiece contour based on deep learning of virtual samples | |
| CN115908812B (en) | Attention-guided deformable self-attention semantic segmentation method | |
| CN118229725A (en) | Animal matting method, device, computer equipment and storage medium | |
| CN120339695B (en) | Object detection method and system based on LCSD-YOLO network | |
| CN119027822B (en) | Boundary-guided method for farmland extraction from agricultural remote sensing images | |
| CN120279368B (en) | A remote sensing image super-resolution fusion method based on frequency domain feature prior |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |