
CN113159051B - A Lightweight Semantic Segmentation Method for Remote Sensing Images Based on Edge Decoupling - Google Patents

A Lightweight Semantic Segmentation Method for Remote Sensing Images Based on Edge Decoupling

Info

Publication number
CN113159051B
CN113159051B (application CN202110456921.9A)
Authority
CN
China
Prior art keywords
edge
semantic segmentation
feature
feature map
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110456921.9A
Other languages
Chinese (zh)
Other versions
CN113159051A (en)
Inventor
段锦
刘高天
祝勇
赵言
王乐泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University of Science and Technology filed Critical Changchun University of Science and Technology
Priority to CN202110456921.9A priority Critical patent/CN113159051B/en
Publication of CN113159051A publication Critical patent/CN113159051A/en
Application granted granted Critical
Publication of CN113159051B publication Critical patent/CN113159051B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lightweight semantic segmentation method for remote sensing images based on edge decoupling. It belongs to the field of computer vision and can be used for intelligent interpretation of remote sensing imagery. On one hand, a Ghost bottleneck module and depthwise separable convolutions reduce the number of model parameters and the computational overhead of the network, effectively improving the efficiency of remote sensing image semantic segmentation and making the proposed segmentation network lightweight. On the other hand, a multi-scale feature pyramid, a global context module, and an edge decoupling module improve segmentation accuracy, so that the proposed lightweight network can segment remote sensing images accurately and efficiently while further refining their edge details.

Description

A Lightweight Semantic Segmentation Method for Remote Sensing Images Based on Edge Decoupling

Technical Field

The invention belongs to the field of computer vision, and in particular relates to a lightweight semantic segmentation method for remote sensing images based on edge decoupling, which can be used for intelligent interpretation of remote sensing imagery.

Background

High-resolution remote sensing images contain detailed color and texture information about targets such as roads and buildings. Intelligent interpretation of this information is of great significance in many fields, including military, agricultural, and environmental science. To parse and classify a remote sensing image, every pixel must be assigned a label associated with its category, which is exactly the goal of image semantic segmentation.

Deep learning has given this task a better direction of development. In particular, the fully convolutional network (FCN) brought deep-learning-based semantic segmentation into the mainstream. Methods such as UNet, SegNet, PSPNet, and the DeepLab series have emerged, and they outperform traditional remote sensing image segmentation algorithms. When applied to the semantic segmentation of high-resolution remote sensing images, these algorithms achieve good segmentation quality, but the large image size and complex network structures often lead to slow training and low segmentation efficiency. Furthermore, the rich diversity of targets in remote sensing images, the unbalanced distribution of target categories, and the tendency of edges between different categories to overlap make fine-grained segmentation of remote sensing images difficult to achieve.

Summary of the Invention

The purpose of the present invention is to provide a lightweight semantic segmentation method for remote sensing images based on edge decoupling, to address two technical problems of existing semantic segmentation methods on high-resolution remote sensing images: slow inference and low segmentation efficiency caused by large parameter counts and heavy computation, and unsatisfactory edge segmentation caused by overlapping edges between targets of different categories.

To achieve the above purpose, the present invention proposes a lightweight semantic segmentation method for remote sensing images based on edge decoupling. The specific technical scheme is as follows:

A lightweight semantic segmentation method for remote sensing images based on edge decoupling comprises the construction, training, and testing of a semantic segmentation network. The semantic segmentation network is a lightweight encoder-decoder network with a dual-branch structure. After the network is trained on the training samples, the remote sensing image to be tested is input into the network, which outputs the final semantic segmentation result;

The method comprises the following steps, performed in order:

Step 1. Obtain a remote sensing image dataset and prepare training and testing samples;

Step 2. Build a lightweight encoder-decoder semantic segmentation network with a dual-branch structure;

Step 3. Input the training samples into the encoder and perform feature encoding through feature extraction to obtain the encoded feature map F_E;

Step 4. Input the encoded feature map F_E into the decoder, perform edge feature refinement and up-sampling, and obtain the decoded feature map F_D;

Step 5. Input the decoded feature map into the classifier for pixel-level classification prediction, output the segmentation result, and train the semantic segmentation network under a supervision mechanism;

Step 6. Train the semantic segmentation network built in step 2 on the training samples, following steps 3-5;

Step 7. Input the samples to be tested into the trained semantic segmentation network and output the final remote sensing image semantic segmentation results, completing the test of the network.

Further, step 2 builds a lightweight encoder-decoder semantic segmentation network with a dual-branch structure, consisting of three parts: an encoder, a decoder, and a classifier;

The encoder has a dual-branch structure and contains a global downsampling block, a lightweight dual-branch sub-network, and a global feature fusion module;

The decoder consists of a lightweight edge decoupling module and an up-sampling module;

The classifier consists of a conventional convolutional layer and a SoftMax layer.

Further, obtaining the encoded feature map F_E in step 3 includes the following steps:

Step 3.1. Input the training samples into the global downsampling block of the encoder to obtain a low-level feature map;

Step 3.2. Input the low-level feature map into the lightweight dual-branch sub-network of the encoder to obtain a spatial detail feature map and an abstract semantic feature map;

Step 3.3. Fuse the spatial detail feature map and the abstract semantic feature map through the global feature fusion block of the encoder (multi-level feature fusion) and output the encoded feature map F_E.

Further, the global downsampling block in step 3.1 consists of three parts: one conventional convolution, one Ghost bottleneck module, and one global context module;

After the input samples pass through the global downsampling block, a low-level feature map with a resolution of 1/4 of the original input is generated and used as the input of the subsequent stages.

Further, the lightweight dual-branch sub-network of step 3.2 contains two branches: a backbone depth branch that extracts abstract semantic features and a spatial-preserving branch that extracts spatial detail features. The two branches share the low-level feature map output by the global downsampling block;

The backbone depth branch is built on the GhostNet feature extraction network and comprises two structures. The first is the branch body, composed of 16 Ghost bottleneck modules, which performs four downsampling operations to extract deep features. The second is a lightweight feature pyramid consisting of four parts: depthwise separable convolutions, up-sampling blocks, a lightweight atrous spatial pyramid pooling module, and element-wise fusion. It takes the four deep feature maps of different scales produced by the branch body as input and outputs abstract semantic features with an enlarged receptive field and multi-scale information;

The spatial-preserving branch consists of three depthwise separable convolutions. It downsamples the input low-level features once, so the resolution of the output spatial detail feature map is 1/2 of the input.

Further, the global feature fusion module of step 3.3 contains three parts: two parallel depthwise separable convolutions with 1*1 kernels, element-wise fusion, and a global context module;

The input abstract semantic features and spatial detail features are dimension-adjusted by the two parallel convolutions and then fused element-wise to output a feature map with rich spatial detail and abstract semantic information. Finally, the global context module performs lightweight context modeling, so that the resulting encoded feature map F_E better integrates global information.

Further, step 4 obtains the decoded feature map F_D as follows: first, the encoded feature map F_E is input into the lightweight edge decoupling module of the decoder for edge feature refinement, generating a fine feature map with refined edges; then the fine feature map is input into the up-sampling module of the decoder, which upsamples it back to the size of the original input remote sensing image, yielding the decoded feature map F_D output by the decoder.

Further, the lightweight edge decoupling module consists of three parts: a lightweight atrous spatial pyramid pooling module, a body feature generator, and an edge preserver. The encoded features first pass through the lightweight atrous spatial pyramid pooling module to generate a feature map F_aspp with multi-scale information and a larger receptive field; the body generator then produces a more consistent feature representation for pixels inside the same object, forming the body feature map F_body of the target objects. F_body, F_aspp, and F_E are input into the edge preserver, which applies an explicit subtraction, channel-wise concatenation and fusion, and a 1*1 conventional convolution for dimensionality reduction to output the refined edge feature map F_edge. Finally, the body feature map and the refined edge feature map are fused, and the fine output feature map used for up-sampling recovery, denoted F_final, is output. The whole process can be expressed as:

    F_aspp = f_dsaspp(F_E)
    F_body = φ(F_aspp)
    F_edge = ψ(F_E, F_aspp, F_body)
    F_final = F_body + F_edge

where f_dsaspp denotes the lightweight atrous spatial pyramid pooling function, φ denotes the body feature generation function, and ψ denotes the edge-preserving function;

The up-sampling module comprises two steps, a 1*1 conventional convolution and an up-sampling operation. After passing through this module, the fine feature map F_final is restored to a feature map with the size of the original input remote sensing image, i.e. the output feature map F_D of the decoder.

Further, regarding the supervision mechanism in step 5: after the decoded feature map F_D is processed by the classifier, pixel-level classification prediction is completed and the output is the semantic segmentation result. The network is trained under the supervision mechanism formed by the segmentation results and the ground-truth labels, so that the semantic segmentation network achieves its best segmentation performance.

Further, the supervision mechanism in step 5 is an edge-based supervision scheme realized through a designed loss function. The total loss function, denoted L, is as follows:

    L = λ1·L_body + λ2·L_final + λ3·L_G + λ4·L_edge,    L_edge = λ5·L_bce + λ6·L_ce

where L_body, L_edge, L_final, and L_G denote the body feature loss, edge feature loss, fine feature loss, and global encoding loss, respectively. The inputs of the four loss functions are the segmentation results formed from the body feature map, the refined edge feature map, the fine output feature map, and the encoded feature map, each after up-sampling recovery and a SoftMax layer, together with their corresponding ground-truth labels;

The loss function L_edge is a composite loss that obtains a boundary-edge prior from the edge prediction part. It contains two terms: a binary cross-entropy loss L_bce for boundary pixel classification, and a cross-entropy loss L_ce over the edge regions of the scene. λ1, λ2, λ3, λ4, λ5, and λ6 are hyperparameters controlling the weights among the losses.

The method of the present invention has the following advantages: it fully accounts for the total parameter count, the total computation, and the impact of large amounts of redundant features on segmentation efficiency and accuracy, and it fully exploits the relationship between the target body and its edges to refine the segmentation results;

First, the invention combines the idea of feature sharing and designs a global downsampling block based on the global context module and the Ghost bottleneck module. As the first part of the encoder in the proposed segmentation network, it effectively reduces the parameter scale of early low-level feature extraction, reduces computational overhead, and better integrates global context information into the low-level features.

Second, the invention combines a dual-branch structure with a global feature fusion scheme based on the global context module. A lightweight dual-branch sub-network built from Ghost bottleneck modules and depthwise separable convolutions significantly reduces the parameter scale and computational complexity of the feature extraction stage, so that the output encoded features contain rich spatial detail and abstract semantic information. The outputs of the two branches are then fused through the global-context-based fusion scheme, deepening the encoded features' understanding of global information and reducing the network's loss of weak feature information.

Third, the invention uses depthwise separable convolutions to build a lightweight edge decoupling module. By modeling the body and the edges of target objects, it introduces the relationship between an object and its edges, effectively alleviating the coarse edge segmentation of existing remote sensing semantic segmentation algorithms and improving the segmentation of edge details in remote sensing images.

Description of the Drawings

Fig. 1 is the flowchart of the method of the present invention.

Fig. 2 is a schematic diagram of the semantic segmentation network built by the method of the present invention.

Fig. 3 is a schematic diagram of the Ghost bottleneck module.

Fig. 4 is a schematic diagram of the global context module.

Fig. 5 is a schematic diagram of the lightweight feature pyramid in the backbone feature extraction branch.

Fig. 6 is a schematic diagram of the lightweight edge decoupling module.

Fig. 7 shows example remote sensing images and corresponding semantic labels from the dataset.

Fig. 8 compares semantic segmentation results in an embodiment of the method ((a) and (b) are input samples and corresponding labels; (c)-(g) are the segmentation results of Fast-SCNN, Sem-FPN, the proposed method, UNet, and PSPNet, respectively).

Detailed Description

To better understand the purpose, structure, and function of the present invention, the lightweight semantic segmentation method for remote sensing images based on edge decoupling is described in further detail below with reference to the accompanying drawings.

As shown in Fig. 1, the present invention applies a lightweight, edge-decoupling-based semantic segmentation method to high-resolution remote sensing images, refining the edge segmentation while maintaining accuracy and substantially improving segmentation efficiency.

The method comprises the following steps:

Step 1. Obtain a remote sensing image dataset and prepare training and testing samples;

First, a high-resolution remote sensing image dataset with semantic annotations is obtained, and the images and labels are cropped in a matched way: a fixed 512*512 window slides over each image with a step of 384 pixels (a coverage ratio of 0.75). The cropped data and labels are kept in correspondence, and data augmentation such as rotation and color enhancement is applied; sufficient samples effectively reduce the impact of overfitting. Finally, the samples are split into a training set and a test set at a ratio of 4:1;
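As an illustration, a minimal sliding-window cropping routine consistent with the parameters above might look like the following sketch (NumPy-based; the function and variable names are our own, not from the patent):

```python
import numpy as np

def sliding_window_crops(image, label, win=512, stride=384):
    # Crop matched 512x512 windows from an image/label pair with stride 384,
    # i.e. the step is 0.75 of the window size, the coverage ratio stated above.
    h, w = image.shape[:2]
    pairs = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            pairs.append((image[y:y + win, x:x + win],
                          label[y:y + win, x:x + win]))
    return pairs

# Example: a 6000x6000 tile yields a 15x15 grid of crops.
tile = np.zeros((6000, 6000, 3), dtype=np.uint8)
mask = np.zeros((6000, 6000), dtype=np.uint8)
print(len(sliding_window_crops(tile, mask)))  # 225
```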

Step 2. Build a lightweight encoder-decoder semantic segmentation network with a dual-branch structure;

The structure of the semantic segmentation network built by the present invention is shown in Fig. 2. It is a lightweight encoder-decoder network with a dual-branch structure, consisting of an encoder, a decoder, and a classifier. The encoder has a dual-branch structure and contains a global downsampling block, a lightweight dual-branch sub-network, and a global feature fusion module; the decoder consists of a lightweight edge decoupling module and an up-sampling module; the classifier consists of a conventional convolutional layer and a SoftMax layer;

Step 3. Input the training samples into the encoder and perform feature encoding through feature extraction to obtain the encoded feature map F_E. This involves the following three sub-steps:

Step 3.1. Input the training samples into the global downsampling block of the encoder to obtain a low-level feature map;

The training samples, at a scale of 512*512, are input into the encoder and first pass through the global downsampling block. This block consists of three parts: a conventional convolution, a Ghost bottleneck module, and a global context module. The low-level feature map output by the block integrates global context information well and still contains rich spatial detail;

The conventional convolution is a convolution block with a 3*3 kernel and a stride of 2, followed by batch normalization and ReLU activation. After this downsampling step, the training samples become feature maps with a resolution of 256*256;

The Ghost bottleneck module is a lightweight module derived from the GhostNet network. Built from Ghost modules, it generates deeper feature maps with fewer parameters; its structure is shown in Fig. 3. The structure depends on the stride: with stride 1, the block contains two Ghost modules of stride 1; with stride 2, a channel-wise (depthwise) convolution of stride 2 is inserted between the two Ghost modules. In the global downsampling block, this module uses a stride of 2 to perform the second downsampling, further reducing the feature map resolution to 128*128;
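For illustration, a minimal PyTorch sketch of a Ghost module and Ghost bottleneck as just described is given below; the channel widths, expansion ratio, and details such as where activations sit are assumptions, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    # A 1x1 "primary" conv produces part of the output channels; a cheap
    # depthwise conv derives the remaining ("ghost") channels from them.
    def __init__(self, in_ch, out_ch, ratio=2):
        super().__init__()
        init_ch = out_ch // ratio
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, out_ch - init_ch, 3, padding=1,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(out_ch - init_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class GhostBottleneck(nn.Module):
    # Two Ghost modules; when stride == 2 a depthwise conv between them
    # performs the downsampling, and the residual path is matched to it.
    def __init__(self, in_ch, mid_ch, out_ch, kernel=3, stride=1):
        super().__init__()
        layers = [GhostModule(in_ch, mid_ch)]
        if stride == 2:
            layers += [nn.Conv2d(mid_ch, mid_ch, kernel, stride=2,
                                 padding=kernel // 2, groups=mid_ch, bias=False),
                       nn.BatchNorm2d(mid_ch)]
        layers.append(GhostModule(mid_ch, out_ch))
        self.main = nn.Sequential(*layers)
        if stride == 1 and in_ch == out_ch:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        return self.main(x) + self.shortcut(x)
```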

The global context module, whose structure is shown in Fig. 4, comprises three stages. The first is a global attention pooling mechanism for context modeling: a 1*1 conventional convolution and a SoftMax layer produce self-attention weights for the input feature map, and attention pooling over the input yields a global context feature. The second is a feature transform that captures channel dependencies; it consists of two 1*1 convolutional layers connected by a batch normalization layer and a ReLU activation. The third is element-wise fusion, which adds the transformed global context back onto the feature at every position of the original input feature map. The output of this module has the same size as its input, so the final low-level feature map has a scale of 128*128;
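A minimal sketch of such a global context block in PyTorch, following the three stages just described (the channel reduction ratio is an assumption):

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.attn = nn.Conv2d(ch, 1, 1)          # 1x1 conv -> attention logits
        self.transform = nn.Sequential(           # channel-dependency transform
            nn.Conv2d(ch, ch // reduction, 1),
            nn.BatchNorm2d(ch // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        # Stage 1: softmax over all positions, then attention-weighted pooling.
        w_attn = torch.softmax(self.attn(x).view(b, 1, h * w), dim=-1)
        context = torch.bmm(x.view(b, c, h * w), w_attn.transpose(1, 2))
        context = context.view(b, c, 1, 1)
        # Stages 2 and 3: transform the pooled context and add it everywhere.
        return x + self.transform(context)
```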

Step 3.2. Input the low-level feature map into the lightweight dual-branch sub-network of the encoder to obtain the spatial detail feature map and the abstract semantic feature map;

The dual-branch sub-network contains two branches: a backbone depth branch that extracts abstract semantic features and a spatial-preserving branch that extracts spatial detail features. The two branches share the low-level features output by the global downsampling block; compared with a traditional dual-branch network this removes one input path, reducing the parameter scale and computational overhead of early low-level feature extraction;

The backbone depth branch is built on the GhostNet network, whose body contains 16 Ghost bottleneck modules performing four downsampling operations to extract deep features. The method keeps the 16 Ghost bottleneck modules of GhostNet and turns them into a fully convolutional network serving as the body of the backbone depth branch. This branch processes the input low-level feature map into four deep feature maps at scales of 64*64, 32*32, 16*16, and 8*8. The four scales correspond to four stages, with [3, 2, 6, 5] Ghost bottleneck modules per stage and corresponding kernel sizes of [3, 5, 3, 5]. Since four downsampling operations are performed, each stage has one Ghost bottleneck module with a stride of 2.
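Using the GhostBottleneck sketch above, the stated stage layout could be wired up as follows; the channel widths and expansion ratio are illustrative assumptions, since the patent does not list them:

```python
import torch.nn as nn

def make_backbone(in_ch=64, widths=(40, 80, 112, 160)):
    # Four stages of [3, 2, 6, 5] Ghost bottlenecks with kernels [3, 5, 3, 5];
    # the first block of each stage downsamples (stride 2), giving maps at
    # 64x64, 32x32, 16x16, and 8x8 for a 128x128 branch input.
    blocks_per_stage, kernels = (3, 2, 6, 5), (3, 5, 3, 5)
    stages, ch = nn.ModuleList(), in_ch
    for n, k, w in zip(blocks_per_stage, kernels, widths):
        blocks = [GhostBottleneck(ch, 2 * ch, w, kernel=k, stride=2)]
        blocks += [GhostBottleneck(w, 2 * w, w, kernel=k) for _ in range(n - 1)]
        stages.append(nn.Sequential(*blocks))
        ch = w
    return stages  # run sequentially, collecting the 4 stage outputs
```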

Meanwhile, to obtain rich abstract semantic features, the method combines depthwise separable convolutions, up-sampling blocks, and a lightweight atrous spatial pyramid pooling module, using the four stage feature maps to build a lightweight feature pyramid; its structure is shown in Fig. 5, and a sketch follows below. The four newly generated, tightly coupled feature maps, with enlarged receptive fields and multi-scale information, are upsampled to the 64*64 scale and fused element-wise to form the final abstract semantic feature group output by the backbone depth branch;
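A plausible sketch of this pyramid is given below; the depthwise separable convolution block is defined here because both the pyramid and the spatial-preserving branch use it. The top-down wiring, the channel widths, and the plain DSConv standing in for the lightweight atrous spatial pyramid pooling module are all assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class DSConv(nn.Module):
    # Depthwise 3x3 conv followed by a pointwise 1x1 conv, each with BN + ReLU.
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                      groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class LightFeaturePyramid(nn.Module):
    def __init__(self, in_chs=(40, 80, 112, 160), out_ch=128):
        super().__init__()
        self.reduce = nn.ModuleList([DSConv(c, out_ch) for c in in_chs])
        self.aspp = DSConv(out_ch, out_ch)  # stand-in for the lightweight ASPP

    def forward(self, feats):
        # feats: the four stage outputs at 64x64, 32x32, 16x16, 8x8.
        maps = [r(f) for r, f in zip(self.reduce, feats)]
        maps[-1] = self.aspp(maps[-1])      # enlarge receptive field at the top
        size = maps[0].shape[2:]            # the 64x64 target scale
        return sum(F.interpolate(m, size=size, mode="bilinear",
                                 align_corners=False) for m in maps)
```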

The spatial-preserving branch consists of three depthwise separable convolutions, all with 3*3 kernels and strides of [1, 2, 1]. It downsamples the input low-level feature map once, so the output spatial detail feature map has a resolution of 64*64. This branch preserves the spatial scale of the input image with few parameters and little computational overhead, and encodes rich spatial information;
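With the DSConv block above, the spatial-preserving branch reduces to a few lines (channel widths assumed):

```python
import torch.nn as nn

# Three depthwise separable convs, strides [1, 2, 1]: one 2x downsampling,
# taking the 128x128 low-level map to a 64x64 spatial detail map.
spatial_branch = nn.Sequential(
    DSConv(64, 64, stride=1),
    DSConv(64, 96, stride=2),
    DSConv(96, 128, stride=1))
```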

Step 3.3. Fuse the spatial detail feature map and the abstract semantic feature map through the global feature fusion block of the encoder and output the encoded feature map F_E;

The global feature fusion module contains three parts: two parallel depthwise separable convolutions with 1*1 kernels, element-wise fusion, and a global context module. The abstract semantic features and spatial detail features from the dual-branch sub-network are dimension-adjusted by the two parallel 1*1 convolutions and fused element-wise into a feature map with rich spatial detail and abstract semantic information. Finally, the global context module performs lightweight context modeling, so that the resulting encoded feature map F_E better integrates global information. Since no downsampling is involved, the output encoded feature map keeps the same size as its input;
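A minimal fusion sketch reusing the GlobalContextBlock above (channel sizes assumed):

```python
import torch.nn as nn

class GlobalFeatureFusion(nn.Module):
    # Two parallel 1x1 projections align channels, element-wise addition fuses
    # the branches, and a global context block re-injects global information.
    def __init__(self, sem_ch=128, spa_ch=128, out_ch=128):
        super().__init__()
        self.proj_sem = nn.Conv2d(sem_ch, out_ch, 1, bias=False)
        self.proj_spa = nn.Conv2d(spa_ch, out_ch, 1, bias=False)
        self.gc = GlobalContextBlock(out_ch)

    def forward(self, f_sem, f_spa):
        return self.gc(self.proj_sem(f_sem) + self.proj_spa(f_spa))
```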

Through the above steps, the encoder finally generates an encoded feature map at a scale of 64*64, which serves as the input of the subsequent stages;

Step 4. Input the encoded feature map F_E into the decoder, perform edge feature refinement and up-sampling, and obtain the decoded feature map F_D;

The decoder consists of two modules: the edge decoupling module and the up-sampling module. The lightweight edge decoupling module contains three parts, shown in Fig. 6: a lightweight atrous spatial pyramid pooling module, a body feature generator, and an edge preserver. The decoded feature map is obtained as follows: the encoded feature map F_E is first input into the lightweight edge decoupling module of the decoder for edge feature refinement, generating a fine feature map with refined edges; the fine feature map is then input into the up-sampling module of the decoder, which upsamples it back to the size of the original input remote sensing image, giving the decoded feature map F_D output by the decoder;

The body feature generator involves two processes: flow field generation and feature warping. Flow field generation is performed by a miniature encoder-decoder structure (one downsampling followed by one upsampling) and a conventional convolution with a 3*3 kernel, and produces a flow field representation that highlights the central part of each target object. Feature warping then warps the features along this flow field to obtain a salient body feature representation of the target objects. The body feature generator is therefore responsible for producing a more consistent feature representation for pixels within the same object, extracting the body features of the target objects;

The edge preserver involves two steps. The first is a subtractor, which performs an explicit subtraction between the receptive-field-enlarged encoded feature map and the body feature map, yielding a coarse edge feature map. The second is an edge feature refiner, which supplements the edge features with a low-level feature map containing fine details: the coarse edge feature map is fused with the low-level feature map from the encoder by channel-wise concatenation, replenishing high-frequency information, and a 1*1 conventional convolution then reduces the dimensionality, outputting a refined edge feature map;

The input encoded feature F_E first passes through the lightweight atrous spatial pyramid pooling module to generate a feature map F_aspp with multi-scale information and a larger receptive field, and the body generator then forms the body feature map F_body of the targets. F_body, F_aspp, and F_E are input into the edge preserver: F_aspp and F_body are first subtracted explicitly to generate a preliminary edge feature map, which is concatenated channel-wise with F_E and reduced by a 1*1 conventional convolution to output the refined edge feature map F_edge. Finally, F_body and F_edge are fused element-wise to obtain the fine output feature map F_final used for up-sampling recovery. The whole process can be expressed as:

    F_aspp = f_dsaspp(F_E)
    F_body = φ(F_aspp)
    F_edge = ψ(F_E, F_aspp, F_body)
    F_final = F_body + F_edge

where f_dsaspp denotes the lightweight atrous spatial pyramid pooling function, φ denotes the body feature generation function, and ψ denotes the edge-preserving function;
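Below is a hedged PyTorch sketch of this module. The flow-field generator is a minimal stand-in (a stride-2 downsampling, bilinear re-upsampling, and a 3*3 convolution predicting a two-channel flow), and a DSConv again stands in for the lightweight atrous spatial pyramid pooling module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def flow_warp(feat, flow):
    # Bilinearly sample `feat` at positions displaced by `flow` (B, 2, H, W),
    # i.e. the feature-warping step of the body feature generator.
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=feat.device),
                            torch.linspace(-1, 1, w, device=feat.device),
                            indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    norm = torch.tensor([(w - 1) / 2.0, (h - 1) / 2.0],
                        device=feat.device).view(1, 2, 1, 1)
    grid = grid + (flow / norm).permute(0, 2, 3, 1)
    return F.grid_sample(feat, grid, align_corners=True)

class EdgeDecouplingModule(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.dsaspp = DSConv(ch, ch)                 # lightweight ASPP stand-in
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # mini encoder
        self.flow = nn.Conv2d(2 * ch, 2, 3, padding=1)         # 3x3 flow head
        self.fuse_edge = nn.Conv2d(2 * ch, ch, 1)    # 1x1 reduction after concat

    def forward(self, f_e):
        f_aspp = self.dsaspp(f_e)
        coarse = F.interpolate(self.down(f_aspp), size=f_aspp.shape[2:],
                               mode="bilinear", align_corners=True)
        flow = self.flow(torch.cat([f_aspp, coarse], dim=1))
        f_body = flow_warp(f_aspp, flow)             # body feature map
        edge_raw = f_aspp - f_body                   # explicit subtraction
        f_edge = self.fuse_edge(torch.cat([edge_raw, f_e], dim=1))
        return f_body + f_edge, f_body, f_edge       # F_final, F_body, F_edge
```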

To obtain the final decoded feature map F_D, F_final is input into the up-sampling module, which applies a 1*1 conventional convolution and an up-sampling operation to restore it to the size of the original input image. The resulting feature map is the decoded feature map F_D, at a scale of 512*512;

Step 5. Input the decoded feature map into the classifier for pixel-level classification prediction, output the segmentation result, and train the semantic segmentation network under a supervision mechanism;

The body of the classifier is a SoftMax layer. After the decoded feature map F_D is processed by the SoftMax layer, pixel-level classification prediction is completed and the semantic segmentation result is obtained. Training is supervised by the mechanism formed from the segmentation results and the ground-truth labels, driving the segmentation network toward its best performance;

The supervision mechanism does not supervise only the final segmentation result; it jointly supervises the four parts F_body, F_edge, F_final, and F_E. It is realized through a designed loss function; the total loss, denoted L, is:

    L = λ1·L_body + λ2·L_final + λ3·L_G + λ4·L_edge,    L_edge = λ5·L_bce + λ6·L_ce

where L_body, L_edge, L_final, and L_G denote the body feature loss, edge feature loss, fine feature loss, and global encoding loss, respectively. L_final and L_G use the cross-entropy loss common in semantic segmentation tasks. L_body uses a boundary relaxation loss, which during training lets F_body relax the classification of boundary pixels, allowing the segmentation network to predict a boundary pixel as multiple classes. L_edge is a composite loss that obtains a boundary-edge prior from the edge prediction part; it comprises a binary cross-entropy loss L_bce for boundary pixel classification and a cross-entropy loss L_ce over the edge regions of the scene. λ1, λ2, λ3, λ4, λ5, and λ6 are hyperparameters controlling the weights among the losses; the first three default to 1, and the last three to 0.4, 20, and 1, respectively. ŷ denotes the ground-truth semantic label, ȳ_b the binary boundary mask generated from ŷ, b the boundary prediction, and s_body, s_final, and s_E the segmentation-map predictions obtained from F_body, F_final, and F_E;
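A minimal sketch of such a composite loss follows. This is one plausible reading: the exact grouping of the λ weights, the boundary-relaxation term (plain cross-entropy stands in for it here), and the tensor layouts are assumptions:

```python
import torch
import torch.nn.functional as F

def total_loss(s_body, s_final, s_enc, b_pred, y, y_edge,
               lam=(1.0, 1.0, 1.0, 0.4, 20.0, 1.0)):
    # s_*: (B, C, H, W) logits; b_pred: (B, 1, H, W) boundary logits;
    # y: (B, H, W) class labels; y_edge: (B, H, W) binary boundary mask.
    l1, l2, l3, l4, l5, l6 = lam
    l_body = F.cross_entropy(s_body, y)    # stand-in for boundary relaxation
    l_final = F.cross_entropy(s_final, y)
    l_g = F.cross_entropy(s_enc, y)
    l_bce = F.binary_cross_entropy_with_logits(b_pred.squeeze(1), y_edge.float())
    mask = y_edge.bool()                   # CE restricted to edge regions
    l_ce = F.cross_entropy(s_final.permute(0, 2, 3, 1)[mask], y[mask])
    return l1 * l_body + l2 * l_final + l3 * l_g + l4 * (l5 * l_bce + l6 * l_ce)
```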

Step 6. Train the semantic segmentation network built in step 2 on the training samples, following steps 3-5;

After the semantic segmentation network is built as described above, training samples are fed in continuously and the network is trained according to steps 3-5. Before training, the relevant training parameters such as the network input scale, batch size, and learning rate must be set.

Step 7. Input the samples to be tested into the trained semantic segmentation network and output the final remote sensing image semantic segmentation results, completing the test of the network;

Experimental results of a concrete example are given below. This example is not intended to limit the use of the method; it is merely one well-performing instance provided for analysis.

The experiments use the Vaihingen dataset provided by ISPRS, which contains three-channel IRRG images, DSM images, and NDSM images: 16 remote sensing images of size 6000*6000 with corresponding labels. The visualization is shown in Fig. 7. The labels cover six target classes, and the semantic annotation of each class is determined by an RGB value, as shown in Table 1 below:

Table 1. Semantic annotation information

(The original table image is not reproduced; the values below follow the standard ISPRS 2D semantic labeling scheme used by the Vaihingen dataset.)

| Class               | RGB value       |
|---------------------|-----------------|
| Impervious surfaces | (255, 255, 255) |
| Building            | (0, 0, 255)     |
| Low vegetation      | (0, 255, 255)   |
| Tree                | (0, 255, 0)     |
| Car                 | (255, 255, 0)   |
| Clutter/background  | (255, 0, 0)     |

In this example, the dataset is preprocessed with the sliding-window cropping and data augmentation described in step 1, yielding multi-channel images of 512*512*3, which are then split into training and test samples at a ratio of 4:1;

The semantic segmentation network of this method is then built, and the relevant parameters are set before training. The input scale of the network is 512*512, the batch size is set to 10 (depending on GPU memory), the optimizer is SGD, the initial learning rate is 0.001, the minimum learning rate is 0.00001, the momentum is 0.9, and the weight decay coefficient is 0.0005.
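In PyTorch the stated optimizer settings translate directly; the decay schedule that takes the learning rate to its 1e-5 floor is not specified, so a polynomial schedule is assumed here:

```python
import torch.optim as optim

# `model` is the segmentation network sketched above.
optimizer = optim.SGD(model.parameters(), lr=1e-3,
                      momentum=0.9, weight_decay=5e-4)
# Assumed schedule: polynomial decay, clipped at the stated floor of 1e-5.
total_steps = 80_000  # assumed training budget
scheduler = optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max((1 - step / total_steps) ** 0.9, 1e-5 / 1e-3))
```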

In this example, the selected evaluation metrics are mean intersection over union (mIoU), mean pixel accuracy (mAcc), GFLOPs (floating-point operations), parameter count, and per-image segmentation inference time. Four semantic segmentation methods are selected for comparison in terms of both accuracy and efficiency: UNet, PSPNet, Fast-SCNN, and Sem-FPN. mIoU and mAcc measure segmentation accuracy: the higher they are, the closer the segmentation result is to the ground-truth labels. GFLOPs, parameter count, and per-image inference time measure segmentation efficiency: the smaller they are, the higher the efficiency. The experimental results of the different methods are shown in Table 2:

Table 2. Comparison of this method with existing methods

| Method      | mIoU (%) | mAcc (%) | GFLOPs | Params (M) | Inference time (s) |
|-------------|----------|----------|--------|------------|--------------------|
| UNet        | 86.19    | 91.16    | 203.04 | 29.06      | 0.067              |
| PSPNet      | 86.40    | 92.19    | 178.48 | 48.98      | 0.066              |
| Fast-SCNN   | 76.23    | 83.83    | 0.91   | 1.21       | 0.015              |
| Sem-FPN     | 83.57    | 90.91    | 45.48  | 28.50      | 0.029              |
| This method | 85.33    | 90.98    | 6.63   | 4.17       | 0.031              |

As Table 2 shows, this method achieves 85.33% mIoU and 90.98% mAcc with 6.63 GFLOPs, 4.17M parameters, and a per-image inference time of 0.031s. Compared with Fast-SCNN, which has the fewest parameters, the lowest floating-point cost, and the shortest inference time, this method is far more accurate. Compared with Sem-FPN, this method is slightly slower in inference but higher in both mIoU and mAcc, while Sem-FPN's parameter count and GFLOPs are far higher. UNet and PSPNet are classic semantic segmentation networks and are more accurate than this method, but their parameter counts, GFLOPs, and inference times are all several times those of this method. Considering both accuracy and efficiency, the proposed network therefore outperforms the other segmentation networks, and its parameter count, GFLOPs, and inference speed confirm that the invention is a lightweight semantic segmentation method for remote sensing images;

Fig. 8 shows the visual semantic segmentation results obtained on the test samples. Compared with Fast-SCNN and Sem-FPN, this method classifies pixels more accurately and effectively reduces segmentation errors caused by misclassification; it also handles edge details more accurately, closer to the ground-truth labels. Compared with UNet and PSPNet, although the overall segmentation accuracy is lower, this method is closer to the semantic labels in the segmentation of edge details.

It should be understood that the present invention is described through a number of embodiments. Those skilled in the art will recognize that various changes or equivalent substitutions can be made to these features and embodiments without departing from the spirit and scope of the invention. In addition, the features and embodiments may be modified to adapt a particular situation or material to the teachings of the invention without departing from its spirit and scope. Therefore, the invention is not limited by the specific embodiments disclosed here, and all embodiments falling within the scope of the claims of this application are within the protection scope of the invention.

Claims (7)

1. A lightweight semantic segmentation method for remote sensing images based on edge decoupling, characterized by comprising building, training, and testing a semantic segmentation network, wherein the semantic segmentation network is a lightweight encoder-decoder network with a dual-branch structure; after training of the semantic segmentation network is completed on training samples, a remote sensing image to be tested is input into the network, and the final remote sensing image semantic segmentation result is output;
the method comprises the following steps in sequence:
step 1, acquiring a remote sensing image data set, and preparing a training and testing sample;
step 2, building a lightweight coding and decoding semantic segmentation network with a double-branch structure;
step 2, building a lightweight coding and decoding semantic segmentation network with a double-branch structure, wherein the network comprises an encoder, a decoder and a classifier;
the encoder is of a double-branch structure and comprises a global downsampling block, a lightweight double-branch sub-network and a global feature fusion module;
the decoder consists of a lightweight edge decoupling module and an up-sampling module;
the classifier is composed of a conventional convolutional layer and a SoftMax layer;
step 3, inputting the training samples into an encoder, performing feature encoding through feature extraction, and obtaining an encoded feature map F_E;
obtaining the encoded feature map F_E in step 3 comprises the following steps:
step 3.1, inputting the training samples into a global downsampling block of the encoder to obtain a low-level feature map;
the global downsampling block in step 3.1 consists of 3 parts: 1 conventional convolution, 1 Ghost bottleneck module, and 1 global context module;
after an input sample passes through the global downsampling block, a low-level feature map with an output resolution of 1/4 of the original input is generated and used as the input of the subsequent process;
step 3.2, inputting the low-level feature map into a lightweight dual-branch sub-network in the encoder to obtain a spatial detail feature map and an abstract semantic feature map;
step 3.3, performing multi-level feature fusion on the obtained spatial detail feature map and abstract semantic feature map through a global feature fusion block of the encoder, and outputting the encoded feature map F_E;
step 4, inputting the encoded feature map F_E into a decoder, performing edge feature refinement processing and an up-sampling operation, and obtaining a decoded feature map F_D;
Step 5, inputting the decoding characteristic graph into a classifier to perform pixel-level classification prediction, outputting a segmentation result, and performing supervised training on the semantic segmentation network through a supervision mechanism;
step 6, training the semantic segmentation network built in the step 2 by using the training sample according to the step (3-5);
and 7, inputting the sample to be tested into the trained semantic segmentation network, outputting a final remote sensing image semantic segmentation result, and completing the test of the semantic segmentation network.
2. The lightweight semantic segmentation method for remote sensing images based on edge decoupling as claimed in claim 1, wherein the lightweight dual-branch sub-network in step 3.2 comprises two branches, namely a backbone depth branch for obtaining abstract semantic features and a spatial-preserving branch for obtaining spatial detail features, and the two branches share the low-level feature map output by the global downsampling block;
the backbone depth branch is constructed based on the GhostNet feature extraction network and comprises two structures: the first is a branch body structure consisting of 16 Ghost bottleneck modules, used to carry out the downsampling process 4 times to extract deep features; the second is a lightweight feature pyramid consisting of four parts (depthwise separable convolution, an up-sampling block, a lightweight atrous spatial pyramid pooling module, and element-wise fusion), which takes the 4 deep feature maps of different scales formed by the branch body as input and finally outputs abstract semantic features with an enlarged receptive field and multi-scale information;
the spatial-preserving branch consists of 3 depthwise separable convolutions; the input low-level feature map is down-sampled 1 time, and the resolution of the output spatial detail feature map is 1/2 of the input.
3. The lightweight semantic segmentation method for remote sensing images based on edge decoupling as claimed in claim 1, wherein the global feature fusion module of step 3.3 comprises 3 parts: the first is two parallel depthwise separable convolutions with 1*1 kernels; the second is element-wise fusion; the third is 1 global context module;
the input abstract semantic features and spatial detail features are dimension-adjusted through the two parallel convolutions, a feature map with rich spatial detail and abstract semantic information is output through element-wise fusion, and finally lightweight context modeling is performed through the global context module, so that the finally formed encoded feature map F_E better fuses the global information.
4. The lightweight semantic segmentation method for remote sensing images based on edge decoupling as claimed in claim 1, wherein step 4 obtains the decoded feature map F_D as follows: firstly, the encoded feature map F_E is input into a lightweight edge decoupling module of the decoder, and edge feature refinement processing is performed to generate a fine feature map with refined edges; the fine feature map is then input into an up-sampling module of the decoder, an up-sampling operation is performed, and the fine feature map is restored to the size of the original input remote sensing image as the decoded feature map F_D output by the decoder.
5. The remote sensing image light-weight semantic segmentation method based on the edge decoupling as claimed in claim 1, wherein the light-weight edge decoupling module is composed of 3 parts, namely a light-weight cavity space pooling pyramid, a main body feature generator and an edge retainer; firstly, the coding characteristics are subjected to light-weight cavity space pooling pyramid to generate a characteristic diagram F with multi-scale information and a larger receptive field aspp Then, more consistent feature representation is generated for pixels in the same object through a main body generator, and further a main body feature graph F of the target object is formed body (ii) a F is to be body 、F aspp And F E Inputting the data into an edge holder, and outputting a feature map F of a refined edge through explicit subtraction operation, channel stack fusion and 1 x 1 conventional convolution dimensionality reduction edge Finally, the main body characteristic diagram and the refined edge characteristic diagram are fused, and a refined output characteristic diagram F for carrying out up-sampling recovery is output final (ii) a The whole process can be represented by the following formula:
F_aspp = f_dsaspp(F_E),  F_body = φ(F_aspp),  F_edge = ψ(F_E, F_aspp, F_body),  F_final = F_body + F_edge
in the formulas, f_dsaspp denotes the lightweight atrous spatial pyramid pooling function, φ denotes the main body feature generating function, and ψ denotes the edge preservation function;
the up-sampling module comprises two steps, a 1 × 1 conventional convolution and an up-sampling operation; after the fine feature map F_final passes through this module, it is restored to the size of the original input remote sensing image, namely the decoder output feature map F_D (a sketch of the whole decoder step follows).
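A rough sketch of the decoder step of claims 4 and 5, wiring the reconstructed formulas together; the DS-ASPP and the main body generator are each reduced to a single-convolution stand-in (their internals are not spelled out in the claims), and concatenating the subtraction residual with F_E before the 1 × 1 reduction is one assumed reading of "channel-stack fusion":

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightEdgeDecoupling(nn.Module):
    """Sketch of the edge-decoupling flow:
    F_aspp = f_dsaspp(F_E); F_body = phi(F_aspp);
    F_edge = psi(F_E, F_aspp, F_body); F_final = F_body + F_edge.
    Only the data flow follows the claim; the conv stand-ins do not."""
    def __init__(self, ch):
        super().__init__()
        self.dsaspp = nn.Conv2d(ch, ch, 3, padding=1, bias=False)                # stand-in for DS-ASPP
        self.body_gen = nn.Conv2d(ch, ch, 3, padding=2, dilation=2, bias=False)  # stand-in for phi
        self.edge_fuse = nn.Conv2d(2 * ch, ch, 1, bias=False)                    # 1x1 dimension reduction

    def forward(self, f_e):
        f_aspp = self.dsaspp(f_e)
        f_body = self.body_gen(f_aspp)                 # consistent within-object features
        residual = f_aspp - f_body                     # explicit subtraction: edge cue
        f_edge = self.edge_fuse(torch.cat([residual, f_e], dim=1))  # channel stack + 1x1 conv
        f_final = f_body + f_edge                      # fuse body and refined edge
        return f_final, f_body, f_edge

class UpsampleModule(nn.Module):
    """Decoder tail: 1x1 conventional convolution, then bilinear upsampling of
    F_final back to the input image size to give F_D."""
    def __init__(self, ch, num_classes):
        super().__init__()
        self.proj = nn.Conv2d(ch, num_classes, 1)

    def forward(self, f_final, out_hw):
        return F.interpolate(self.proj(f_final), size=out_hw,
                             mode="bilinear", align_corners=False)
```

At the level of tensor shapes, `f_final, f_body, f_edge = LightEdgeDecoupling(128)(f_e)` followed by `UpsampleModule(128, n_classes)(f_final, (H, W))` reproduces the F_E → F_D path of claim 4.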
6. The lightweight semantic segmentation method for remote sensing images based on edge decoupling as claimed in claim 1, wherein under the supervision mechanism of step 5 the decoded feature map F_D is passed through the classifier to complete pixel-level classification, the output being the semantic segmentation result; the network is then trained under the supervision formed by comparing the semantic segmentation result with the ground-truth labels, so that the segmentation network reaches its best segmentation performance.
7. The lightweight semantic segmentation method for remote sensing images based on edge decoupling as claimed in claim 1, wherein the supervision mechanism of step 5 is an edge-based supervision mode realized through a designed loss function; the total loss function is denoted L and given by the following formula:
L = λ1·L_body + λ2·L_edge + λ3·L_final + λ4·L_G,  with  L_edge = λ5·L_bce + λ6·L_ce
where L_body, L_edge, L_final and L_G denote the main body feature loss, the edge feature loss, the fine feature loss and the global encoding loss, respectively; the inputs to these 4 loss functions are, respectively, the segmentation results formed after the main body feature map, the refined edge feature map, the refined output feature map and the encoded feature map undergo up-sampling recovery and a SoftMax layer, together with their corresponding ground-truth labels;
the loss function L_edge is a composite loss built on the edge prediction part to obtain a boundary edge prior, and comprises two terms: a binary cross-entropy loss L_bce for boundary pixel classification, and a cross-entropy loss L_ce over the edge-region pixels of the scene; λ1, λ2, λ3, λ4, λ5 and λ6 are hyperparameters that control the weighting among these losses (a sketch of this loss follows).
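A minimal sketch of the reconstructed total loss, assuming standard cross-entropy for L_body, L_final and L_G and the two-term composite for L_edge; the masking rule for the in-scene edge term and all weights are placeholder assumptions, not patent values:

```python
import torch
import torch.nn.functional as F

def edge_loss(edge_logits, edge_mask, seg_logits, labels, lam5=1.0, lam6=1.0):
    """L_edge = lam5 * L_bce + lam6 * L_ce: BCE on the binary boundary map plus
    cross-entropy restricted to edge-region pixels (masking rule is an assumption)."""
    # edge_logits / edge_mask: (B, 1, H, W); labels: (B, H, W) long
    l_bce = F.binary_cross_entropy_with_logits(edge_logits, edge_mask)
    on_edge = edge_mask.squeeze(1) > 0.5       # pixels on or near object boundaries
    masked = labels.clone()
    masked[~on_edge] = -100                    # drop non-edge pixels from the CE term
    l_ce = F.cross_entropy(seg_logits, masked, ignore_index=-100)
    return lam5 * l_bce + lam6 * l_ce

def total_loss(body_logits, edge_logits, final_logits, enc_logits,
               labels, edge_mask, lams=(1.0,) * 6):
    """L = lam1*L_body + lam2*L_edge + lam3*L_final + lam4*L_G (weights are placeholders)."""
    l1, l2, l3, l4, l5, l6 = lams
    l_body = F.cross_entropy(body_logits, labels)    # supervision on the body map
    l_edge = edge_loss(edge_logits, edge_mask, final_logits, labels, l5, l6)
    l_final = F.cross_entropy(final_logits, labels)  # supervision on the refined map
    l_g = F.cross_entropy(enc_logits, labels)        # global encoding supervision
    return l1 * l_body + l2 * l_edge + l3 * l_final + l4 * l_g
```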
CN202110456921.9A 2021-04-27 2021-04-27 A Lightweight Semantic Segmentation Method for Remote Sensing Images Based on Edge Decoupling Active CN113159051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110456921.9A CN113159051B (en) 2021-04-27 2021-04-27 A Lightweight Semantic Segmentation Method for Remote Sensing Images Based on Edge Decoupling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110456921.9A CN113159051B (en) 2021-04-27 2021-04-27 A Lightweight Semantic Segmentation Method for Remote Sensing Images Based on Edge Decoupling

Publications (2)

Publication Number Publication Date
CN113159051A CN113159051A (en) 2021-07-23
CN113159051B (en) 2022-11-25

Family

ID=76871278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110456921.9A Active CN113159051B (en) 2021-04-27 2021-04-27 A Lightweight Semantic Segmentation Method for Remote Sensing Images Based on Edge Decoupling

Country Status (1)

Country Link
CN (1) CN113159051B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658200B (en) * 2021-07-29 2024-01-02 东北大学 Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN113723231B (en) * 2021-08-17 2024-09-17 南京邮电大学 Low-light semantic segmentation model training method, semantic segmentation method and device
CN113706546B (en) * 2021-08-23 2024-03-19 浙江工业大学 Medical image segmentation method and device based on lightweight twin network
CN113837015A (en) * 2021-08-31 2021-12-24 艾普工华科技(武汉)有限公司 A method and system for face detection based on feature pyramid
CN113762396A (en) * 2021-09-10 2021-12-07 西南科技大学 A method for semantic segmentation of two-dimensional images
CN113920311B (en) * 2021-09-17 2025-03-14 山东锋士信息技术有限公司 A remote sensing image segmentation method and system based on edge auxiliary information
CN113706561B (en) * 2021-10-29 2022-03-29 华南理工大学 Image semantic segmentation method based on region separation
CN114067162A (en) * 2021-11-24 2022-02-18 重庆邮电大学 An image reconstruction method and system based on multi-scale and multi-granularity feature decoupling
CN114426069B (en) * 2021-12-14 2023-08-25 哈尔滨理工大学 An Indoor Rescue Vehicle and Image Semantic Segmentation Method Based on Real-time Semantic Segmentation
CN114398979A (en) * 2022-01-13 2022-04-26 四川大学华西医院 Ultrasonic image thyroid nodule classification method based on feature decoupling
CN114399524A (en) * 2022-01-19 2022-04-26 云从科技集团股份有限公司 A picture background replacement method, device, medium and device
CN114463542A (en) * 2022-01-22 2022-05-10 仲恺农业工程学院 Orchard complex road segmentation method based on lightweight semantic segmentation algorithm
CN114758134B (en) * 2022-04-29 2025-05-30 河海大学 Remote sensing image semantic segmentation method based on adaptive multi-scale feature pyramid network
CN114972254A (en) * 2022-05-25 2022-08-30 宁夏理工学院 A cervical cell image segmentation method based on convolutional neural network
CN114863094A (en) * 2022-05-31 2022-08-05 征图新视(江苏)科技股份有限公司 Industrial image region-of-interest segmentation algorithm based on double-branch network
CN115240041A (en) * 2022-07-13 2022-10-25 北京理工大学 Fracture extraction method of shale electron microscope scanning image based on deep learning segmentation network
CN115147703B (en) * 2022-07-28 2023-11-03 广东小白龙环保科技有限公司 A garbage segmentation method and system based on GinTrans network
CN115381492A (en) * 2022-08-24 2022-11-25 上海大学 A framework and method for carotid plaque tracking and echo classification based on ultrasound video
CN115272681B (en) * 2022-09-22 2022-12-20 中国海洋大学 Method and system for semantic segmentation of marine remote sensing images based on decoupling of high-order feature classes
CN116342884B (en) * 2023-03-28 2024-02-06 阿里云计算有限公司 Image segmentation and model training method and server
CN117237620B (en) * 2023-05-18 2025-06-27 河海大学 Single-target positioning method based on lightweight improved Unet semantic segmentation network
CN117011725A (en) * 2023-07-10 2023-11-07 哈尔滨工业大学 Infrared unmanned aerial vehicle target detection algorithm based on deep learning
CN117078967B (en) * 2023-09-04 2024-03-01 石家庄铁道大学 Efficient and lightweight multi-scale pedestrian re-identification method
CN117475305B (en) * 2023-10-26 2024-07-19 广西壮族自治区自然资源遥感院 Multi-class building contour intelligent extraction and regularization method and application system
CN117475155B (en) * 2023-12-26 2024-04-02 厦门瑞为信息技术有限公司 Lightweight remote sensing image segmentation method based on semi-supervised learning
CN117934436B (en) * 2024-01-31 2024-12-24 南通新帝克单丝科技股份有限公司 Method for detecting size and shape of polymer monofilament and predicting service life of spinning component
CN118762042B (en) * 2024-06-07 2025-09-23 华中科技大学 Image instance detection and segmentation model construction method based on edge information enhancement

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330681A (en) * 2020-11-06 2021-02-05 北京工业大学 Attention mechanism-based lightweight network real-time semantic segmentation method
CN112580654A (en) * 2020-12-25 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Semantic segmentation method for ground objects of remote sensing image

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7788536B1 (en) * 2004-12-21 2010-08-31 Zenprise, Inc. Automated detection of problems in software application deployments
CN104392209B (en) * 2014-11-07 2017-09-29 长春理工大学 A kind of image complexity evaluation method of target and background
CN104574296B (en) * 2014-12-24 2017-07-04 长春理工大学 A kind of method for polarizing the m ultiwavelet fusion treatment picture for removing haze
CN113424154A (en) * 2019-05-23 2021-09-21 西门子股份公司 Method of edge side model processing, edge calculation apparatus, and computer readable medium
CN110674866B (en) * 2019-09-23 2021-05-07 兰州理工大学 Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111079649B (en) * 2019-12-17 2023-04-07 西安电子科技大学 Remote sensing image ground feature classification method based on lightweight semantic segmentation network
CN111797676B (en) * 2020-04-30 2022-10-28 南京理工大学 High-resolution remote sensing image target on-orbit lightweight rapid detection method
CN111666836B (en) * 2020-05-22 2023-05-02 北京工业大学 High-resolution remote sensing image target detection method of M-F-Y type light convolutional neural network
CN112149547B (en) * 2020-09-17 2023-06-02 南京信息工程大学 Water Body Recognition Method Based on Image Pyramid Guidance and Pixel Pair Matching
CN112183360B (en) * 2020-09-29 2022-11-08 上海交通大学 A lightweight semantic segmentation method for high-resolution remote sensing images

Also Published As

Publication number Publication date
CN113159051A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN113159051B (en) A Lightweight Semantic Segmentation Method for Remote Sensing Images Based on Edge Decoupling
CN112183360B (en) A lightweight semantic segmentation method for high-resolution remote sensing images
CN114359292B (en) A medical image segmentation method based on multi-scale and attention
CN111738329A (en) A land use classification method for time series remote sensing images
CN110781775A (en) Remote sensing image water body information accurate segmentation method supported by multi-scale features
CN114581560B (en) Multi-scale neural network infrared image colorization method based on attention mechanism
CN111310598B (en) A Hyperspectral Remote Sensing Image Classification Method Based on 3D and 2D Hybrid Convolution
CN112150354B (en) A Single Image Super-Resolution Method Based on Joint Contour Enhancement and Denoising Statistical Priors
CN111814607A (en) A deep learning model suitable for small sample hyperspectral image classification
CN110517272B (en) Deep learning-based blood cell segmentation method
CN111368935B (en) SAR time-sensitive target sample amplification method based on generation countermeasure network
CN112990041B (en) A method for extracting buildings from remote sensing images based on improved U-net
CN112686830B (en) Super-resolution method for a single depth map based on image decomposition
CN115293986A (en) Multi-temporal remote sensing image cloud region reconstruction method
CN111462090A (en) Multi-scale image target detection method
Piat et al. Image classification with quantum pre-training and auto-encoders
CN116740362A (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN115565043A (en) Method for detecting target by combining multiple characteristic features and target prediction method
CN116012658B (en) A self-supervised pre-training target detection method, system, device and storage medium
CN111539434B (en) Similarity-based infrared weak and small target detection method
CN116681946A (en) Hyperspectral classification method based on band erasure and contrast learning
Guo et al. A lightweight semantic segmentation algorithm integrating CA and ECA-Net modules
CN115937565A (en) Hyperspectral Image Classification Method Based on Adaptive L-BFGS Algorithm
CN113570611A (en) A real-time mineral segmentation method based on multi-feature fusion decoder
Wang et al. A novel neural network based on Transformer for polyp image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant