CN116703735A - Image processing method, model training method and device, equipment and storage medium - Google Patents
Image processing method, model training method and device, equipment and storage medium
- Publication number
- CN116703735A (application CN202210177692.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- loss value
- model
- type
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Landscapes
- Image Processing (AREA)
Description
Technical Field
The present disclosure relates to the field of information technology, and in particular to an image processing method, an image processing model training method and apparatus, an electronic device, and a storage medium.
Background
The improved performance of display screens delivers an excellent viewing experience, but low-quality image or video sources degrade the final display result. In practice, affected by scene conditions and the transmission process, some captured images not only contain heavy noise, but also suffer overexposure and underexposure, which cause loss of image detail and color deviation, so the resulting image quality is low. Image quality enhancement technology has therefore emerged: it aims to recover, by technical means, an image that is as noise-free, correctly exposed, and free of color cast as possible.
In the related art, image quality enhancement algorithms use separate techniques to address the exposure, color cast, and noise problems caused by imaging devices in different environments. Each such method targets a single problem and is not very robust, so it cannot meet the complex and changeable environmental conditions and production needs of real scenes.
Also in the related art, deep-learning-based image quality enhancement algorithms further improve the visual quality and robustness of enhancement compared with traditional algorithms, but they either lack generality or can improve only a single aspect of image quality.
Summary
The present disclosure provides an image processing method, an image processing model training method and apparatus, an electronic device, and a storage medium.
A first aspect of the embodiments of the present disclosure provides an image processing method, the method including:
performing multi-scale feature encoding on a first image to obtain N feature maps of different sizes, where N is a positive integer greater than or equal to 2;
applying local attention mechanism processing to the n-th decoded map and then upsampling it to obtain the n-th upsampled map, where n is a positive integer less than N; the 0th decoded map is determined from the (N-1)-th feature map, which is the last of the N feature maps of different sizes to be produced;
fusing the (N-n)-th feature map with the n-th upsampled map based on a global attention mechanism to obtain the (n+1)-th decoded map; and
obtaining a second image based on the (N-1)-th decoded map, where the image content of the second image is the same as that of the first image, and the image quality of the second image is higher than that of the first image.
In some embodiments, performing multi-scale feature encoding on the first image to obtain the N feature maps of different sizes includes:
convolving the first image to obtain a 0th feature map; and
performing multi-scale feature extraction on the m-th feature map to obtain the (m+1)-th feature map, where m is 0 or a positive integer less than or equal to N-1.
In some embodiments, performing multi-scale feature extraction on the m-th feature map to obtain the (m+1)-th feature map includes:
convolving the m-th feature map with convolution kernels of different scales to obtain multiple first-type intermediate feature maps of the same size; and
fusing the multiple first-type intermediate feature maps to obtain the (m+1)-th feature map.
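For illustration, this embodiment can be sketched in plain NumPy as follows. The 3x3 and 5x5 kernel sizes, the averaging weights, and the mean-fusion rule are all assumptions for the sake of the example; the patent fixes none of them.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2D convolution with zero 'same' padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def multi_scale_extract(fm, kernels):
    """Convolve the m-th feature map with kernels of different scales and
    fuse the same-sized first-type intermediate maps (mean fusion assumed)."""
    intermediates = [conv2d_same(fm, k) for k in kernels]
    return np.mean(intermediates, axis=0)

fm = np.arange(16, dtype=float).reshape(4, 4)           # stand-in m-th feature map
kernels = [np.ones((3, 3)) / 9.0, np.ones((5, 5)) / 25.0]
next_fm = multi_scale_extract(fm, kernels)              # (m+1)-th feature map
```

Because each kernel is applied with 'same' padding, every first-type intermediate map keeps the input's spatial size, which is what makes the fusion step well defined.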
In some embodiments, applying local attention mechanism processing to the n-th decoded map and then upsampling it to obtain the n-th upsampled map includes:
generating a first weight of the local attention mechanism based on the n-th decoded map;
weighting the n-th decoded map with the first weight to obtain a second-type intermediate feature map; and
upsampling the second-type intermediate feature map to obtain the n-th upsampled map.
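A minimal sketch of this conversion step, with the learned weight-generating layer replaced by a per-pixel sigmoid of the decoded map itself and nearest-neighbour 2x upsampling (both forms are assumptions; the patent does not specify them):

```python
import numpy as np

def local_attention_upsample(decoded, scale=2):
    """Generate the first weight from the n-th decoded map (a per-pixel
    sigmoid stands in for the learned layer), weight the map to obtain the
    second-type intermediate map, then nearest-neighbour upsample it."""
    weight = 1.0 / (1.0 + np.exp(-decoded))        # first weight, per pixel
    intermediate = decoded * weight                # second-type intermediate map
    return intermediate.repeat(scale, axis=0).repeat(scale, axis=1)

d = np.zeros((2, 2))                               # toy n-th decoded map
up = local_attention_upsample(d)                   # n-th upsampled map, 4x4
```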
In some embodiments, fusing the (N-n)-th feature map with the n-th upsampled map based on the global attention mechanism to obtain the (n+1)-th decoded map includes:
generating a second weight of the global attention mechanism according to the n-th decoded map;
weighting the (N-n)-th feature map with the second weight to obtain a third-type intermediate feature map; and
fusing the third-type intermediate feature map with the n-th upsampled map to obtain the (n+1)-th decoded map.
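The fusion step can be sketched as below. Deriving the second weight from the global mean of the decoded map, and fusing by pixel-aligned addition, are assumed forms chosen only to make the example runnable:

```python
import numpy as np

def global_attention_fuse(skip_fm, upsampled, decoded):
    """Derive the second weight from a global statistic of the n-th decoded
    map, weight the (N-n)-th feature map into the third-type intermediate
    map, and fuse it with the n-th upsampled map by addition."""
    w = 1.0 / (1.0 + np.exp(-decoded.mean()))      # scalar second weight
    third_type = w * skip_fm                       # third-type intermediate map
    return third_type + upsampled                  # (n+1)-th decoded map

skip = np.ones((4, 4))     # toy (N-n)-th feature map
up = np.zeros((4, 4))      # toy n-th upsampled map (already size-matched)
dec = np.zeros((2, 2))     # toy n-th decoded map
fused = global_attention_fuse(skip, up, dec)
```

Note the two inputs to the fusion must already share a spatial size, which is exactly why the upsampling in the previous step targets the size of the (N-n)-th feature map.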
A second aspect of the embodiments of the present disclosure provides an image processing model training method, including:
training an image processing model that performs the image processing method of the first aspect.
In some embodiments, training the image processing model that performs the image processing method of the first aspect includes:
inputting a sample image into a first model to obtain a predicted image output by the first model;
inputting the predicted image into a second model to obtain M first feature maps output by the second model, where M is a positive integer greater than or equal to 2;
inputting a target image corresponding to the sample image into the second model to obtain M second feature maps output by the second model;
obtaining a first-type loss value according to the difference between the p-th first feature map and the p-th second feature map, where p is a positive integer less than or equal to M; and
adjusting model parameters of the first model based on the first-type loss value.
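A sketch of this first-type (feature-level) loss, with the second model's M outputs represented as plain arrays and an L1 distance per level assumed (the patent only speaks of "the difference"):

```python
import numpy as np

def first_type_loss(first_maps, second_maps):
    """Accumulate the difference between the p-th first feature map (from
    the predicted image) and the p-th second feature map (from the target
    image) over all M levels; L1 distance per level is an assumption."""
    assert len(first_maps) == len(second_maps)
    return sum(float(np.abs(a - b).mean())
               for a, b in zip(first_maps, second_maps))

pred_feats = [np.ones((4, 4)), np.ones((2, 2))]     # toy M=2 first feature maps
targ_feats = [np.zeros((4, 4)), np.ones((2, 2))]    # toy M=2 second feature maps
loss = first_type_loss(pred_feats, targ_feats)      # 1.0 + 0.0
```

In practice the fixed second model is typically a pretrained feature extractor; that choice is not fixed by the patent text.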
In some embodiments, training the image processing model that performs the image processing method of the first aspect further includes:
obtaining a second-type loss value according to the difference between the predicted image and the target image;
and adjusting the model parameters of the first model based on the first-type loss value includes:
adjusting the model parameters of the first model according to the first-type loss value and the second-type loss value.
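The second-type loss compares the two images directly; a minimal sketch, assuming mean absolute error as the difference metric (the patent does not fix it):

```python
import numpy as np

def second_type_loss(pred, target):
    """Second-type loss: difference between the predicted image and the
    target image (mean absolute error assumed)."""
    return float(np.abs(pred - target).mean())

pred = np.full((4, 4), 0.5)     # toy predicted image
target = np.zeros((4, 4))       # toy target image
loss = second_type_loss(pred, target)   # 0.5
```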
In some embodiments, training the image processing model that performs the image processing method of the first aspect further includes:
obtaining a third-type loss value according to the pixel differences between the s-th pixel in the predicted image and the pixels adjacent to the s-th pixel, where s is a positive integer less than or equal to S, and S is the total number of pixels in the predicted image;
and adjusting the model parameters of the first model according to the first-type loss value and the second-type loss value includes:
obtaining an image loss value according to the first-type loss value, the second-type loss value, and the third-type loss value; and
adjusting the model parameters of the first model based on the image loss value.
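One common reading of this third-type loss, built from each pixel's difference to its neighbours, is a total-variation-style smoothness term over right and bottom neighbours; the sketch below uses that form as an assumption, not as the patent's mandated definition:

```python
import numpy as np

def third_type_loss(pred):
    """Third-type loss: mean absolute difference between each pixel and its
    right/bottom neighbours (total-variation form assumed)."""
    dh = np.abs(pred[1:, :] - pred[:-1, :]).mean()
    dw = np.abs(pred[:, 1:] - pred[:, :-1]).mean()
    return float(dh + dw)

flat = np.full((4, 4), 3.0)              # perfectly smooth image -> loss 0
ramp = np.tile(np.arange(4.0), (4, 1))   # horizontal ramp, step 1 per pixel
loss = third_type_loss(ramp)             # dh = 0, dw = 1
```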
In some embodiments, obtaining the image loss value according to the first-type loss value, the second-type loss value, and the third-type loss value includes:
computing a weighted sum of the first-type loss value, the second-type loss value, and the third-type loss value with their respective weights to obtain the image loss value;
where the weight of the first-type loss value is smaller than the weight of the second-type loss value, and
the weight of the third-type loss value is smaller than the weight of the first-type loss value.
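The weighted sum can be written directly; the numeric weights below are illustrative placeholders chosen only to satisfy the required ordering (third-type weight < first-type weight < second-type weight):

```python
def image_loss(l1, l2, l3, w1=0.1, w2=1.0, w3=0.01):
    """Image loss: weighted sum of the first-, second-, and third-type loss
    values. The weight values are assumptions; only their ordering
    w3 < w1 < w2 is required by this embodiment."""
    assert w3 < w1 < w2
    return w1 * l1 + w2 * l2 + w3 * l3

total = image_loss(2.0, 1.0, 4.0)        # 0.1*2 + 1.0*1 + 0.01*4 = 1.24
```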
A third aspect of the present disclosure provides an image processing apparatus, the apparatus including:
an encoding module configured to perform multi-scale feature encoding on a first image to obtain N feature maps of different sizes, where N is a positive integer greater than or equal to 2;
a conversion module configured to apply local attention mechanism processing to the n-th decoded map and then upsample it to obtain the n-th upsampled map, where n is a positive integer less than N; the 0th decoded map is determined from the (N-1)-th feature map, which is the last of the N feature maps of different sizes to be produced;
a fusion module configured to fuse the (N-n)-th feature map with the n-th upsampled map based on a global attention mechanism to obtain the (n+1)-th decoded map; and
an obtaining module configured to obtain a second image based on the (N-1)-th decoded map, where the image content of the second image is the same as that of the first image, and the image quality of the second image is higher than that of the first image.
In some embodiments, the encoding module is specifically configured to convolve the first image to obtain a 0th feature map, and to perform multi-scale feature extraction on the m-th feature map to obtain the (m+1)-th feature map, where m is 0 or a positive integer less than or equal to N-1.
In some embodiments, the encoding module is specifically configured to convolve the m-th feature map with convolution kernels of different scales to obtain multiple first-type intermediate feature maps of the same size, and to fuse the multiple first-type intermediate feature maps to obtain the (m+1)-th feature map.
In some embodiments, the conversion module is specifically configured to generate a first weight of the local attention mechanism based on the n-th decoded map, weight the n-th decoded map with the first weight to obtain a second-type intermediate feature map, and upsample the second-type intermediate feature map to obtain the n-th upsampled map.
In some embodiments, the fusion module is specifically configured to generate a second weight of the global attention mechanism according to the n-th decoded map, weight the (N-n)-th feature map with the second weight to obtain a third-type intermediate feature map, and fuse the third-type intermediate feature map with the n-th upsampled map to obtain the (n+1)-th decoded map.
A fourth aspect of the embodiments of the present disclosure provides an image processing model training apparatus, including:
a training module configured to train an image processing model that performs the image processing method of the first aspect.
In some embodiments, the training module includes:
a prediction unit configured to input a sample image into a first model to obtain a predicted image output by the first model;
a first input unit configured to input the predicted image into a second model to obtain M first feature maps output by the second model, where M is a positive integer greater than or equal to 2;
a second input unit configured to input a target image corresponding to the sample image into the second model to obtain M second feature maps output by the second model;
a first loss unit configured to obtain a first-type loss value according to the difference between the p-th first feature map and the p-th second feature map, where p is a positive integer less than or equal to M; and
an adjustment unit configured to adjust model parameters of the first model based on the first-type loss value.
In some embodiments, the training module further includes:
a second loss unit configured to obtain a second-type loss value according to the difference between the predicted image and the target image;
the adjustment unit being specifically configured to adjust the model parameters of the first model according to the first-type loss value and the second-type loss value.
In some embodiments, the training module further includes:
a third loss unit configured to obtain a third-type loss value according to the pixel differences between the s-th pixel in the predicted image and the pixels adjacent to the s-th pixel, where s is a positive integer less than or equal to S, and S is the total number of pixels in the predicted image;
the adjustment unit being configured to obtain an image loss value according to the first-type loss value, the second-type loss value, and the third-type loss value, and to adjust the model parameters of the first model based on the image loss value.
In some embodiments, the fourth loss unit is configured to compute a weighted sum of the first-type loss value, the second-type loss value, and the third-type loss value with their respective weights to obtain the image loss value;
where the weight of the first-type loss value is smaller than the weight of the second-type loss value, and
the weight of the third-type loss value is smaller than the weight of the first-type loss value.
A fifth aspect of the embodiments of the present disclosure provides an electronic device, including:
a memory for storing processor-executable instructions; and
a processor connected to the memory,
where the processor is configured to execute the method provided in any technical solution of the first aspect or the second aspect.
A sixth aspect of the embodiments of the present disclosure provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by a processor of a computer, the computer is enabled to execute the method provided in any technical solution of the first aspect or the second aspect.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
Multi-scale feature encoding preserves more image information, such as image details, while the image quality of the first image is optimized. The combination of the local attention mechanism and the global attention mechanism preserves more completely the global information, the local information, and the recovery and reconstruction of image texture, so that after denoising, color correction, and exposure adjustment of the first image, a second image with optimized image quality is obtained.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
Fig. 1 is a schematic flowchart of an image processing method according to an exemplary embodiment;
Fig. 2 is a schematic diagram of image processing performed by an image processing model according to an exemplary embodiment;
Fig. 3 is a schematic diagram of multi-scale feature extraction according to an exemplary embodiment;
Fig. 4 is a schematic diagram of image processing based on a global attention mechanism according to an exemplary embodiment;
Fig. 5 is a schematic diagram of the generation of the first weight of a local attention mechanism and the second weight of a global attention mechanism, and of image fusion, according to an exemplary embodiment;
Fig. 6 is a schematic diagram of an image processing model training method according to an exemplary embodiment;
Fig. 7 is a schematic diagram of determining the first-type loss value according to an exemplary embodiment;
Fig. 8 is a schematic diagram of another image processing model training method according to an exemplary embodiment;
Fig. 9 is a schematic diagram of an image processing method according to an exemplary embodiment;
Fig. 10A is a schematic comparison of image processing effects according to an exemplary embodiment;
Fig. 10B is a schematic comparison of image processing effects according to an exemplary embodiment;
Fig. 10C is a schematic comparison of image processing effects according to an exemplary embodiment;
Fig. 11 is a schematic structural diagram of an image processing apparatus according to an exemplary embodiment;
Fig. 12 is a schematic structural diagram of an image processing model training apparatus according to an exemplary embodiment;
Fig. 13 is a schematic structural diagram of a terminal device according to an exemplary embodiment;
Fig. 14 is a schematic structural diagram of a server according to an exemplary embodiment.
Detailed Description
Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. In the following description, when reference is made to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses consistent with some aspects of the present disclosure as recited in the appended claims.
As shown in Fig. 1, an embodiment of the present disclosure provides an image processing method, including:
S110: performing multi-scale feature encoding on a first image to obtain N feature maps of different sizes, where N is a positive integer greater than or equal to 2;
S120: applying local attention mechanism processing to the n-th decoded map and then upsampling it to obtain the n-th upsampled map, where n is a positive integer less than N; the 0th decoded map is determined from the (N-1)-th feature map, which is the last of the N feature maps of different sizes to be produced;
S130: fusing the (N-n)-th feature map with the n-th upsampled map based on a global attention mechanism to obtain the (n+1)-th decoded map;
S140: obtaining a second image based on the (N-1)-th decoded map, where the image content of the second image is the same as that of the first image, and the image quality of the second image is higher than that of the first image.
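The index bookkeeping of S110-S140 (N shrinking feature maps, then N-1 upsample-and-fuse steps back to full size) can be sketched with all learned operators replaced by shape-preserving stand-ins; the attention weighting is omitted and additive fusion is assumed, so this shows only the data flow, not the actual network:

```python
import numpy as np

def downsample(x):
    """Stand-in for one multi-scale encoding step (keeps every 2nd pixel)."""
    return x[::2, ::2]

def upsample(x):
    """Stand-in for local-attention-then-upsample (nearest neighbour, 2x)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def enhance(first_image, N=3):
    """S110: encode F_0 .. F_{N-1} with halving sizes; the 0th decoded map
    comes from F_{N-1}. Then each decoding step upsamples the current
    decoded map (S120) and fuses it with the skip feature map of matching
    size (S130); the last decoded map yields the second image (S140)."""
    feats = [first_image]                     # F_0 (plain convolution assumed)
    for _ in range(N - 1):
        feats.append(downsample(feats[-1]))   # F_1 .. F_{N-1}
    decoded = feats[-1]                       # 0th decoded map from F_{N-1}
    for step in range(N - 1):
        up = upsample(decoded)                # S120
        skip = feats[N - 2 - step]            # skip map of the same size
        decoded = up + skip                   # S130 (additive fusion assumed)
    return decoded                            # S140: maps to the second image

second = enhance(np.ones((8, 8)))             # same size as the first image
```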
Devices that execute the image processing method include, but are not limited to, electronic devices such as terminal devices and servers.
The terminal devices include, but are not limited to, mobile phones, tablet computers, wearable devices, smart home devices, and smart office devices.
The server may be any of various application servers or image servers.
In the embodiments of the present disclosure, the first image may be an original image to be optimized.
The image processing method provided by the embodiments of the present disclosure may be executed by a deep learning model. Because this deep learning model is used for image processing, it may also be called an image processing model.
As shown in Fig. 2, the image processing model may include:
encoding layers for extracting features from the first image to obtain feature maps; for example, the image processing model has at least N encoding layers for generating the N feature maps; the encoding layers may correspond to the network layers that produce F0 to Fn in Fig. 2; and
decoding layers for further decoding the feature maps to obtain the optimized image; for example, the image processing model has at least N decoding layers, which may be the network layers that produce R0 to Rm in Fig. 2.
Skip connections between the encoding layers and the decoding layers allow feature maps of different sizes to be transferred between different encoding layers and different decoding layers.
Multi-scale encoding of the first image with convolution kernels of different sizes yields feature maps containing different image information. These feature maps have different image sizes, so feature maps at different scales preserve image information of the first image at different levels, which facilitates producing a highly faithful second image later. The N feature maps may be the 0th to the (N-1)-th feature maps, whose image sizes decrease in sequence.
Referring to Fig. 2, F0 may be the 0th feature map, obtained by a convolution operation on the original image Iori; F1 may be the 1st feature map obtained by multi-scale feature extraction on F0; F2 may be the 2nd feature map obtained by multi-scale feature extraction on F1; and Fn may be the n-th feature map obtained by multi-scale feature extraction on Fn-1. The features contained in F0 to Fn may be the low-level features shown in Fig. 2.
The decoding layers of the image processing model shown in Fig. 2 may produce the feature maps R0 to Rm. The features contained in R0 to Rm may be the high-level features shown in Fig. 2.
After the (N-1)-th feature map is processed, the input image of the 0th decoding layer is obtained.
In one embodiment, the 0th decoded map may directly be the (N-1)-th feature map.
In another embodiment, global feature extraction is performed on the (N-1)-th feature map by a global feature extraction block (GFEB) to obtain the 0th decoded map. For example, the (N-1)-th feature map is convolved by cascaded convolutional layers to obtain the 0th decoded map.
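The cascaded-convolution form of the GFEB can be sketched as follows; the depth of two layers and the 3x3 averaging kernel are assumptions standing in for the learned convolutions:

```python
import numpy as np

def conv3x3_avg(x):
    """One 3x3 averaging convolution with zero padding (stand-in for a
    learned convolution layer)."""
    xp = np.pad(x, 1)
    h, w = x.shape
    return sum(xp[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def gfeb(f_last, depth=2):
    """Global feature extraction block: pass F_{N-1} through `depth`
    cascaded convolution layers to produce the 0th decoded map."""
    out = f_last
    for _ in range(depth):
        out = conv3x3_avg(out)
    return out

d0 = gfeb(np.ones((4, 4)))    # 0th decoded map, same spatial size as F_{N-1}
```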
After local attention mechanism processing is applied to the n-th decoded map, a processed image is obtained. Processing the n-th decoded map with the local attention mechanism may include weighting the n-th decoded map with the first weight to obtain a feature map in which noise is suppressed, color cast is corrected, underexposed regions are brightened, and the exposure value of overexposed regions is reduced.
The image obtained by applying the local attention mechanism to the n-th decoded map is then upsampled, yielding an upsampled image of larger size. The image size of the upsampled image equals the image size of the (N-n)-th feature map.
The (N-n)-th feature map may be transferred through a skip connection of the image processing model to the decoding layer that processes the n-th decoded map.
The n-th decoding layer uses the global attention mechanism to weight each pixel of the (N-n)-th feature map and the upsampled image, so as to preserve the image information of the (N-n)-th feature map more completely.
When the upsampled image is fused with the (N-n)-th feature map in S130, the pixels of the upsampled image and the (N-n)-th feature map are aligned and then fused pixel by pixel according to the weights of the global attention mechanism. That is, according to those weights, the pixel at row x, column y of the upsampled image is fused with the pixel at row x, column y of the (N-n)-th feature map, and the fusion yields the (n+1)-th decoded map.
The (n+1)-th decoded map may be the input image of the n-th decoding layer.
The (N-1)-th decoded map may be the decoded map of the last decoding layer. The image sizes of the 0th to the (N-1)-th decoded maps increase in sequence.
For example, S140 may include convolving the (N-1)-th decoded map to obtain the second image: a convolution kernel is applied to the (N-1)-th decoded map, and the pixel values produced by the convolution operation form the second image. For example, the image size of the (N-1)-th decoded map may be larger than that of the first image; after convolving the (N-1)-th decoded map with a kernel whose convolution stride is greater than 1, a second image whose size equals that of the first image is obtained.
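The size-reducing effect of a stride greater than 1 can be shown with the simplest possible kernel, a 1x1 identity; the kernel choice is purely illustrative:

```python
import numpy as np

def strided_identity_conv(decoded, stride=2):
    """A 1x1 identity-kernel convolution with stride > 1: keeping every
    `stride`-th pixel shrinks each dimension by that factor, so a decoded
    map twice the first image's size shrinks back to match it."""
    return decoded[::stride, ::stride]

decoded = np.arange(64, dtype=float).reshape(8, 8)  # toy (N-1)-th decoded map
second_image = strided_identity_conv(decoded)       # 4x4, first image's size
```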
Multi-scale feature encoding preserves more image information, such as image details, while the image quality of the first image is optimized. The local attention mechanism and the global attention mechanism preserve more of the image's global information and local information and support the recovery and reconstruction of image texture, so that after denoising, color correction, and exposure adjustment of the first image, a second image with optimized image quality is obtained.
在一些实施例中,所述S110可包括:In some embodiments, the S110 may include:
对所述第一图像进行卷积,得到第0特征图;Convolving the first image to obtain the 0th feature map;
对第m特征图进行多尺度特征提取得到第m+1特征图,其中,所述m为小于或等于所述N-1的正整数或0。Performing multi-scale feature extraction on the m-th feature map to obtain the (m+1)-th feature map, wherein m is 0 or a positive integer less than or equal to N-1.
相邻两个编码层中,前一个编码层的输出图像是下一个编码层的输入图像。示例性地,第0个编码层的输出图像为第1个编码层的输入图像。For two adjacent coding layers, the output image of the former coding layer is the input image of the latter. Exemplarily, the output image of the 0th coding layer is the input image of the 1st coding layer.
第1编码层至第N-1编码层分别对输入图像进行多尺度特征提取,得到当前编码层的输出图像。The 1st to the N-1-th coding layers each perform multi-scale feature extraction on their input image to obtain the output image of the current coding layer.
多尺度特征提取可以是利用不同的卷积核对输入图像进行卷积处理,将不同尺寸卷积核特征提取之后的特征图进行融合,得到当前编码层的输出图像。Multi-scale feature extraction may consist of convolving the input image with different convolution kernels and fusing the feature maps extracted by kernels of different sizes to obtain the output image of the current coding layer.
通过多尺寸特征提取,可以提取不同层次的图像信息,从而生成图像内容还原度更高且图像质量更高的第二图像。Through multi-scale feature extraction, different levels of image information can be extracted, thereby generating a second image with a higher restoration degree of image content and higher image quality.
在一些实施例中,所述对第m特征图进行多尺度特征提取得到第m+1特征图,包括:In some embodiments, performing multi-scale feature extraction on the mth feature map to obtain the m+1th feature map includes:
使用不同尺度的卷积核对所述第m特征图进行卷积,得到相同尺寸的多个第一类中间特征图;Convolving the mth feature map using convolution kernels of different scales to obtain multiple first-type intermediate feature maps of the same size;
融合多个所述第一类中间特征图,得到所述第m+1特征图。Fusing multiple intermediate feature maps of the first type to obtain the m+1th feature map.
不同尺寸的卷积核可以以相同的卷积步长(简称步长)对第m特征图进行卷积,得到卷积后的图像。参考图3所示,利用两个卷积核分别对第m特征图进行卷积处理得到卷积后的图像。Convolution kernels of different sizes can convolve the m-th feature map with the same convolution stride (stride for short) to obtain convolved images. Referring to FIG. 3, two convolution kernels are used to convolve the m-th feature map respectively to obtain convolved images.
这两个卷积核的大小不一样,但采用相同的卷积步长对第m特征图进行卷积,将得到相同尺寸的卷积后图像,然后将这些卷积后图像融合将得到第m+1特征图。Although the two convolution kernels differ in size, convolving the m-th feature map with the same stride yields convolved images of the same size, and fusing these convolved images yields the (m+1)-th feature map.
图3展示的两个卷积核,一个为3*3的卷积核,一个是1*1的卷积核,且卷积步长为2。3*3卷积核处理后的图像和1*1卷积核处理后的图像,相对于被处理的基础图像,像素个数均减少,将两个卷积后的图像对位相加,就可以得到第m+1特征图。Of the two convolution kernels shown in FIG. 3, one is a 3*3 kernel and the other is a 1*1 kernel, and the convolution stride is 2. Compared with the base image being processed, the image produced by the 3*3 kernel and the image produced by the 1*1 kernel both contain fewer pixels; adding the two convolved images position by position yields the (m+1)-th feature map.
如此,利用较大尺寸的卷积核保留了第一图像更多的局部颜色和/或纹理等图像信息,利用较小尺寸的卷积核保留了第一图像的内容信息。且通过多尺度卷积核处理后的特征图融合后,可以得到保留更多图像信息的特征图,且实现对图像内容的精细化处理。In this way, more image information such as local color and/or texture of the first image is preserved by using a larger-sized convolution kernel, and content information of the first image is preserved by using a smaller-sized convolution kernel. And after the feature map processed by the multi-scale convolution kernel is fused, the feature map that retains more image information can be obtained, and the refined processing of the image content can be realized.
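The two-kernel extraction and pixel-wise fusion described above can be sketched as follows. NumPy is assumed; 'same'-style zero padding is used so that both kernels yield outputs of equal size at the same stride, and the 4*4 input, averaging 3*3 kernel and identity 1*1 kernel are illustrative choices only:

```python
import numpy as np

def conv2d(x, k, stride):
    # 'Same'-style zero padding so that kernels of different sizes yield
    # identically sized outputs at the same stride (odd kernel sizes assumed).
    p = k.shape[0] // 2
    xp = np.pad(x, p)
    h = (x.shape[0] - 1) // stride + 1
    w = (x.shape[1] - 1) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(xp[r:r + k.shape[0], c:c + k.shape[1]] * k)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # the m-th feature map
k3 = np.ones((3, 3)) / 9.0   # larger kernel: local colour/texture information
k1 = np.ones((1, 1))         # 1x1 kernel: preserves content, pixel-exact mapping
a = conv2d(x, k3, stride=2)  # first-type intermediate feature map
b = conv2d(x, k1, stride=2)  # first-type intermediate feature map, same size
fused = a + b                # pixel-wise addition -> (m+1)-th feature map
assert a.shape == b.shape == fused.shape == (2, 2)
```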
多个尺度的卷积核采用相同的卷积步长对第m特征图进行处理,卷积后得到的图像称之为所述第一类中间特征图。多个第一类中间特征图的图像尺寸是相同的。Convolution kernels of multiple scales process the m-th feature map with the same stride, and the convolved images are called the first-type intermediate feature maps. The multiple first-type intermediate feature maps have the same image size.
融合多个第一类中间特征图包括:多个第一类中间特征图像素对齐之后,将特征值相加或者特征值算术平均之后,得到最终的第m+1特征图。Fusing multiple intermediate feature maps of the first type includes: after pixel alignment of multiple intermediate feature maps of the first type, adding feature values or arithmetically averaging feature values to obtain a final m+1th feature map.
在一些实施例中,所述将第n解码图基于局部注意力机制处理之后进行上采样得到第n上采样图,包括:In some embodiments, the nth decoded image is processed based on the local attention mechanism and then upsampled to obtain the nth upsampled image, including:
基于所述第n解码图,生成所述局部注意力机制的第一权重;generating a first weight of the local attention mechanism based on the nth decoded image;
基于所述第一权重对所述第n解码图进行加权处理,得到第二类中间特征图;performing weighting processing on the nth decoded image based on the first weight to obtain a second type of intermediate feature map;
对所述第二类中间特征图进行上采样处理,得到所述第n上采样图。Perform upsampling processing on the intermediate feature map of the second type to obtain the nth upsampling map.
例如,所述基于所述第n解码图,生成所述局部注意力机制的第一权重,包括:For example, the generating the first weight of the local attention mechanism based on the nth decoded image includes:
对第n解码图进行全局池化,得到特征向量;Perform global pooling on the nth decoded image to obtain the feature vector;
对特征向量进行全连接处理,得到第一权重向量,第一权重向量包括:多个第一权重,多个所述第一权重相同或者不同。示例性地,在进行全连接处理时,可以通过一次或多次卷积,得到与第n解码图像同图像尺寸的第n上采样图。Performing full connection processing on the feature vectors to obtain a first weight vector, where the first weight vector includes: a plurality of first weights, and the plurality of first weights are the same or different. Exemplarily, when performing full-connection processing, one or more convolutions may be performed to obtain an nth upsampled image having the same image size as the nth decoded image.
此处的加权处理可包括:将第n解码图的特征值(或称像素值)与对应的权重相乘,得到新的特征值,从而生成所述第二类中间特征图。The weighting process here may include: multiplying the eigenvalue (or pixel value) of the nth decoded image by the corresponding weight to obtain a new eigenvalue, thereby generating the second type of intermediate eigenmap.
通过对第二类中间特征图的上采样处理,将得到像素个数增加的第n上采样图。By upsampling the intermediate feature map of the second type, an nth upsampled map with an increased number of pixels will be obtained.
参考图5所示,对Rm-k-1进行全局池化得到包含2C个元素的向量,通过涉及卷积操作的全连接操作得到2C个第一权重。Referring to FIG. 5, global pooling of Rm-k-1 yields a vector containing 2C elements, and 2C first weights are obtained through a fully connected operation involving convolution.
对所述第二类中间特征图进行上采样处理,得到所述第n上采样图,可包括:Performing upsampling processing on the second type of intermediate feature map to obtain the nth upsampling map may include:
对所述第二类中间图像进行双线性插值处理,得到插值后的图像;performing bilinear interpolation processing on the second type of intermediate image to obtain an interpolated image;
对插值后的图像进行卷积处理,得到所述第n上采样图。Perform convolution processing on the interpolated image to obtain the nth upsampled image.
如此,通过局部注意力机制的处理,实现对第一图像局部内容的提取。In this way, the local content of the first image is extracted through the processing of the local attention mechanism.
参考图5所示,将2C个图像尺寸为h*w的特征图,通过双线性插值得到2C个图像尺寸为2h*2w的中间特征图,然后通过3*3的卷积核等进行卷积操作,得到C个图像尺寸为2h*2w的特征图,该C个图像尺寸为2h*2w的特征图即为前述第n上采样图。Referring to FIG. 5, the 2C feature maps with image size h*w are interpolated bilinearly to obtain 2C intermediate feature maps with image size 2h*2w, and a convolution operation, e.g. with a 3*3 kernel, is then performed to obtain C feature maps with image size 2h*2w; these C feature maps of size 2h*2w are the aforementioned n-th upsampled map.
图5所示的2C个图像尺寸为h*w的特征图的不同特征图,可为相同解码层处理的同一个第一图像不同通道的特征图。例如,针对RGB图像,分别对应有R通道、G通道和B通道的特征图。The different feature maps of the 2C feature maps with image size h*w shown in FIG. 5 may be feature maps of different channels of the same first image processed by the same decoding layer. For example, for an RGB image, there are feature maps of the R channel, the G channel, and the B channel respectively.
图5所示的Rm-k为前述第n上采样图像和第N-n特征图融合之后的下一个解码层的输入图像。R mk shown in FIG. 5 is the input image of the next decoding layer after the fusion of the aforementioned nth upsampled image and the Nnth feature map.
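A minimal sketch of the local-attention upsampling path described above. Assumptions not fixed by the disclosure: a sigmoid after the fully connected step, an identity matrix standing in for the learned fully connected weights, and nearest-neighbour repetition standing in for the bilinear interpolation plus convolution:

```python
import numpy as np

def local_attention_upsample(feats, fc_w):
    # feats: (C, h, w) decoded feature maps; fc_w: (C, C) stands in for the
    # fully connected layer that produces the first (self-attention) weights.
    pooled = feats.mean(axis=(1, 2))                  # global pooling -> (C,)
    weights = 1.0 / (1.0 + np.exp(-(fc_w @ pooled)))  # FC + sigmoid -> (C,)
    attended = feats * weights[:, None, None]         # per-channel reweighting
    # Nearest-neighbour repeat as a simple stand-in for the bilinear
    # interpolation + convolution upsampling described in the text.
    return attended.repeat(2, axis=1).repeat(2, axis=2)

feats = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)  # (C=2, h=3, w=3)
up = local_attention_upsample(feats, np.eye(2))
assert up.shape == (2, 6, 6)   # pixel count is increased, as in S139/S194
```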
如图4所示,所述基于全局注意力机制将第N-n特征图和所述第n上采样图融合处理,得到第n+1解码图,包括:As shown in FIG. 4, fusing the N-n-th feature map and the n-th upsampled map based on the global attention mechanism to obtain the (n+1)-th decoded map includes:
S131:根据所述第n解码图,生成所述全局注意力机制的第二权重;S131: Generate a second weight of the global attention mechanism according to the nth decoded image;
S132:根据所述第二权重对所述第N-n特征图进行加权处理,得到第三类中间特征图;S132: Perform weighting processing on the N-nth feature map according to the second weight to obtain a third type of intermediate feature map;
S133:融合所述第三类中间特征图与所述第n上采样图,得到所述第n+1解码图。S133: Fuse the third type of intermediate feature map and the nth upsampled image to obtain the n+1th decoded image.
在一些实施例中,根据所述第n解码图,生成所述全局注意力机制的第二权重,可包括:In some embodiments, generating the second weight of the global attention mechanism according to the nth decoded image may include:
对第n解码图进行全局池化,得到特征向量;Perform global pooling on the nth decoded image to obtain the feature vector;
对特征向量进行全连接处理,得到第二权重向量,第二权重向量包括:多个第二权重。示例性地,多个第二权重的权重值可能相同或者不同。Performing full connection processing on the feature vectors to obtain a second weight vector, where the second weight vector includes: a plurality of second weights. Exemplarily, weight values of multiple second weights may be the same or different.
参考图5所示,对Rm-k-1进行全局池化得到包含2C个元素的向量,通过涉及卷积操作的全连接操作得到C个第二权重。Referring to FIG. 5, global pooling of Rm-k-1 yields a vector containing 2C elements, and C second weights are obtained through a fully connected operation involving convolution.
在图5中,C或者2C代表特征图的个数。图5中2h或者h代表的是对应特征图的行数。w或者2w代表的是对应特征图的列数。In FIG. 5, C or 2C represents the number of feature maps; 2h or h represents the number of rows of the corresponding feature map; w or 2w represents the number of columns of the corresponding feature map.
所述根据所述第二权重对所述第N-n特征图进行加权处理,得到第三类中间特征图,包括:The N-nth feature map is weighted according to the second weight to obtain a third type of intermediate feature map, including:
将第二权重与第N-n特征图中的特征值进行乘积运算,将乘积值作为第三类中间特征图的特征值。The second weights are multiplied by the feature values in the N-n-th feature map, and the resulting products are taken as the feature values of the third-type intermediate feature map.
所述融合所述第三类中间特征图与所述第n上采样图,得到所述第n+1解码图,包括:The fusing of the third type of intermediate feature map and the nth upsampling image to obtain the n+1th decoding image includes:
将相同行相同列的第三类中间特征图的特征值与第n上采样图的特征值融合,得到第n+1解码图中对应像素的特征值。The eigenvalues of the third type of intermediate feature maps in the same row and the same column are fused with the eigenvalues of the nth upsampled image to obtain the eigenvalues of the corresponding pixels in the n+1th decoded image.
如此,在对第一图像进行处理时,综合了全局注意力机制和局部注意力机制,即采用混合两种注意力机制的混合注意力机制进行特征编码和解码,得到第二图像。如此,多尺度的特征提取使得网络获取的图像信息更加丰富,而混合注意力模块在网络上采样过程中同时融合全局注意力和局部注意力,使得图像的全局信息、局部信息和细节纹理信息都得到不同程度的恢复和重建。图2中的HAU代表在各个解码层综合局部注意力机制和全局注意力机制的混合注意力机制。In this way, when the first image is processed, the global attention mechanism and the local attention mechanism are combined, i.e., a hybrid attention mechanism mixing the two is used for feature encoding and decoding to obtain the second image. Thus, multi-scale feature extraction enriches the image information acquired by the network, and the hybrid attention module fuses global attention and local attention simultaneously during the network's upsampling process, so that the global information, local information and detailed texture information of the image are all restored and reconstructed to varying degrees. HAU in FIG. 2 represents the hybrid attention mechanism that combines the local and global attention mechanisms at each decoding layer.
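The weighting step S132 and the fusion step S133 can be sketched as follows (NumPy assumed; the shapes and weight values are illustrative only):

```python
import numpy as np

def global_attention_fuse(skip_feat, upsampled, second_weights):
    # skip_feat:      (C, H, W) the N-n-th encoder feature map
    # upsampled:      (C, H, W) the n-th upsampled map, pixel-aligned with it
    # second_weights: (C,) global-attention weights from the n-th decoded map
    third_type = skip_feat * second_weights[:, None, None]  # S132: weighting
    return third_type + upsampled                           # S133: fusion

skip = np.ones((2, 4, 4))          # encoder skip features
up = np.full((2, 4, 4), 2.0)       # upsampled decoder features
out = global_attention_fuse(skip, up, np.array([0.5, 1.0]))
assert out.shape == (2, 4, 4)
assert out[0, 0, 0] == 2.5 and out[1, 0, 0] == 3.0  # per-channel reweighting
```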
本公开实施例还提供一种图像处理模型训练方法,包括:训练执行前述任意技术方案提供的图像处理方法的图像处理模型。An embodiment of the present disclosure also provides a method for training an image processing model, including: training an image processing model that executes the image processing method provided by any of the foregoing technical solutions.
即利用样本数据以及样本数据的标签进行训练,训练得到的图像处理模型可为如图2所示的图像处理模型。That is, training is performed using sample data and the labels of the sample data, and the trained image processing model may be the image processing model shown in FIG. 2.
如图6所示,本公开实施例提供的图像处理模型训练方法可包括:As shown in FIG. 6, the image processing model training method provided by the embodiment of the present disclosure may include:
S210:将样本图像输入到第一模型,得到所述第一模型输出的预测图像;S210: Input the sample image into the first model, and obtain the predicted image output by the first model;
S220:将所述预测图像输入到第二模型,得到第二模型输出的M个第一特征图;其中,所述M为大于或等于2的正整数;S220: Input the predicted image into the second model to obtain M first feature maps output by the second model; wherein, the M is a positive integer greater than or equal to 2;
S230:将所述样本图像对应的目标图像输入到所述第二模型,得到所述第二模型输出M个第二特征图;S230: Input the target image corresponding to the sample image into the second model, and obtain M second feature maps output by the second model;
S240:根据第p个所述第一特征图和第p个所述第二特征图之间的差异,得到第一类损失值,其中,所述p为小于或等于所述M的正整数;S240: Obtain a first-type loss value according to the difference between the p-th first feature map and the p-th second feature map, where the p is a positive integer less than or equal to the M;
S250:基于所述第一类损失值,调整所述第一模型的模型参数。S250: Adjust model parameters of the first model based on the first type of loss value.
此处的第一模型就是要训练的图像处理模型,即可以执行前述任意实施例提供的图像处理方法的模型。The first model here is the image processing model to be trained, that is, a model that can execute the image processing method provided by any of the foregoing embodiments.
将待优化的样本图像输入到第一模型,第一模型经过一系列优化处理之后,会得到基于当前模型参数输出的预测图像。The sample image to be optimized is input to the first model, and after the first model undergoes a series of optimization processes, a predicted image output based on the current model parameters will be obtained.
在本公开实施例中,会将第一模型输出的预测图像和样本图像对应的待优化的目标图像分别输入到第二模型。第二模型包括一个或多个特征提取层,这些特征提取层通过一次或多次卷积,得到特征图。In the embodiment of the present disclosure, the predicted image output by the first model and the target image to be optimized corresponding to the sample image are respectively input into the second model. The second model includes one or more feature extraction layers, and these feature extraction layers obtain feature maps through one or more convolutions.
在本公开实施例中,预测图像输入到第二模型之后,会得到M个第一特征图;将呈现优化后目标效果的目标图像输入到第二模型之后,会得到M个第二特征图。In the embodiment of the present disclosure, after the predicted image is input into the second model, M first feature maps are obtained; after the target image exhibiting the optimized target effect is input into the second model, M second feature maps are obtained.
将第一特征图和第二特征图按照特征提取层进行排序并比对,得到第p个第一特征图和第p个第二特征图之间的差异,得到第一类损失值。The first feature map and the second feature map are sorted and compared according to the feature extraction layer, and the difference between the pth first feature map and the pth second feature map is obtained, and the first type of loss value is obtained.
根据第1个第一特征图和第1个第二特征图的差异,将得到一个损失项,如此,M个第一特征图和M个第二特征图,将共得到M个损失项。将M个损失项相加之后,将得到第一类损失值。According to the difference between the first first feature map and the first second feature map, a loss item will be obtained. In this way, M first feature maps and M second feature maps will get a total of M loss items. After adding the M loss items, the first type of loss value will be obtained.
参考图所示,第二模型可为视觉几何组网络(Visual Geometry Group network,VGG),可以采用一个或多个修正线性单元(Rectified Linear Unit,ReLU)进行图像的特征提取。As shown in the figure, the second model may be a Visual Geometry Group network (VGG), and one or more rectified linear units (ReLU) may be used for image feature extraction.
示例性地,图7所示的第二模型可包括5个层,分别是relu1_2,relu2_2,relu3_3,relu4_3,relu5_3层,然后得到5个层分别输出的第一特征图和第二特征图,将对应层的第一特征图和第二特征图进行比较,得到对应层的第一特征图和第二特征图之间的差异,并结合5个层分别输出的第一特征图和第二特征图之间的差异,得到第一类损失值。图7中所示的Igt可为目标图像;Igen可为第一模型输出的预测图像。fR1至fR5为M等于5时的第一特征图。fG1至fG5为M等于5时的第二特征图。图7中所示的ai为对每层网络输出计算损失的权重,该组权重对各个层输出的误差进行缩放,确保重要程度不同的特征图得到不同程度的像素差计算。Exemplarily, the second model shown in FIG. 7 may include 5 layers, namely the relu1_2, relu2_2, relu3_3, relu4_3 and relu5_3 layers; the first feature maps and second feature maps output by the 5 layers are obtained, the first and second feature maps of each corresponding layer are compared to obtain their difference, and the differences from all 5 layers are combined to obtain the first-type loss value. Igt shown in FIG. 7 may be the target image; Igen may be the predicted image output by the first model. fR1 to fR5 are the first feature maps when M equals 5; fG1 to fG5 are the second feature maps when M equals 5. The ai shown in FIG. 7 are the weights used to compute the loss on each layer's output; this set of weights scales the error of each layer's output so that feature maps of different importance contribute pixel-difference terms to different degrees.
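The layer-wise loss accumulation described above can be sketched as follows. The feature maps are represented as plain 2-D lists, and the exact combination rule (a weighted sum of mean squared differences using the a_i weights of FIG. 7) is an assumption of this sketch:

```python
def perceptual_loss(pred_feats, target_feats, layer_weights):
    """pred_feats / target_feats: lists of M feature maps (2-D lists of
    floats) output by the second model; layer_weights: the a_i scalars
    that scale each layer's contribution."""
    total = 0.0
    for f_r, f_g, a in zip(pred_feats, target_feats, layer_weights):
        # mean squared difference for the p-th feature-map pair
        diff = [(r - g) ** 2
                for row_r, row_g in zip(f_r, f_g)
                for r, g in zip(row_r, row_g)]
        total += a * sum(diff) / len(diff)   # one loss term per layer
    return total

# Identical features from predicted and target images -> zero first-type loss.
fmap = [[1.0, 2.0], [3.0, 4.0]]
assert perceptual_loss([fmap], [fmap], [1.0]) == 0.0
```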
若第一模型的模型参数能够实现目标效果的图像优化,则第一类损失值会足够小;若第一类损失值过大,则需要调整模型参数,再通过样本图像的训练实现第一模型的模型参数更新。第一模型的模型参数可包括:第一模型内各个节点的权重和/或阈值。If the model parameters of the first model can achieve image optimization with the target effect, the first-type loss value will be sufficiently small; if the first-type loss value is too large, the model parameters need to be adjusted, and the model parameters of the first model are then updated through further training on sample images. The model parameters of the first model may include: the weights and/or thresholds of each node in the first model.
示例性地,根据第一模型的损失值采用反向更新方式,更新第一模型的模型参数。Exemplarily, the model parameters of the first model are updated in a reverse update manner according to the loss value of the first model.
在一些实施例中,参考图8所示,所述图像处理模型的训练,还包括:In some embodiments, as shown in FIG. 8, the training of the image processing model further includes:
S241:根据所述预测图像和所述目标图像的差异,得到第二类损失值;S241: Obtain a second type of loss value according to the difference between the predicted image and the target image;
所述S250可包括:根据所述第一类损失值和所述第二类损失值,调整所述第一模型的模型参数。The S250 may include: adjusting model parameters of the first model according to the first type of loss value and the second type of loss value.
在本公开实施例中,可以直接将预测图像和目标图像进行比较,得到目标图像和预测图像之间的差异,从而得到第二类损失值。示例性地,将预测图像和目标图像之间的平均绝对误差(MAE)或者均方差(MSE)作为所述第二类损失值。In the embodiment of the present disclosure, the predicted image can be directly compared with the target image to obtain the difference between the target image and the predicted image, thereby obtaining the second type of loss value. Exemplarily, the mean absolute error (MAE) or mean square error (MSE) between the predicted image and the target image is used as the second type of loss value.
在计算得到第二类损失值之后,将计算第一类损失值和第二类损失值的算术平均或者加权平均,得到一个损失值,基于该损失值以反向传播等方式进行模型参数的更新。After the second-type loss value is calculated, the arithmetic mean or weighted average of the first-type and second-type loss values is computed to obtain one loss value, and the model parameters are updated based on this loss value, e.g., by back-propagation.
若将第一类损失值和第二类损失值的加权平均作为更新模型参数的损失值,此时第一类损失值的权重可小于所述第二类损失值的权重,从而更加关注第二类损失值这种全局损失值,以加速第一模型的训练。If the weighted average of the first-type and second-type loss values is used as the loss value for updating the model parameters, the weight of the first-type loss value may be smaller than that of the second-type loss value, so that more attention is paid to the second-type loss value, a global loss value, thereby accelerating the training of the first model.
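A minimal sketch of this weighting; the 0.2/0.8 split below is an illustrative assumption, since the disclosure only requires the first-type weight to be the smaller one:

```python
def combined_loss(first_type, second_type, w1=0.2, w2=0.8):
    # w1 < w2 so that the second-type (global) loss dominates, as suggested
    # in the text; the concrete 0.2/0.8 values are illustrative only.
    return w1 * first_type + w2 * second_type

assert abs(combined_loss(1.0, 2.0) - 1.8) < 1e-9
```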
在一些实施例中,所述图像处理模型的训练,还包括:In some embodiments, the training of the image processing model also includes:
S242:根据所述预测图像中第s个像素以及与所述第s个像素的相邻像素之间的像素差,得到第三类损失值,其中,所述s为小于或等于S的正整数;所述S为所述预测图像包含的像素总个数;S242: Obtain a third type of loss value according to the pixel difference between the sth pixel in the predicted image and the adjacent pixels of the sth pixel, where the s is a positive integer less than or equal to S ; The S is the total number of pixels included in the predicted image;
如图8所示,所述S250可包括:As shown in Figure 8, the S250 may include:
S251:根据所述第一类损失值、所述第二类损失值以及所述第三类损失值,得到图像损失值;S251: Obtain an image loss value according to the first type of loss value, the second type of loss value, and the third type of loss value;
S252:基于所述图像损失值,调整所述第一模型的模型参数。S252: Adjust model parameters of the first model based on the image loss value.
通过第三类损失值的损失引入,可以考虑到优化后的图像像素之间的平滑性。Through the introduction of the loss of the third type of loss value, the smoothness between the optimized image pixels can be considered.
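One plausible reading of the third-type loss is a total-variation style smoothness term over right and bottom neighbours; the exact neighbourhood and norm are not specified by the disclosure, so the following is an assumption:

```python
def smoothness_loss(img):
    """img: 2-D list of pixel values from the predicted image. Sums the mean
    absolute difference between each pixel and its right/bottom neighbours
    (a total-variation style smoothness term)."""
    h, w = len(img), len(img[0])
    dh = [abs(img[i][j + 1] - img[i][j]) for i in range(h) for j in range(w - 1)]
    dv = [abs(img[i + 1][j] - img[i][j]) for i in range(h - 1) for j in range(w)]
    return sum(dh) / len(dh) + sum(dv) / len(dv)

flat = [[0.5] * 4 for _ in range(4)]
assert smoothness_loss(flat) == 0.0  # a constant image is perfectly smooth
```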
例如,计算第一类损失值、第二类损失值、第三类损失值的和,得到所述图像损失值。或者,计算第一类损失值、第二类损失值以及第三类损失值的算术平均,得到所述图像损失值。又或者,计算所述第一类损失值、第二类损失值、第三类损失值的加权平均,得到所述图像损失值。在计算第一类损失值、第二类损失值以及第三类损失值的加权平均时,第一类损失值的权重小于第二类损失值的权重,且第三类损失值的权重可小于或等于第一类损失值的权重。For example, the sum of the first-type, second-type and third-type loss values is calculated to obtain the image loss value. Alternatively, the arithmetic mean of the first-type, second-type and third-type loss values is calculated to obtain the image loss value. Or a weighted average of the first-type, second-type and third-type loss values is calculated to obtain the image loss value. When the weighted average is computed, the weight of the first-type loss value is smaller than that of the second-type loss value, and the weight of the third-type loss value may be smaller than or equal to that of the first-type loss value.
因此在本公开实施例中,可以同时计算第一类损失值至第三类损失值,得到最终的图像损失值,然后最终结合图像损失值更新第一模型的模型参数。Therefore, in the embodiment of the present disclosure, the first type loss value to the third type loss value can be calculated simultaneously to obtain the final image loss value, and then finally update the model parameters of the first model in combination with the image loss value.
在计算得到的损失值小于损失阈值,或者得到最小图像损失值时,可以确定第一模型完成模型训练,当前第一模型的模型参数即为训练之后的目标参数,而采用目标参数的第一模型即为能够执行前述任意图像处理方法的图像处理模型。When the calculated loss value is smaller than the loss threshold, or the minimum image loss value is reached, it can be determined that the first model has completed training; the current model parameters of the first model are the target parameters after training, and the first model adopting the target parameters is an image processing model capable of executing any of the aforementioned image processing methods.
在实际生产生活中,图像的采集系统和传输过程都相对较为复杂,成像设备在图像采集的过程中极易受到周围环境的影响,如由于光照环境或物体表面反光带来的整体光照分布不均匀,以及设备自身引入的噪声等。且图像传输过程中的图像压缩编码、图像存储及通信的各个方面都对图像质量带来了一定程度的损伤,使得最终显示设备上的图像无法表示真实情况下的自然场景,丢失了亮度、对比度、颜色等信息,不仅不符合人眼视觉感知效果,也不符合实际生产的需求。In actual production and life, image acquisition systems and transmission processes are relatively complex, and imaging devices are easily affected by the surrounding environment during image acquisition, for example uneven overall illumination caused by the lighting environment or reflections from object surfaces, as well as noise introduced by the equipment itself. Moreover, image compression coding during transmission, image storage and communication all damage image quality to some degree, so that the image on the final display device cannot represent the real natural scene, losing brightness, contrast, color and other information; this neither matches human visual perception nor meets the needs of actual production.
如在室外环境进行图像采集时,过于强的太阳光会给最终的成像带来极大影响,导致图像丢失对比度,产生曝光度问题。For example, when image acquisition is performed in an outdoor environment, too strong sunlight will have a great impact on the final imaging, resulting in image loss of contrast and exposure problems.
在监控系统中,采集到的视频图像往往含有较多噪声,且图像照度低,不便于在夜间快速直接的进行犯罪分子行为的跟踪和监控。In the monitoring system, the collected video images often contain a lot of noise, and the image illumination is low, which is not convenient for fast and direct tracking and monitoring of criminal behavior at night.
在图像设计专业领域中,图像传输系统会产生一些色彩偏差,使得最终展示的图像不符合创作者意图。In the professional field of image design, the image transmission system will produce some color deviation, so that the final displayed image does not meet the creator's intention.
除此之外,由各种环境和设备原因带来的图像质量不佳的问题也会直接影响到军事侦察、自动驾驶、遥感成像等系统的性能。In addition, the problem of poor image quality caused by various environmental and equipment reasons will also directly affect the performance of military reconnaissance, automatic driving, remote sensing imaging and other systems.
本公开针对成像和传输过程中可能产生的上述质量问题,提出了一个基于深度学习的图像质量增强方法:给定一张有噪声、色偏和曝光度问题的输入图像,最终恢复出不含噪声、高对比度且有正确颜色表现的图像。Aiming at the above quality problems that may arise during imaging and transmission, the present disclosure proposes an image quality enhancement method based on deep learning: given an input image with noise, color cast and exposure problems, it finally recovers a noise-free, high-contrast image with correct color representation.
基于深度学习的画质增强算法在精度和速度上均取得了显著进步,但是目前的画质增强算法仍然存在一些问题,具体可如下:The image quality enhancement algorithm based on deep learning has made significant progress in both accuracy and speed, but there are still some problems in the current image quality enhancement algorithm, as follows:
第一:多数网络根据不同场景设计不同的网络结构,缺乏统一通用的网络框架解决图像曝光调节、颜色校正、图像去噪等画质增强问题。First: Most networks design different network structures according to different scenarios, and lack a unified and general network framework to solve image quality enhancement problems such as image exposure adjustment, color correction, and image denoising.
第二:图像由于上采样和下采样的网络结构导致信息丢失从而无法进行高分辨率的重建。Second: The image cannot be reconstructed with high resolution due to the loss of information due to the network structure of upsampling and downsampling.
第三:神经网络在对图像的处理过程中缺乏完备的特征提取模块。Third: The neural network lacks a complete feature extraction module in the process of image processing.
本公开实施例基于编解码的网络结构,提出了一种采用多尺度下采样模块和混合注意力策略的图像增强算法,多尺度的特征提取使得网络获取的图像信息更加丰富,而混合注意力模块在网络上采样过程中同时融合全局注意力和局部注意力,使得图像的全局信息、局部信息和细节纹理信息都得到不同程度的恢复和重建。Based on the network structure of encoding and decoding, the embodiment of the present disclosure proposes an image enhancement algorithm using a multi-scale downsampling module and a mixed attention strategy. Multi-scale feature extraction makes the image information acquired by the network richer, and the mixed attention module In the process of network upsampling, the global attention and local attention are fused simultaneously, so that the global information, local information and detailed texture information of the image are restored and reconstructed to varying degrees.
基于编解码器的结构搭建了一个端到端的画质增强网络(图像处理模型),并提出多尺度特征提取和混合注意力的策略方案对网络结构进行改进。Based on the codec structure, an end-to-end image quality enhancement network (image processing model) is built, and a multi-scale feature extraction and mixed attention strategy is proposed to improve the network structure.
针对深度学习网络在特征提取过程中的不完备问题,在基础网络中加入全局特征增强子网络,采取不同尺寸的卷积核进行特征抽取的方式,融合各尺度的特征进行传递,并根据不同尺寸卷积核的特征提取能力确定最终的卷积核大小。In view of the incompleteness of the deep learning network in the feature extraction process, a global feature enhancement sub-network is added to the basic network, and convolution kernels of different sizes are used for feature extraction, and features of various scales are fused for transmission, and according to different sizes The feature extraction capability of the convolution kernel determines the final convolution kernel size.
针对图像恢复过程中层次和细节不够明显以及光晕问题,采取了本公开实施例提出的混合注意力策略。在跳转连接(skip-connection)结构中融入全局注意力,使用高级特征产生权重赋予低级各通道不同的注意力,使得最终重建出的图像细节丰富,层次明显。在图像上采样时融入局部注意力,针对解码过程中的高级特征产生一组自注意力权重引导其自身的细节恢复和重建。Aiming at the lack of obvious layers and details and halo problems in the process of image restoration, the hybrid attention strategy proposed by the embodiments of the present disclosure is adopted. Incorporate global attention into the skip-connection structure, and use high-level features to generate weights to give different attention to low-level channels, so that the final reconstructed image has rich details and obvious layers. Incorporate local attention when upsampling images, and generate a set of self-attention weights for high-level features in the decoding process to guide its own detail recovery and reconstruction.
考虑到L2范数函数作为网络损失函数导致最终生成图像过于平滑的问题,本公开实施例在L2损失函数的基础上联合使用了基于VGG网络的感知损失函数,在更高维的特征空间对图像的语义信息进行约束,确保网络输出图像具有更好的视觉效果。Considering that the L2 norm function as a network loss function leads to the problem that the final generated image is too smooth, the embodiment of the present disclosure jointly uses the perceptual loss function based on the VGG network on the basis of the L2 loss function, and performs image processing in a higher-dimensional feature space. The semantic information of the network is constrained to ensure that the network output image has a better visual effect.
如图2所示,整个网络输入为一张RGB图像,首先经过一个卷积层进行基础特征的提取,然后经过五个多尺度特征提取模块进行多尺度的特征提取和跨尺度的特征融合。As shown in Figure 2, the input of the entire network is an RGB image. First, a convolutional layer is used to extract basic features, and then five multi-scale feature extraction modules are used to perform multi-scale feature extraction and cross-scale feature fusion.
当特征图尺寸较小时,再通过全局特征提取模块,在输入特征图的基础上聚合全局上下文信息向后进行传递,编码后的特征图经过五个混合注意力上采样网络进行解码,特征图最终恢复到图像原尺寸大小,完成图像的重建,再经过一个卷积层输出一张三通道图像,即为最终所得图像。When the feature map is small, the global feature extraction module aggregates global context information on top of the input feature map and passes it backwards; the encoded feature maps are decoded by five mixed-attention upsampling networks, the feature map is finally restored to the original image size to complete the reconstruction, and a convolutional layer then outputs a three-channel image, which is the final result.
多尺度特征提取模块包括两个步骤:The multi-scale feature extraction module consists of two steps:
多尺度特征提取和跨尺度特征融合。Multi-scale feature extraction and cross-scale feature fusion.
多尺度特征提取过程中,同一维度下的特征图分别经过两种尺寸的卷积核同时进行特征提取和下采样处理,3×3大小的卷积核用于提取图像局部的颜色、纹理等信息,1×1的卷积核则用于保持图像的内容信息,在下采样的过程中实现更加精细化的处理,确保图像的各个像素都能得到唯一且准确的映射。In the process of multi-scale feature extraction, the feature map in the same dimension is subjected to feature extraction and downsampling processing through convolution kernels of two sizes at the same time, and the convolution kernel of 3×3 size is used to extract local color, texture and other information of the image. , the 1×1 convolution kernel is used to maintain the content information of the image, and achieve more refined processing in the process of downsampling to ensure that each pixel of the image can be uniquely and accurately mapped.
与其他多尺度策略不同的是,本公开实施例在跨尺度特征融合时采用逐像素相加的方式而非级联,此操作可以避免通道数的改变,减小了计算量,加速了网络收敛。Different from other multi-scale strategies, the embodiment of the present disclosure adopts pixel-by-pixel addition instead of cascading in cross-scale feature fusion. This operation can avoid changes in the number of channels, reduce the amount of calculation, and accelerate network convergence. .
注意力机制通过对特征通道之间的依赖性和相关性进行建模,根据每个通道信息的重要程度重新缩放每个通道的特征,这种机制允许网络模型更加专注于有效的通道信息,提高网络在不同任务中的性能。本公开中的混合注意力上采样模块基于局部注意力模块和全局注意力模块实现,混合注意力的使用使得网络在上采样的过程中兼顾图像全局信息和局部信息的恢复。The attention mechanism rescales the features of each channel according to the importance of each channel information by modeling the dependencies and correlations between feature channels. This mechanism allows the network model to focus more on effective channel information and improve Performance of the network on different tasks. The mixed attention upsampling module in this disclosure is implemented based on the local attention module and the global attention module. The use of the mixed attention enables the network to take into account the recovery of both global image information and local information during the upsampling process.
混合注意力上采样模块的实现过程如图3所示，该模块的上半部分为高级特征产生注意力引导低级特征传输的过程，下半部分表示高级特征产生注意力引导自身进行上采样重建的过程，最终两部分特征逐像素相加进行融合，输出更高分辨率的高级特征继续向后传递。如图2所示，高级特征首先经过一个全局池化产生2C个1×1大小的特征图，该特征图表征高级特征各种信息的丰富程度，再分别产生全局注意力和局部注意力。The implementation of the mixed-attention upsampling module is shown in Figure 3. The upper half of the module is the process in which high-level features generate attention to guide the transfer of low-level features; the lower half is the process in which high-level features generate attention to guide their own upsampling reconstruction. Finally the two sets of features are fused by pixel-wise addition, and the resulting higher-resolution high-level features are passed onward. As shown in Figure 2, the high-level features first pass through a global pooling layer to produce 2C feature maps of size 1×1; these maps characterize the richness of the various kinds of information in the high-level features, and are then used to generate global attention and local attention respectively.
首先向上经过一个全连接层得到一组权重，该权重与经过一个卷积层的低级特征按通道相乘产生带有全局注意的特征图F′k。同时，2C个1×1大小的特征图再向右经过一个全连接层得到另外一组自注意力权重，这组特征与高级特征相乘产生带有局部注意的特征图R′m-k-1。到此为止，全局注意力和局部注意力运算全部结束。最后对带局部注意力的高级特征R′m-k-1进行上采样操作，再融合带全局注意的低级特征图产生含有混合注意力的更高级特征R′m-k。First, going upward through a fully connected layer yields a set of weights, which are multiplied channel-wise with the low-level features that have passed through a convolutional layer, producing a feature map F′_k with global attention. Meanwhile, the 2C feature maps of size 1×1 pass rightward through another fully connected layer to obtain a second set of self-attention weights; these are multiplied with the high-level features to produce a feature map R′_{m-k-1} with local attention. At this point the global and local attention computations are complete. Finally, the high-level feature R′_{m-k-1} with local attention is upsampled and then fused with the low-level feature map carrying global attention, yielding a higher-level feature R′_{m-k} with mixed attention.
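A minimal sketch of the channel-attention rescaling that underlies both branches above (global pooling of features followed by channel-wise reweighting). The learned fully connected layer is stood in for by a fixed sigmoid mapping, and all names are illustrative assumptions, not the disclosed implementation:

```python
import math

def global_avg_pool(feat):
    # feat: [C][H][W] -> one descriptor per channel (the 1x1 maps in the text)
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def attention_weights(descriptors):
    # stand-in for a learned fully connected layer: squash to (0, 1)
    return [1.0 / (1.0 + math.exp(-d)) for d in descriptors]

def apply_channel_attention(feat, weights):
    # rescale every pixel of each channel by that channel's weight
    return [[[w * v for v in row] for row in ch]
            for ch, w in zip(feat, weights)]

feat = [[[2.0, 2.0]], [[0.0, 0.0]]]           # 2 channels, each 1x2
w = attention_weights(global_avg_pool(feat))  # per-channel weights
out = apply_channel_attention(feat, w)
```

Informative channels (large pooled descriptor) receive weights near 1 and pass through almost unchanged; uninformative ones are attenuated.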
普通的图像画质增强网络多采用两张图像之间的平均绝对误差(MAE)或者均方误差(MSE)作为损失函数来优化网络参数。Ordinary image quality enhancement networks mostly use the mean absolute error (MAE) or mean square error (MSE) between two images as a loss function to optimize network parameters.
MAE对两张图像之间每个像素差的绝对值求和,而MSE则是对每个像素差的平方求和。MAE sums the absolute value of each pixel difference between two images, while MSE sums the square of each pixel difference.
与MAE相比,MSE因为像素差平方的操作,会放大较大误差和较小误差之间的差距,该损失函数对较大误差的惩罚力度更大,对较小误差则更为容忍。Compared with MAE, MSE will amplify the gap between larger errors and smaller errors due to the operation of the square of the pixel difference. The loss function is more punitive for larger errors and more tolerant of smaller errors.
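The two pixel losses described above can be written down directly; a sketch over flattened pixel lists (not code from the disclosure):

```python
def mae_loss(pred, target):
    # mean of absolute per-pixel differences
    n = len(pred)
    return sum(abs(p - t) for p, t in zip(pred, target)) / n

def mse_loss(pred, target):
    # mean of squared per-pixel differences; squaring amplifies large errors
    n = len(pred)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / n

# a small error (0.1) and a large error (1.0): under MAE the large error
# contributes 10x the small one, under MSE it contributes 100x
pred, target = [0.0, 0.0], [0.1, 1.0]
```

This 10x versus 100x gap is exactly the "harsher penalty on large errors, more tolerance of small errors" behavior the passage attributes to MSE.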
本公开实施例中使用L2损失函数即MSE作为网络的内容损失函数，而L2损失函数在反向传递的过程中，更加倾向于约束图像的全局信息分布，最终得到的解是全局最优解，因此会导致生成图像变得模糊。因此在L2损失函数的基础上联合使用了基于VGG预训练模型的感知损失函数，基于高维度的感知损失函数能够产生高质量的图像，感知损失函数计算过程可如图7所示。In the embodiments of the present disclosure, the L2 loss function, i.e. MSE, is used as the network's content loss function. During backpropagation, however, the L2 loss tends to constrain the global information distribution of the image, and the solution it finally obtains is a globally averaged optimum, which causes the generated image to become blurred. Therefore, a perceptual loss function based on a pre-trained VGG model is used jointly with the L2 loss; a perceptual loss computed on high-dimensional features can produce high-quality images. The computation of the perceptual loss function is shown in Figure 7.
感知损失的计算基于VGG特征提取网络的每一个Relu层输出，这种方式在特征图的各个维度上对生成图像和真值图像计算差值，对不同层次的特征信息进行了约束重建。感知损失函数的计算过程如下：The perceptual loss is computed on the output of each ReLU stage of the VGG feature extraction network. In this way, the difference between the generated image and the ground-truth image is computed across the dimensions of the feature maps, constraining the reconstruction of feature information at different levels. The perceptual loss function is computed as follows:
将生成图像fi和真值图像gi分别通过训练好的VGG16网络，并获取模型各层的特征图Φi(i=1,2,3,4,5)，即对应relu1_2,relu2_2,relu3_3,relu4_3,relu5_3层的输出，对此输出分别进行像素间平均绝对误差的计算。The generated image f_i and the ground-truth image g_i are each passed through the trained VGG16 network, and the feature maps Φ_i (i = 1, 2, 3, 4, 5) of the model are taken, corresponding to the outputs of the relu1_2, relu2_2, relu3_3, relu4_3, and relu5_3 layers; the pixel-wise mean absolute error is then computed on each of these outputs.
ai为对每层网络输出计算损失的权重，该组权重对各个层输出的误差进行缩放，确保不同重要性的特征图得到不同程度的像素差计算。a_i is the weight applied to the loss computed on each layer's output; this set of weights scales the per-layer errors so that feature maps of different importance contribute to the pixel-difference computation to different degrees.
由于VGG网络进行特征提取时中间层的特征信息同时涵盖纹理和内容信息，对整个网络的优化方向影响较大，因此在进行实验时我们将中间两层的权重等比例缩小确定最后的权重比值。Since the intermediate-layer features extracted by the VGG network cover both texture and content information and strongly influence the optimization direction of the whole network, in our experiments the weights of the two middle layers were scaled down proportionally to determine the final weight ratio.
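A schematic of the weighted per-layer computation described above (feature maps flattened to vectors; the layer weights a_i below are arbitrary example values in which the two middle layers are scaled down, not the ratio used in the experiments):

```python
def perceptual_loss(feats_pred, feats_gt, layer_weights):
    # per-layer mean absolute error between VGG feature maps,
    # scaled by that layer's weight a_i and summed over layers
    total = 0.0
    for fp, fg, a in zip(feats_pred, feats_gt, layer_weights):
        total += a * sum(abs(x - y) for x, y in zip(fp, fg)) / len(fp)
    return total

# five "layers" with placeholder weights; middle layers scaled down
weights = [1.0, 0.5, 0.5, 1.0, 1.0]
fp = [[1.0, 1.0]] * 5   # generated-image features, flattened
fg = [[0.0, 0.0]] * 5   # ground-truth features, flattened
```

With a per-layer MAE of 1.0 everywhere, the total is simply the sum of the weights, which makes the scaling role of a_i easy to see.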
整个画质增强网络流程如图9所示,输入图像依次经过特征提取模块和特征重建模块产生一张输出图像;The entire image quality enhancement network process is shown in Figure 9. The input image is sequentially passed through the feature extraction module and the feature reconstruction module to generate an output image;
再分别计算输出图像与真值图像之间的内容损失、感知损失和全变分损失，再以降低此三种损失的方式优化网络特征提取模块和特征重建模块的参数，以使得下一次网络迭代后输出更加接近于真值图的生成图像。The content loss, perceptual loss, and total variation loss between the output image and the ground-truth image are then computed, and the parameters of the network's feature extraction module and feature reconstruction module are optimized so as to reduce these three losses, so that after the next iteration the network outputs a generated image closer to the ground truth.
网络最终的优化目标是使得生成图像与真值图像尽可能保持一致。此处的内容损失可为前述第二类损失值的一种；所述感知损失可为前述第一类损失值的一种；所述全变分损失可为前述第三类损失值。The ultimate optimization goal of the network is to make the generated image as consistent as possible with the ground-truth image. The content loss here may be one of the aforementioned second-type loss values; the perceptual loss may be one of the aforementioned first-type loss values; and the total variation loss may be the aforementioned third-type loss value.
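The total variation loss mentioned above (the third-type loss value, built from differences between each pixel and its neighbors) can be sketched for a single-channel image as the sum of absolute differences to the right and bottom neighbors; a minimal version, not code from the disclosure:

```python
def tv_loss(img):
    # img: [H][W]; penalizes jumps between adjacent pixels,
    # encouraging locally smooth (less noisy) output images
    h, w = len(img), len(img[0])
    loss = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                loss += abs(img[i][j] - img[i][j + 1])
            if i + 1 < h:
                loss += abs(img[i][j] - img[i + 1][j])
    return loss
```

A checkerboard patch is maximally "rough" and scores high, while a constant patch scores zero, which is why this term suppresses noise.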
本公开实施例中的画质增强网络基于PyTorch深度学习框架搭建，实验使用了一块GeForce RTX 2080Ti GPU用于模型的训练和测试。The image quality enhancement network in the embodiments of the present disclosure is built on the PyTorch deep learning framework; a single GeForce RTX 2080Ti GPU was used in the experiments for model training and testing.
在以上硬件和软件环境下分别进行图像曝光度调节、颜色校正、图像去噪等三个画质增强任务实验，训练次数设置为200个阶段(epoch)，训练过程中批量大小(batch size)设置为4，测试过程批量大小(batch size)设置为1。Under the above hardware and software environment, experiments were run on three image quality enhancement tasks: image exposure adjustment, color correction, and image denoising. Training was run for 200 epochs, with the batch size set to 4 during training and 1 during testing.
训练过程采用L2范数损失函数和感知损失函数交替使用的方式，前100轮使用L2范数损失和感知损失共同进行网络参数优化，后100轮单独使用L2损失函数进行网络参数微调。即在前述图像处理模型的训练中，一个阶段的训练可包括：N次训练中前一部分训练同时结合第一类损失值和第二类损失值更新第一模型的模型参数，在N次训练之后的另一部分训练，可以仅仅根据第二类损失值更新第一模型的模型参数，从而减少模型训练的开销，并提升模型训练效率。During training, the L2-norm loss and the perceptual loss are used in alternation: the first 100 epochs optimize the network parameters with both the L2-norm loss and the perceptual loss, while the last 100 epochs fine-tune the parameters with the L2 loss alone. That is, in training the aforementioned image processing model, one stage of training may comprise: in the first part of the N training iterations, updating the model parameters of the first model using both the first-type and second-type loss values; in the remaining part after those iterations, updating the model parameters of the first model using only the second-type loss value, thereby reducing the overhead of model training and improving training efficiency.
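The alternating schedule above reduces to a simple epoch-indexed selector; a sketch with an illustrative function name (the 100-epoch switch point is the value from the experiments):

```python
def losses_for_epoch(epoch, switch_epoch=100):
    # first `switch_epoch` epochs: content (L2) loss + perceptual loss jointly;
    # afterwards: content loss alone, for cheaper fine-tuning
    if epoch < switch_epoch:
        return ("content", "perceptual")
    return ("content",)
```

Dropping the perceptual term in the second half skips the VGG forward passes, which is where the training-cost saving comes from.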
网络学习过程中使用Adam优化器，学习率设置为0.00001，网络的参数以较小的步长进行迭代优化，确保网络可以达到更加精细的图像细节重建。其中，Adam算法的优化步骤如表2所示。The Adam optimizer is used during network training, with the learning rate set to 0.00001; the network parameters are iteratively optimized with small steps to ensure finer reconstruction of image detail. The optimization steps of the Adam algorithm are shown in Table 2.
表2Table 2
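Table 2 itself is not reproduced in this text, but the standard per-parameter Adam update (with the 0.00001 learning rate from the experiments and the usual default β1, β2, ε, which are assumed here to match the table) proceeds as follows:

```python
def adam_step(param, grad, m, v, t, lr=1e-5, b1=0.9, b2=0.999, eps=1e-8):
    # t is the 1-based step count; m and v are the running
    # first- and second-moment estimates for this parameter
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)       # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v

p, m, v = 0.5, 0.0, 0.0
p, m, v = adam_step(p, grad=1.0, m=m, v=v, t=1)
```

With a unit gradient the bias-corrected moments are both 1 at step 1, so the very first update moves the parameter by almost exactly the learning rate, illustrating the "small step size" behavior described above.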
本公开实施例分别针对多尺度特征提取模块、混合注意力上采样模块设计了消融实验，实验结果表明，多尺度特征提取模块使得整个网络特征提取部分就获取到较为完备的特征信息，1×1卷积核和3×3卷积核的联合使用也使得最终的重建结果更加精细，图像清晰度较高。混合注意力上采样网络中利用高级特征产生的注意力权重“有目的”的引导低级特征的传输和高级特征自身的分辨率恢复，使得图像中信息量较大的感兴趣区域得到增强，冗余信息得到抑制。同时，感知损失的加入在更高维空间对图像特征进行约束，使得网络输出更符合人眼特性的图像。The embodiments of the present disclosure designed ablation experiments for the multi-scale feature extraction module and the mixed-attention upsampling module respectively. The results show that the multi-scale feature extraction module lets the feature-extraction part of the network capture fairly complete feature information, and the joint use of 1×1 and 3×3 convolution kernels makes the final reconstruction finer, with higher image clarity. In the mixed-attention upsampling network, the attention weights produced from high-level features "purposefully" guide the transfer of low-level features and the resolution recovery of the high-level features themselves, so that information-rich regions of interest in the image are enhanced and redundant information is suppressed. Meanwhile, the added perceptual loss constrains image features in a higher-dimensional space, making the network output images that better match the characteristics of the human eye.
图像处理效果比对可如图10A至图10C所示。The comparison of image processing effects can be shown in FIG. 10A to FIG. 10C .
图10A至图10C中的图片(a)都是需要模型处理的原始图像；图片(b)都是目标效果对应的目标图像；图片(c)至(g)都是其他模型对原始图像处理之后得到的图像；图片(h)为本公开实施例提供的图像处理模型处理之后得到的第二图像。Pictures (a) in Figures 10A to 10C are the original images to be processed by the model; pictures (b) are the target images corresponding to the target effect; pictures (c) to (g) are images obtained after the original images were processed by other models; and picture (h) is the second image obtained after processing by the image processing model provided by the embodiments of the present disclosure.
图10A中图片(a)至(h)是去噪之后的图片效果。Pictures (a) to (h) in FIG. 10A are picture effects after denoising.
图10B中图片(a)至(h)是图像细节处理的图片效果。Pictures (a) to (h) in FIG. 10B show the effects of image detail processing.
图10C中图片(a)至(h)是照度调整的图片效果。Pictures (a) to (h) in FIG. 10C are picture effects of illuminance adjustment.
如图11所示,本公开实施例提供一种图像处理装置,所述装置包括:As shown in FIG. 11 , an embodiment of the present disclosure provides an image processing device, and the device includes:
编码模块110,用于对第一图像进行多尺度特征编码,得到N个不同尺寸的特征图;其中,所述N为等于或大于2的正整数;The encoding module 110 is configured to perform multi-scale feature encoding on the first image to obtain N feature maps of different sizes; wherein, the N is a positive integer equal to or greater than 2;
转换模块120，用于对第n解码图进行局部注意力机制处理之后进行上采样，得到第n上采样图；其中，所述n为小于所述N的正整数；第0解码图是根据第N-1特征图确定的；所述第N-1特征图为N个不同尺寸的特征图中最后一个产生的特征图；The conversion module 120 is configured to perform local attention mechanism processing on the n-th decoded image and then upsample it to obtain the n-th upsampled image; wherein n is a positive integer smaller than N; the 0th decoded image is determined from the (N-1)-th feature map; and the (N-1)-th feature map is the last of the N feature maps of different sizes to be produced;
融合模块130，用于基于全局注意力机制将第N-n特征图和所述第n上采样图融合处理，得到第n+1解码图；The fusion module 130 is configured to fuse the (N-n)-th feature map with the n-th upsampled image based on the global attention mechanism to obtain the (n+1)-th decoded image;
得到模块140，用于基于第N-1解码图得到第二图像，其中，所述第二图像的图像内容与所述第一图像的图像内容相同，且所述第二图像的图像质量高于所述第一图像的图像质量。The obtaining module 140 is configured to obtain a second image based on the (N-1)-th decoded image, wherein the image content of the second image is the same as that of the first image, and the image quality of the second image is higher than that of the first image.
在一些实施例中,所述编码模块110、转换模块120、融合模块130及得到模块140可为程序模块;所述程序模块被处理器执行之后,能够实现上述功能。In some embodiments, the encoding module 110 , the conversion module 120 , the fusion module 130 and the obtaining module 140 may be program modules; after the program modules are executed by the processor, the above functions can be realized.
在还有一些实施例中,所述编码模块110、转换模块120、融合模块130及得到模块140均可为软硬结合模块;所述软硬结合模块包括但不限于各种可编程阵列;所述可编程阵列包括但不限于:现场可编程阵列和/或复杂可编程阵列。In some other embodiments, the encoding module 110, the conversion module 120, the fusion module 130 and the obtaining module 140 can all be soft and hard combination modules; the soft and hard combination modules include but are not limited to various programmable arrays; The aforementioned programmable arrays include, but are not limited to: Field Programmable Arrays and/or Complex Programmable Arrays.
在另外一些实施例中,所述编码模块110、转换模块120、融合模块130及得到模块140均可为纯硬件模块;所述纯硬件模块包括但不限于:专用集成电路。In some other embodiments, the encoding module 110 , the conversion module 120 , the fusion module 130 and the obtaining module 140 can all be pure hardware modules; the pure hardware modules include but are not limited to: application specific integrated circuits.
在一些实施例中,所述编码模块110,具体用于对所述第一图像进行卷积,得到第0特征图;对第m特征图进行多尺度特征提取得到第m+1特征图,其中,所述m为小于或等于所述N-1的正整数或0。In some embodiments, the encoding module 110 is specifically configured to perform convolution on the first image to obtain the 0th feature map; perform multi-scale feature extraction on the mth feature map to obtain the m+1th feature map, wherein , the m is a positive integer less than or equal to the N-1 or 0.
在一些实施例中,所述编码模块110,具体用于使用不同尺度的卷积核对所述第m特征图进行卷积,得到相同尺寸的多个第一类中间特征图;融合多个所述第一类中间特征图,得到所述第m+1特征图。In some embodiments, the encoding module 110 is specifically configured to use convolution kernels of different scales to convolve the mth feature map to obtain multiple first-type intermediate feature maps of the same size; The first type of intermediate feature map is used to obtain the m+1th feature map.
在一些实施例中,所述转换模块120,具体用于基于所述第n解码图,生成所述局部注意力机制的第一权重;基于所述第一权重对所述第n解码图进行加权处理,得到第二类中间特征图;对所述第二类中间特征图进行上采样处理,得到所述第n上采样图。In some embodiments, the conversion module 120 is specifically configured to generate a first weight of the local attention mechanism based on the nth decoded image; and weight the nth decoded image based on the first weight processing to obtain the second type of intermediate feature map; performing upsampling processing on the second type of intermediate feature map to obtain the nth upsampling map.
在一些实施例中,所述融合模块130,具体用于根据所述第n解码图,生成所述全局注意力机制的第二权重;根据所述第二权重对所述第N-n特征图进行加权处理,得到第三类中间特征图;融合所述第三类中间特征图与所述第n上采样图,得到所述第n+1解码图。In some embodiments, the fusion module 130 is specifically configured to generate a second weight of the global attention mechanism according to the nth decoded image; and weight the N-nth feature map according to the second weight processing to obtain a third type of intermediate feature map; fusing the third type of intermediate feature map with the nth upsampling image to obtain the n+1th decoding image.
本公开实施例提供一种图像处理模型训练装置,包括:An embodiment of the present disclosure provides an image processing model training device, including:
训练模块,用于训练执行前述任意技术方案提供的图像处理方法的图像处理模型。The training module is used to train an image processing model that executes the image processing method provided by any of the aforementioned technical solutions.
在一些实施例中,所述训练模块可为程序模块;所述程序模块被处理器执行之后,能够实现图像处理模型的训练功能。In some embodiments, the training module may be a program module; after the program module is executed by the processor, it can realize the training function of the image processing model.
在还有一些实施例中,所述训练模块均可为软硬结合模块;所述软硬结合模块包括但不限于各种可编程阵列;所述可编程阵列包括但不限于:现场可编程阵列和/或复杂可编程阵列。In some other embodiments, the training module can be a combination of hardware and software; the combination of hardware and software includes, but is not limited to, various programmable arrays; the programmable arrays include, but are not limited to: field programmable arrays and/or complex programmable arrays.
在另外一些实施例中,所述训练模块可为纯硬件模块;所述纯硬件模块包括但不限于:专用集成电路。In some other embodiments, the training module may be a pure hardware module; the pure hardware module includes but not limited to: an application specific integrated circuit.
在一些实施例中,如图12所示,所述训练模块,包括:In some embodiments, as shown in Figure 12, the training module includes:
预测单元210,用于将样本图像输入到第一模型,得到所述第一模型输出的预测图像;A prediction unit 210, configured to input the sample image into the first model, and obtain a predicted image output by the first model;
第一输入单元220,用于将所述预测图像输入到第二模型,得到第二模型输出的M个第一特征图;其中,所述M为大于或等于2的正整数;The first input unit 220 is configured to input the predicted image to the second model to obtain M first feature maps output by the second model; wherein, the M is a positive integer greater than or equal to 2;
第二输入单元230,用于将所述样本图像对应的目标图像输入到所述第二模型,得到所述第二模型输出M个第二特征图;The second input unit 230 is configured to input the target image corresponding to the sample image into the second model, and obtain M second feature maps output by the second model;
第一损失单元240，用于根据第p个所述第一特征图和第p个所述第二特征图之间的差异，得到第一类损失值，其中，所述p为小于或等于所述M的正整数；The first loss unit 240 is configured to obtain a first-type loss value according to the difference between the p-th first feature map and the p-th second feature map, wherein p is a positive integer less than or equal to M;
调整单元250,用于基于所述第一类损失值,调整所述第一模型的模型参数。The adjustment unit 250 is configured to adjust model parameters of the first model based on the first type of loss value.
在一些实施例中,所述训练模块,还包括:In some embodiments, the training module further includes:
第二损失单元,用于根据所述预测图像和所述目标图像的差异,得到第二类损失值;a second loss unit, configured to obtain a second type of loss value according to the difference between the predicted image and the target image;
所述调整单元250,具体用于根据所述第一类损失值和所述第二类损失值,调整所述第一模型的模型参数。The adjusting unit 250 is specifically configured to adjust model parameters of the first model according to the first type of loss value and the second type of loss value.
在一些实施例中,所述训练模块,还包括:In some embodiments, the training module further includes:
第三损失单元，用于根据所述预测图像中第s个像素以及与所述第s个像素的相邻像素之间的像素差，得到第三类损失值，其中，所述s为小于或等于S的正整数；所述S为所述预测图像包含的像素总个数；The third loss unit is configured to obtain a third-type loss value according to the pixel differences between the s-th pixel in the predicted image and the pixels adjacent to the s-th pixel, wherein s is a positive integer less than or equal to S, and S is the total number of pixels contained in the predicted image;
所述调整单元250，用于根据所述第一类损失值、所述第二类损失值以及所述第三类损失值，得到图像损失值；基于所述图像损失值，调整所述第一模型的模型参数。The adjustment unit 250 is configured to obtain an image loss value according to the first-type loss value, the second-type loss value, and the third-type loss value, and to adjust the model parameters of the first model based on the image loss value.
在一些实施例中，所述第四损失单元，用于对所述第一类损失值、所述第一类损失值的权重、所述第二类损失值、所述第二类损失值的权重、所述第三类损失值、所述第三类损失值的权重进行加权求和，得到所述图像损失值；In some embodiments, the fourth loss unit is configured to perform a weighted summation over the first-type loss value with its weight, the second-type loss value with its weight, and the third-type loss value with its weight, to obtain the image loss value;
其中,所述第一类损失值的权重小于所述第二类损失值的权重;Wherein, the weight of the first type of loss value is less than the weight of the second type of loss value;
所述第三类损失值的权重小于所述第一类损失值的权重。The weight of the third type of loss value is smaller than the weight of the first type of loss value.
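The weighted combination above, with its stated ordering of weights, can be sketched as follows; the concrete weight values are placeholders for illustration, not values from the disclosure (first-type = perceptual, second-type = content, third-type = total variation):

```python
def image_loss(perceptual, content, tv, w_perceptual, w_content, w_tv):
    # per the embodiment: weight(tv) < weight(perceptual) < weight(content)
    assert w_tv < w_perceptual < w_content, "weights violate the stated ordering"
    return (w_perceptual * perceptual
            + w_content * content
            + w_tv * tv)

# placeholder weights respecting w_tv < w_perceptual < w_content
loss = image_loss(perceptual=0.2, content=0.1, tv=1.5,
                  w_perceptual=0.1, w_content=1.0, w_tv=0.01)
```

The ordering makes the content (L2) term dominate the optimization, with the perceptual term as a secondary constraint and total variation as a light regularizer.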
本公开实施例提供一种电子设备,包括:An embodiment of the present disclosure provides an electronic device, including:
用于存储处理器可执行指令的存储器;memory for storing processor-executable instructions;
处理器,与存储器连接;a processor connected to the memory;
其中,处理器被配置为执行前述任意技术方案提供的图像处理方法或者图像处理模型训练方法。Wherein, the processor is configured to execute the image processing method or the image processing model training method provided by any of the foregoing technical solutions.
处理器可包括各种类型的存储介质,该存储介质为非临时性计算机存储介质,在通信设备掉电之后能够继续记忆存储其上的信息。The processor may include various types of storage media, which are non-transitory computer storage media, and can continue to memorize and store information thereon after the communication device is powered off.
处理器可以通过总线等与存储器连接，用于读取存储器上存储的可执行程序，例如，能够执行如图1、图4、图6至图8任意一个所示图像处理方法和图像处理模型训练方法的至少其中之一。The processor may be connected to the memory via a bus or the like and reads the executable program stored in the memory; for example, it can execute at least one of the image processing method and the image processing model training method shown in any one of Figure 1, Figure 4, and Figures 6 to 8.
该电子设备可为前述终端设备和/或服务器等。The electronic device may be the aforementioned terminal device and/or server.
图13是根据一示例性实施例示出的一种终端设备800的框图。例如,终端设备800可以是移动电话、移动电脑等终端设备或者智能家居设备或者智能办公设备。Fig. 13 is a block diagram showing a terminal device 800 according to an exemplary embodiment. For example, the terminal device 800 may be a terminal device such as a mobile phone or a mobile computer, or a smart home device or a smart office device.
参照图13,终端设备800可以包括以下一个或多个组件:处理组件802,存储器804,电源组件806,多媒体组件808,多媒体数据组件810,输入/输出(I/O)的接口812,传感器组件814,以及通信组件816。Referring to Fig. 13, the terminal device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, a multimedia data component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
处理组件802通常控制终端设备800的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理组件802可以包括一个或多个处理器820来执行指令,以完成上述的方法的全部或部分步骤。此外,处理组件802可以包括一个或多个模块,便于处理组件802和其他组件之间的交互。例如,处理组件802可以包括多媒体模块,以方便多媒体组件808和处理组件802之间的交互。The processing component 802 generally controls the overall operations of the terminal device 800, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802 .
存储器804被配置为存储各种类型的数据以支持在设备800的操作。这些数据的示例包括用于在终端设备800上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。存储器804可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。The memory 804 is configured to store various types of data to support operations at the device 800 . Examples of such data include instructions for any application or method operating on the terminal device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 can be implemented by any type of volatile or non-volatile storage device or their combination, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic or Optical Disk.
电源组件806为终端设备800的各种组件提供电力。电力组件806可以包括电源管理系统,一个或多个电源,及其他与为终端设备800生成、管理和分配电力相关联的组件。The power supply component 806 provides power to various components of the terminal device 800 . Power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for end device 800 .
多媒体组件808包括在终端设备800和用户之间的提供一个输出接口的屏幕。在一些实施例中,屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与触摸或滑动操作相关的持续时间和压力。在一些实施例中,多媒体组件808包括一个前置摄像头和/或后置摄像头。当设备800处于操作状态,如拍摄状态或视频状态时,前置摄像头和/或后置摄像头可以接收外部的多媒体数据。每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统或具有焦距和光学变焦能力。The multimedia component 808 includes a screen providing an output interface between the terminal device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense a boundary of a touch or a swipe action, but also detect duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating state, such as a shooting state or a video state, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
多媒体数据组件810被配置为输出和/或输入多媒体数据信号。例如,多媒体数据组件810包括一个麦克风(MIC),当终端设备800处于操作状态,如呼叫状态、记录状态和语音识别状态时,麦克风被配置为接收外部多媒体数据信号。所接收的多媒体数据信号可以被进一步存储在存储器804或经由通信组件816发送。在一些实施例中,多媒体数据组件810还包括一个扬声器,用于输出多媒体数据信号。The multimedia data component 810 is configured to output and/or input multimedia data signals. For example, the multimedia data component 810 includes a microphone (MIC), which is configured to receive external multimedia data signals when the terminal device 800 is in an operating state, such as a calling state, a recording state, and a voice recognition state. The received multimedia data signals may be further stored in memory 804 or transmitted via communication component 816 . In some embodiments, the multimedia data component 810 further includes a speaker for outputting multimedia data signals.
I/O接口812为处理组件802和外围接口模块之间提供接口,上述外围接口模块可以是键盘,点击轮,按钮等。这些按钮可包括但不限于:主页按钮、音量按钮、启动按钮和锁定按钮。The I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.
传感器组件814包括一个或多个传感器,用于为终端设备800提供各个方面的状态评估。例如,传感器组件814可以检测到设备800的打开/关闭状态,组件的相对定位,例如组件为终端设备800的显示器和小键盘,传感器组件814还可以检测终端设备800或终端设备800一个组件的位置改变,用户与终端设备800接触的存在或不存在,终端设备800方位或加速/减速和终端设备800的温度变化。传感器组件814可以包括接近传感器,被配置用来在没有任何的物理接触时检测附近物体的存在。传感器组件814还可以包括光传感器,如CMOS或CCD图像传感器,用于在成像应用中使用。在一些实施例中,该传感器组件814还可以包括加速度传感器,陀螺仪传感器,磁传感器,压力传感器或温度传感器。The sensor component 814 includes one or more sensors for providing status assessment of various aspects of the terminal device 800 . For example, the sensor component 814 can detect the open/closed state of the device 800, the relative positioning of components, for example, the components are the display and the keypad of the terminal device 800, and the sensor component 814 can also detect the position of the terminal device 800 or a component of the terminal device 800 changes, the presence or absence of user contact with the terminal device 800 , the orientation or acceleration/deceleration of the terminal device 800 and the temperature change of the terminal device 800 . Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. Sensor assembly 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
通信组件816被配置为便于终端设备800和其他设备之间有线或无线方式的通信。终端设备800可以接入基于通信标准的无线网络,如Wi-Fi,2G或3G,或它们的组合。在一个示例性实施例中,通信组件816经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,通信组件816还包括近场通信(NFC)模块,以促进短程通信。例如,在NFC模块可基于射频识别(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他技术来实现。The communication component 816 is configured to facilitate wired or wireless communications between the terminal device 800 and other devices. The terminal device 800 can access a wireless network based on communication standards, such as Wi-Fi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
在示例性实施例中，装置800可以被一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现，用于执行上述方法。In an exemplary embodiment, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the methods described above.
在示例性实施例中,还提供了一种包括指令的非临时性计算机可读存储介质,例如包括指令的存储器804,上述指令可由装置800的处理器820执行以完成上述方法。例如,非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions, which can be executed by the processor 820 of the device 800 to implement the above method. For example, the non-transitory computer readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
如图14所示，本公开一实施例示出一种服务器的结构。参照图14，服务器900包括处理组件922，其进一步包括一个或多个处理器，以及由存储器932所代表的存储器资源，用于存储可由处理组件922执行的指令，例如应用程序。存储器932中存储的应用程序可以包括一个或一个以上的每一个对应于一组指令的模块。此外，处理组件922被配置为执行指令，以执行前述图像处理方法，例如，如图1、图4、图6至图8任一项所示的图像处理方法和/或图像处理模型训练方法。As shown in FIG. 14, an embodiment of the present disclosure shows the structure of a server. Referring to FIG. 14, the server 900 includes a processing component 922, which further includes one or more processors, and memory resources represented by a memory 932 for storing instructions executable by the processing component 922, such as application programs. The application programs stored in the memory 932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 922 is configured to execute instructions to perform the aforementioned image processing method, for example, the image processing method and/or the image processing model training method shown in any one of Figure 1, Figure 4, and Figures 6 to 8.
服务器900还可以包括一个电源组件926被配置为执行服务器900的电源管理，一个有线或无线网络接口950被配置为将服务器900连接到网络，和一个输入输出(I/O)接口958。服务器900可以操作基于存储在存储器932的操作系统，例如Windows Server™、Mac OS X™、Unix™、Linux™、FreeBSD™或类似。The server 900 may also include a power supply component 926 configured to perform power management of the server 900, a wired or wireless network interface 950 configured to connect the server 900 to a network, and an input/output (I/O) interface 958. The server 900 may operate based on an operating system stored in the memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
本公开实施例提供一种计算机存储介质，该计算机存储介质可为非临时性计算机可读存储介质，当存储介质中的指令由服务器或终端的处理器执行时，使得设备能够执行前述任意实施例提供的图像处理方法和/或图像处理模型训练方法，能够执行如图1至图4任意一个所示方法的至少其中之一。An embodiment of the present disclosure provides a computer storage medium, which may be a non-transitory computer-readable storage medium. When the instructions in the storage medium are executed by the processor of a server or a terminal, the device is enabled to perform the image processing method and/or the image processing model training method provided by any of the foregoing embodiments, and can execute at least one of the methods shown in any one of Figures 1 to 4.
所述计算机存储介质中存储的指令被执行之后,UE能够实现图像处理方法和/或图像处理模型训练方法。After the instructions stored in the computer storage medium are executed, the UE can implement an image processing method and/or an image processing model training method.
The image processing method may include:

performing multi-scale feature encoding on a first image to obtain N feature maps of different sizes, where N is a positive integer equal to or greater than 2;

performing local attention mechanism processing on an n-th decoded map and then upsampling it to obtain an n-th upsampled map, where n is a positive integer smaller than N; the 0th decoded map is determined according to the (N-1)th feature map, and the (N-1)th feature map is the last of the N feature maps of different sizes to be generated;

fusing the (N-n)th feature map and the n-th upsampled map based on a global attention mechanism to obtain an (n+1)th decoded map; and

obtaining a second image based on the (N-1)th decoded map, where the image content of the second image is the same as that of the first image, and the image quality of the second image is higher than that of the first image.
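The encode-decode loop above can be sketched at the shape level. The sketch below is an illustrative reconstruction, not the patent's implementation: `local_attention` and `fuse` are stand-in operators, the upsampling is nearest-neighbour, and the index convention (decoding from the smallest feature map back up to full resolution, consuming the skip features in reverse order) is one plausible reading of the steps.

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour upsampling: (H, W) -> (2H, 2W)
    return x.repeat(2, axis=0).repeat(2, axis=1)

def local_attention(x):
    # placeholder local gate: sigmoid of the map itself, applied element-wise
    return x * (1.0 / (1.0 + np.exp(-x)))

def fuse(skip, up):
    # placeholder global-attention fusion: simple average of the two maps
    return 0.5 * (skip + up)

# N = 3 encoder feature maps at halving resolutions (toy data)
rng = np.random.default_rng(0)
feats = [rng.standard_normal((8 // 2**i, 8 // 2**i)) for i in range(3)]

decoded = feats[-1]                    # 0th decoded map from the last feature map
for n in range(len(feats) - 1):        # n = 0 .. N-2
    up = upsample2x(local_attention(decoded))  # n-th upsampled map
    decoded = fuse(feats[-2 - n], up)          # (n+1)-th decoded map

second_image = decoded
print(second_image.shape)              # (8, 8)
```

With three encoder maps of sizes 8, 4 and 2, the loop doubles the resolution at every step and the final decoded map matches the largest feature map's size.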
Understandably, performing multi-scale feature encoding on the first image to obtain the N feature maps of different sizes includes:

convolving the first image to obtain a 0th feature map; and

performing multi-scale feature extraction on an m-th feature map to obtain an (m+1)th feature map, where m is 0 or a positive integer less than or equal to N-1.

In some embodiments, performing multi-scale feature extraction on the m-th feature map to obtain the (m+1)th feature map includes:

convolving the m-th feature map with convolution kernels of different scales to obtain multiple first-type intermediate feature maps of the same size; and

fusing the multiple first-type intermediate feature maps to obtain the (m+1)th feature map.
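A minimal sketch of this branch-and-fuse step follows. Average (box) kernels stand in for learned convolution weights, the kernel sizes 1/3/5 are illustrative, and averaging is one simple choice of fusion; none of these specifics are fixed by the patent.

```python
import numpy as np

def box_filter(x, k):
    # 'same'-padded average filter with a k x k kernel
    # (a stand-in for a learned k x k convolution)
    p = k // 2
    xp = np.pad(x, p, mode="edge")
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

x = np.arange(36, dtype=float).reshape(6, 6)   # m-th feature map (toy)

# kernels of different scales -> same-sized first-type intermediate maps
branches = [box_filter(x, k) for k in (1, 3, 5)]

# fuse the intermediate maps, here by averaging, to get the (m+1)-th map
fused = np.mean(branches, axis=0)
print(fused.shape)                             # (6, 6)
```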
In some embodiments, processing the n-th decoded map based on the local attention mechanism and then upsampling it to obtain the n-th upsampled map includes:

generating a first weight of the local attention mechanism based on the n-th decoded map;

weighting the n-th decoded map based on the first weight to obtain a second-type intermediate feature map; and

upsampling the second-type intermediate feature map to obtain the n-th upsampled map.
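The three sub-steps can be sketched as below. Deriving the gate by applying a sigmoid to the decoded map itself is an assumption (the patent only states that the first weight is generated from the n-th decoded map), and nearest-neighbour upsampling is used for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

decoded = np.array([[0.0, 2.0],
                    [-2.0, 1.0]])     # n-th decoded map (toy)

w1 = sigmoid(decoded)                 # first weight, from the decoded map itself
weighted = decoded * w1               # second-type intermediate feature map
upsampled = weighted.repeat(2, axis=0).repeat(2, axis=1)  # n-th upsampled map
print(upsampled.shape)                # (4, 4)
```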
In some embodiments, fusing the (N-n)th feature map and the n-th upsampled map based on the global attention mechanism to obtain the (n+1)th decoded map includes:

generating a second weight of the global attention mechanism according to the n-th decoded map;

weighting the (N-n)th feature map according to the second weight to obtain a third-type intermediate feature map; and

fusing the third-type intermediate feature map with the n-th upsampled map to obtain the (n+1)th decoded map.
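A sketch of the global-attention fusion, under two assumptions the text does not fix: the second weight is taken as a scalar sigmoid of the decoded map's global mean, and the fusion is a plain sum.

```python
import numpy as np

rng = np.random.default_rng(1)
decoded = rng.standard_normal((4, 4))  # n-th decoded map (toy)
skip = rng.standard_normal((8, 8))     # (N-n)-th encoder feature map
up = rng.standard_normal((8, 8))       # n-th upsampled map (same size as skip)

# second weight from a global statistic of the decoded map (one plausible choice)
w2 = 1.0 / (1.0 + np.exp(-decoded.mean()))

weighted_skip = w2 * skip              # third-type intermediate feature map
fused = weighted_skip + up             # (n+1)-th decoded map
print(fused.shape)                     # (8, 8)
```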
The image processing model training method may include: training the image processing model used by the image processing method provided in any of the foregoing embodiments.

Specifically, the image processing model training method may include:

inputting a sample image into a first model to obtain a predicted image output by the first model;

inputting the predicted image into a second model to obtain M first feature maps output by the second model, where M is a positive integer greater than or equal to 2;

inputting a target image corresponding to the sample image into the second model to obtain M second feature maps output by the second model;

obtaining a first-type loss value according to the difference between the p-th first feature map and the p-th second feature map, where p is a positive integer less than or equal to M; and

adjusting model parameters of the first model based on the first-type loss value.
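The first-type (feature-space) loss can be sketched as a per-layer mean absolute error summed over the M feature-map pairs. The choices of MAE and of summation over p are assumptions; the patent only requires a loss derived from the per-layer differences.

```python
import numpy as np

rng = np.random.default_rng(2)
# M = 3 pairs of feature maps from the second (feature-extraction) model
pred_feats = [rng.standard_normal((4, 4)) for _ in range(3)]
target_feats = [rng.standard_normal((4, 4)) for _ in range(3)]

# first-type loss: per-layer mean absolute difference, summed over p
loss1 = sum(np.abs(fp - ft).mean() for fp, ft in zip(pred_feats, target_feats))
print(loss1 >= 0.0)   # True
```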
Understandably, the image processing model training method further includes:

obtaining a second-type loss value according to the difference between the predicted image and the target image.

In this case, adjusting the model parameters of the first model based on the first-type loss value includes:

adjusting the model parameters of the first model according to the first-type loss value and the second-type loss value.

The image processing model training method further includes:

obtaining a third-type loss value according to the pixel differences between the s-th pixel in the predicted image and the pixels adjacent to the s-th pixel, where s is a positive integer less than or equal to S, and S is the total number of pixels contained in the predicted image.
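One common realization of such a neighbouring-pixel loss is total-variation style: sum the absolute differences between each pixel and its right and bottom neighbours. The patent does not pin down this exact form, so treat the following as a sketch.

```python
import numpy as np

pred = np.array([[0.0, 1.0, 1.0],
                 [0.0, 2.0, 1.0],
                 [1.0, 1.0, 1.0]])    # predicted image (toy)

# third-type loss: absolute differences to vertical and horizontal neighbours
loss3 = np.abs(np.diff(pred, axis=0)).sum() + np.abs(np.diff(pred, axis=1)).sum()
print(loss3)   # 7.0
```

A smooth image yields a small third-type loss, so minimizing it penalizes noisy, high-frequency artifacts in the prediction.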
In this case, adjusting the model parameters of the first model according to the first-type loss value and the second-type loss value includes:

obtaining an image loss value according to the first-type loss value, the second-type loss value, and the third-type loss value; and

adjusting the model parameters of the first model based on the image loss value.

Understandably, obtaining the image loss value according to the first-type loss value, the second-type loss value, and the third-type loss value includes:

computing a weighted sum over the first-type loss value and its weight, the second-type loss value and its weight, and the third-type loss value and its weight to obtain the image loss value;

where the weight of the first-type loss value is smaller than the weight of the second-type loss value, and the weight of the third-type loss value is smaller than the weight of the first-type loss value.
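A worked example of the weighted sum with the stated ordering (the third-type weight below the first-type weight, which in turn is below the second-type weight); the numeric weights and loss values are assumptions for illustration only.

```python
# illustrative weights respecting the ordering w3 < w1 < w2 (assumed values)
w1, w2, w3 = 0.1, 1.0, 0.01
loss1, loss2, loss3 = 0.5, 0.2, 7.0   # toy first/second/third-type loss values

image_loss = w1 * loss1 + w2 * loss2 + w3 * loss3
print(round(image_loss, 3))           # 0.32
```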
Other embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed herein. The specification and examples are to be considered exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (22)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210177692.1A CN116703735A (en) | 2022-02-25 | 2022-02-25 | Image processing method, model training method and device, equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116703735A true CN116703735A (en) | 2023-09-05 |
Family
ID=87831697
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210177692.1A Pending CN116703735A (en) | 2022-02-25 | 2022-02-25 | Image processing method, model training method and device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116703735A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106934397A (en) * | 2017-03-13 | 2017-07-07 | 北京市商汤科技开发有限公司 | Image processing method, device and electronic equipment |
| CN110378976A (en) * | 2019-07-18 | 2019-10-25 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
| CN112070670A (en) * | 2020-09-03 | 2020-12-11 | 武汉工程大学 | Face super-resolution method and system based on global-local separation attention mechanism |
| CN113076966A (en) * | 2020-01-06 | 2021-07-06 | 字节跳动有限公司 | Image processing method and device, neural network training method and storage medium |
| CN113947538A (en) * | 2021-09-23 | 2022-01-18 | 桂林电子科技大学 | Multi-scale efficient convolution self-attention single image rain removing method |
- 2022-02-25: CN application CN202210177692.1A filed; published as CN116703735A (status: active, Pending)
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110070511B (en) | Image processing method and device, electronic device and storage medium | |
| US10977802B2 (en) | Motion assisted image segmentation | |
| CN112488923B (en) | Image super-resolution reconstruction method and device, storage medium and electronic equipment | |
| US11526704B2 (en) | Method and system of neural network object recognition for image processing | |
| CN110956219B (en) | Video data processing method, device and electronic system | |
| CN113706414B (en) | Training method and electronic device for video optimization model | |
| JP7459425B2 (en) | Input image size switchable networks for adaptive runtime efficient image classification | |
| Yu et al. | Luminance attentive networks for HDR image and panorama reconstruction | |
| US20250069270A1 (en) | Image Compression and Reconstruction Using Machine Learning Models | |
| CN114240792B (en) | Image exposure fusion method, device and storage medium | |
| CN116258651B (en) | Image processing method and related device | |
| CN117593611B (en) | Model training method, image reconstruction method, device, equipment and storage medium | |
| CN115115540A (en) | Method and device for unsupervised low-light image enhancement based on illumination information guidance | |
| US20250148767A1 (en) | Training method for image processing network, and image processing method and apparatus | |
| CN115330633A (en) | Image tone mapping method and device, electronic equipment and storage medium | |
| CN117975484B (en) | Training method of change detection model, change detection method, device and equipment | |
| CN114693538B (en) | Image processing method and device | |
| CN118674642A (en) | Underwater image enhancement method, storage medium and computer program product | |
| US20250252537A1 (en) | Enhancing images from a mobile device to give a professional camera effect | |
| WO2024130715A1 (en) | Video processing method, video processing apparatus and readable storage medium | |
| CN121010766A (en) | Video Stream Image Optimization Method and Apparatus | |
| US20240273873A1 (en) | Dynamic temporal normalization for deep learning in video understanding applications | |
| CN118747833B (en) | Light source estimation model training, light source estimation methods and apparatus | |
| US20250078469A1 (en) | Deformable convolution-based detail restoration for single-image high dynamic range (hdr) reconstruction | |
| CN116703735A (en) | Image processing method, model training method and device, equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||