Detailed Description
In view of the shortcomings of the prior art, the inventor of the present invention has, after long study and extensive practice, proposed the technical scheme of the present invention. The technical scheme, its implementation process, its principle, and the like are further explained as follows.
Specifically, HDR (High Dynamic Range) is, simply put, a processing technology for improving the brightness and contrast of images: it can brighten the details of dark areas, darken overly bright areas, and enrich detail and color, enabling films and pictures to show excellent effects. Its purpose is to bring what the viewer sees closer to the visual perception of a real environment; this is why HDR exists. Traditional SDR (Standard Dynamic Range) has a highest brightness of only 100 nits, so any part of the picture brighter than 100 nits is distorted (lost); its lowest brightness is 0.1 nit, so any part darker than 0.1 nit is likewise lost. With the development of HDR technology, the highest brightness reaches several thousand nits and the lowest reaches 0.0005 nit, greatly expanding the details of picture regions brighter than 100 nits or darker than 0.1 nit, while the whole picture becomes more transparent and clear, with rich detail. Richer scene information gives HDR a higher dynamic range, but some real-world devices do not support displaying such a high dynamic range; they require tone mapping to adjust the dynamic range while preserving as much detail as possible. In order to enable HDR content to be used on most current dynamic-range-limited devices, the present invention proposes a high dynamic range image tone mapping method based on deep learning.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
Moreover, relational terms such as "first" and "second" and the like may be used solely to distinguish one component or method step from another having the same name, without necessarily requiring or implying any actual such relationship or order between such components or method steps.
Referring to fig. 1, an embodiment of the present invention provides a high dynamic range image tone mapping method based on deep learning, which includes the steps of:
S1, providing a high dynamic range image, and compressing the high dynamic range image into a low dynamic range image through global mapping.
S2, acquiring an enhancement neural network under the encoder-decoder structure, wherein the enhancement neural network takes skip connections as its connection mode.
S3, performing online training on the low dynamic range image through the enhancement neural network, and extracting an enhancement factor.
S4, performing image enhancement on the low dynamic range image through the enhancement factors to obtain a mapping image.
In the step S3, the loss function of the online training is a nonlinear loss function, and the nonlinear loss function includes a first type of loss function that reflects a difference between an image parameter and a statistical ideal value of an iterative image formed in the online training process, and a second type of loss function that reflects a difference between the iterative image and the high dynamic range image.
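The four steps S1-S4 can be sketched end to end as follows. This is a minimal illustration, not the patent's implementation: the log curve standing in for the global mapping, the constant enhancement factor standing in for the trained network's per-pixel output, and all function names are assumptions.

```python
import numpy as np

def global_map(hdr):
    """Step S1 (sketch): a simple log curve stands in for the global
    mapping operator that the text details later."""
    ldr = np.log1p(hdr) / np.log1p(hdr.max())
    return np.clip(ldr, 0.0, 1.0)

def predict_factor(ldr):
    """Steps S2-S3 (stub): the encoder-decoder network trained online
    would output a per-pixel enhancement factor; a constant stands in."""
    return np.full_like(ldr, 0.3)

def enhance(ldr, factor, steps=4):
    """Step S4 (sketch): recursive pixel-level enhancement by the factor."""
    out = ldr
    for _ in range(steps):
        out = out + factor * (out - out ** 2)
    return np.clip(out, 0.0, 1.0)

hdr = np.random.rand(8, 8, 3) * 1000.0   # synthetic HDR radiance values
ldr = global_map(hdr)
mapped = enhance(ldr, predict_factor(ldr))
print(mapped.shape)  # (8, 8, 3): the mapped image keeps the input shape
```

The online training of step S3 would repeatedly evaluate the nonlinear loss on `mapped` and update the network producing `predict_factor`.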
The prior art mainly performs tone mapping with supervised learning methods, and therefore often requires a large number of labeled datasets for training. The technical scheme provided by the invention, proceeding from unsupervised considerations, designs a series of no-reference loss functions by taking into account the contrast, intensity, saturation, hue, and brightness structure of the LDR image, and by utilizing the statistical information of an LDR image dataset together with the given HDR image. Based on these loss functions, the mapping method provided by the invention realizes tone mapping of the HDR image through online training: no labeled data is needed, the various problems caused by labeled data are avoided, and unsupervised tone mapping is realized.
In particular, in some embodiments, the statistical ideal values may comprise, for example, average parameter information obtained from statistics over existing image datasets.
As an example, some of the statistical ideal values and intensities in the first class of loss functions are based on the statistical information of the public ImageNet dataset, i.e., the mean, variance, etc. of the dataset's images. Of course, statistics may equally be drawn from other public datasets, or the average image parameters of self-collected, higher-quality image data may be counted.
In some embodiments, the first class of loss functions includes an intensity loss function and/or a contrast loss function.
In some embodiments, the second class of loss functions includes any one or a combination of two or more of hue loss functions, saturation loss functions, structure loss functions.
In some embodiments, the intensity loss function may be expressed, for example, as:

L_int = Σ_{c∈{R,G,B}} (μ_c − μ̂)² ;

wherein L_int is the intensity loss value, μ_c is the average intensity of color channel c of the iterative image, and μ̂ is the statistically obtained average ideal intensity of the color channels.
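One plausible numerical sketch of an intensity loss of this kind follows; the squared-error form and the ideal value 0.5 are assumptions for illustration, not values disclosed by the patent.

```python
import numpy as np

# Assumed statistical ideal mean intensity (e.g., drawn from ImageNet
# statistics); the patent does not disclose the exact constant.
MU_IDEAL = 0.5

def intensity_loss(img):
    """Squared gap between each color channel's mean intensity and the
    statistical ideal, summed over the three channels."""
    return float(sum((img[..., c].mean() - MU_IDEAL) ** 2 for c in range(3)))

ideal = np.full((4, 4, 3), 0.5)
dark = np.zeros((4, 4, 3))
print(intensity_loss(ideal))  # 0.0 (every channel already matches the ideal)
print(intensity_loss(dark))   # 0.75 (three channels, each off by 0.5)
```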
In some embodiments, the contrast loss function may be expressed, for example, as:

L_con = Σ_{c∈{R,G,B}} (σ_c − σ̂)² ;

wherein L_con is the contrast loss value, σ_c is the standard deviation of color channel c of the iterative image, and σ̂ is the statistically obtained average ideal standard deviation of the color channels.
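A matching sketch of the contrast loss; as above, the squared-error form and the ideal standard deviation 0.25 are assumed placeholders rather than disclosed constants.

```python
import numpy as np

SIGMA_IDEAL = 0.25  # assumed ideal per-channel standard deviation

def contrast_loss(img):
    """Squared gap between each channel's standard deviation and the
    statistical ideal, summed over the three channels."""
    return float(sum((img[..., c].std() - SIGMA_IDEAL) ** 2 for c in range(3)))

flat = np.full((4, 4, 3), 0.5)  # a flat image has zero contrast everywhere
print(contrast_loss(flat))      # 0.1875, i.e. 3 * 0.25**2
```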
In some embodiments, the hue loss function may be expressed, for example, as:

L_hue = (1/N) Σ_{i=1}^{N} [ (P_hdr,i − P_ldr,i)² + (T_hdr,i − T_ldr,i)² ] ;

wherein L_hue is the hue loss value, N is the total number of pixels, i is an index ranging from 1 to N, P_hdr,i and T_hdr,i are the red-green and yellow-blue components of the high dynamic range image in IPT space, and P_ldr,i and T_ldr,i are the red-green and yellow-blue components of the iterative image in IPT space.
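Given the IPT chroma planes as arrays, a hue loss of this shape is a few lines; the quadratic per-pixel form is an assumption consistent with the variables named above. (Conversion from RGB to IPT is a separate step, omitted here.)

```python
import numpy as np

def hue_loss(P_hdr, T_hdr, P_it, T_it):
    """Mean squared difference of the red-green (P) and yellow-blue (T)
    IPT components between the HDR image and the iterative image."""
    return float(np.mean((P_hdr - P_it) ** 2 + (T_hdr - T_it) ** 2))

P = np.random.rand(8, 8)
T = np.random.rand(8, 8)
print(hue_loss(P, T, P, T))  # 0.0: identical chroma gives zero loss
```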
In some embodiments, the saturation loss function may be expressed, for example, as:

L_sat = (1/N) Σ_{i=1}^{N} | S_hdr,i − S_ldr,i | ;

wherein L_sat is the saturation loss value, S_hdr,i is the saturation of the ith pixel point in the high dynamic range image, and S_ldr,i is the saturation of the ith pixel point in the iterative image.
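A sketch of a per-pixel saturation comparison; the HSV-style saturation measure below is an assumed definition, since the patent does not spell out which saturation formula it uses.

```python
import numpy as np

def saturation(img, eps=1e-8):
    """HSV-style saturation per pixel: (max - min) / max over the
    color channels (an assumed definition)."""
    mx = img.max(axis=-1)
    mn = img.min(axis=-1)
    return (mx - mn) / (mx + eps)

def saturation_loss(img_hdr, img_it):
    """Mean absolute per-pixel saturation difference between two images."""
    return float(np.mean(np.abs(saturation(img_hdr) - saturation(img_it))))

img = np.random.rand(8, 8, 3)
print(saturation_loss(img, img))  # 0.0 for identical images
```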
In some embodiments, the structural loss function may be expressed, for example, as:

L_str = 1 − MS-SSIM(x, y) ;

MS-SSIM(x, y) = [ l(x, y) ]^{γ_M} · Π_{n=1}^{M} [ cs_n(x, y) ]^{γ_n} ;

cs_n(x, y) = (2σ_xy + C_2) / (σ_x² + σ_y² + C_2) , l(x, y) = (2μ_x μ_y + C_1) / (μ_x² + μ_y² + C_1) ;

wherein L_str is the structural loss value, MS-SSIM is the multi-scale structural similarity index, x is the low dynamic range image (i.e., the globally mapped version of the high dynamic range image), and y is the iterative image; γ_n is the weight of the nth scale; σ_x and σ_y are the local standard deviations of corresponding blocks in the two images, σ_xy is their cross-correlation coefficient, μ_x and μ_y are the corresponding local means (the luminance term l reflecting the importance of signal strength), and C_1 and C_2 are both stabilizing constants.
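A simplified numerical sketch of the structural loss follows. It computes a single-scale, whole-image SSIM rather than the windowed, multi-scale MS-SSIM named in the text, so it is an illustration of the luminance and contrast-structure terms only, not the patent's exact measure.

```python
import numpy as np

def ssim_global(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Single-scale SSIM over the whole image: product of a luminance
    term and a combined contrast-structure term."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    luminance = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)
    contrast_structure = (2 * cov + C2) / (vx + vy + C2)
    return luminance * contrast_structure

def structure_loss(x, y):
    """1 minus the similarity: zero when the two images match."""
    return 1.0 - ssim_global(x, y)

x = np.random.rand(16, 16)
print(structure_loss(x, x))  # ~0.0: an image is maximally similar to itself
```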
With respect to the use of multiple loss functions, in some embodiments, a plurality of the first-type and second-type loss functions are summed according to respective weights to form the nonlinear loss function. A specific embodiment may, for example, use a weighted-sum method: the intensity loss function, contrast loss function, hue loss function, saturation loss function, and structure loss function are given different weights V1 to V5 in turn and then summed to obtain the final total nonlinear loss function, which guides the online training.
Based on the practical experience of the inventor, in order to achieve better mapping quality and efficiency, the different loss functions carry different weights: specifically, the weight of the intensity loss function is generally set in the range 40 to 60, that of the contrast loss function in the range 1 to 3, that of the hue loss function in the range 80 to 120, that of the saturation loss function in the range 1 to 3, and that of the structure loss function in the range 1 to 3. Of course, scaling all the weight values up or down in equal proportion remains equivalent to using these ranges.
Of course, the method is not limited to the above ranges, but it generally follows a similar rule of magnitudes: as shown in fig. 5, the intensity loss and contrast loss affect brightness and contrast, the saturation loss and hue loss affect the color fidelity of the image, and the structure loss constrains the loss of detail. Based on this experience, the invention summarizes the weight range of each loss function applicable to the technical scheme provided by the embodiments of the invention.
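The weighted combination described above can be sketched as follows; the weight values are midpoints of the ranges in the text, and the per-iteration loss values are hypothetical placeholders.

```python
# Weight midpoints drawn from the ranges given in the text.
WEIGHTS = {
    "intensity": 50,   # range 40-60
    "contrast": 2,     # range 1-3
    "hue": 100,        # range 80-120
    "saturation": 2,   # range 1-3
    "structure": 2,    # range 1-3
}

def total_loss(losses):
    """Weighted sum of the five no-reference losses into the total
    nonlinear loss that guides the online training."""
    return sum(WEIGHTS[name] * value for name, value in losses.items())

# Hypothetical per-iteration loss values, for illustration only.
example = {"intensity": 0.1, "contrast": 0.2, "hue": 0.05,
           "saturation": 0.3, "structure": 0.15}
print(round(total_loss(example), 6))  # 11.3
```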
As for the remaining steps of the solution provided by the present invention: in some embodiments, the global mapping includes sequentially performing luminance compression and color restoration on the high dynamic range image.
The luminance compression is expressed as:

L_ldr = L_hdr^n / (L_hdr^n + L_avg^n) ;

n = 0.3 + 0.7 · k^{1.4} , k = (log L_max − log L_avg) / (log L_max − log L_min) ;

wherein n is the asymmetric parameter, L_ldr is the brightness of the low dynamic range image, L_hdr is the brightness of the high dynamic range image, L_avg is the average luminance of the high dynamic range image, L_max is the maximum brightness of the high dynamic range image, and L_min is the minimum brightness of the high dynamic range image.
The color restoration is expressed as:

C_ldr = (C_hdr / L_hdr) · L_ldr ;

wherein C_ldr is the color channel intensity of the color-restored low dynamic range image, and C_hdr is the color channel intensity of the high dynamic range image.
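The two-stage global mapping (luminance compression, then per-channel color restoration) can be sketched as below. The exact form of the asymmetric parameter is an assumption borrowed from photoreceptor-style operators built from the average, maximum, and minimum luminance, as the text describes; it is not guaranteed to be the patent's formula.

```python
import numpy as np

def compress_luminance(L):
    """Compress HDR luminance with a photoreceptor-style response curve.
    The exponent n ("asymmetric parameter") is an assumed function of the
    average, maximum, and minimum luminance."""
    L_avg, L_max, L_min = L.mean(), L.max(), L.min()
    k = (np.log(L_max) - np.log(L_avg)) / (np.log(L_max) - np.log(L_min))
    n = 0.3 + 0.7 * k ** 1.4            # asymmetric parameter (assumed form)
    return L ** n / (L ** n + L_avg ** n)

def restore_color(C_hdr, L_hdr, L_ldr, eps=1e-8):
    """Restore each color channel by scaling with the ratio of compressed
    to original luminance."""
    return (C_hdr / (L_hdr + eps)) * L_ldr

L_hdr = np.geomspace(0.01, 1000.0, 64)  # synthetic HDR luminances
L_ldr = compress_luminance(L_hdr)
print(float(L_ldr.min()) >= 0.0 and float(L_ldr.max()) <= 1.0)  # True
```

Whatever the precise curve, the compressed luminance always falls in [0, 1] and preserves the brightness ordering of the input, which is what the subsequent enhancement stage relies on.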
In some embodiments, the convolutional layers in the enhancement neural network adopt mixed dilated convolution; among the activation layers, the output layer adopts sigmoid as the activation function and the remaining layers adopt LeakyReLU as the activation function.
In some embodiments, the image enhancement employs pixel-level recursive enhancement, expressed as:

I_n = I_{n−1} + A · (I_{n−1} − I_{n−1}²) ;

wherein A is the enhancement factor, I_n is the output of the nth recursion step, n is the number of recursions, and the input to the first recursion (n = 1) is the low dynamic range image.
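The recursion can be sketched in a few lines; the quadratic curve form is an assumption consistent with the variables named above, and the factor A may be a scalar or a per-pixel map produced by the network.

```python
import numpy as np

def recursive_enhance(ldr, A, steps=8):
    """Apply I_n = I_{n-1} + A * (I_{n-1} - I_{n-1}^2) for `steps`
    rounds. 0 and 1 are fixed points, so the output stays in [0, 1]
    for inputs in [0, 1] and moderate A."""
    out = ldr
    for _ in range(steps):
        out = out + A * (out - out ** 2)
    return out

ldr = np.array([0.0, 0.2, 0.5, 1.0])
enhanced = recursive_enhance(ldr, A=0.3)
print(enhanced)  # endpoints unchanged; mid-tones lifted toward 1
```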
Of course, the enhancement applied during training should be consistent with the final image enhancement.
Corresponding to the above-mentioned high dynamic range image tone mapping method, the embodiment of the invention also provides a high dynamic range image tone mapping system based on deep learning, which comprises:
and the global mapping module is used for compressing the high dynamic range image into a low dynamic range image through global mapping.
And the network construction module is used for constructing an enhancement neural network under the encoder-decoder structure, the enhancement neural network taking skip connections as its connection mode.
And the online training module is used for performing online training on the low dynamic range image through the enhancement neural network and extracting an enhancement factor.
And the image enhancement module is used for carrying out image enhancement on the low dynamic range image through the enhancement factors to obtain a mapping image.
The nonlinear loss function is selected as the loss function of the online training, and comprises a first type of loss function reflecting the difference between the image parameters and the statistical ideal values of the iterative image formed in the online training process and a second type of loss function reflecting the difference between the iterative image and the high dynamic range image.
Correspondingly, the embodiment of the invention also provides a readable storage medium storing a computer program which, when executed, performs the steps of the above high dynamic range image tone mapping method.
The technical scheme of the invention is further described in detail below through a plurality of embodiments and with reference to the accompanying drawings. However, the examples are chosen to illustrate the invention only and are not intended to limit the scope of the invention.
Example 1
The present embodiment illustrates a process of a tone mapping method for a high dynamic range image, specifically as follows:
s1: the image is compressed from a high dynamic range image to a low dynamic range image by global mapping.
S2: an enhanced neural network under the encoder-decoder structure is constructed and takes a jump connection as a connection mode.
S3: the low dynamic range image is trained online through the enhanced neural network to extract the enhancement factors.
S4: the low dynamic range image is pixel-level enhanced by an enhancement factor.
Considering that the brightness at the same pixel point of the low dynamic range image and the high dynamic range image always keeps a linear relationship, in this embodiment the luminance channel is selected as the compression object of the global mapping, and the compression is implemented through an asymmetric retinal response model, with the specific expression as follows:
L_ldr = L_hdr^n / (L_hdr^n + L_avg^n) ;

n = 0.3 + 0.7 · k^{1.4} , k = (log L_max − log L_avg) / (log L_max − log L_min) ;

wherein n is the asymmetric parameter, L_ldr is the brightness of the low dynamic range image, L_hdr is the brightness of the high dynamic range image, L_avg is the average luminance of the high dynamic range image, L_max is the maximum brightness of the high dynamic range image, and L_min is the minimum brightness of the high dynamic range image.
However, since the luminance channel (taking the high dynamic range image as an example, L_hdr) is calculated from the channel intensity of each color channel, after the luminance channel is compressed each color channel needs to be restored. The restoration formula is as follows:

C_ldr = (C_hdr / L_hdr) · L_ldr ;

wherein C_ldr is the color channel intensity of the low dynamic range image after color restoration, and C_hdr is the color channel intensity of the high dynamic range image.
Further, as shown in fig. 3, in order to enlarge the receptive field of the enhancement neural network, the encoder-decoder enhancement neural network constructed in this embodiment uses mixed dilated convolution in place of ordinary convolutional layers, with all convolution kernels of size 3×3 and stride 1. In addition, among the activation layers, the output layer uses sigmoid as the activation function and the other layers use LeakyReLU. Meanwhile, in order to avoid the vanishing-gradient problem, skip connections are adopted between layers.
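The receptive-field gain from mixed dilation is easy to quantify. The sketch below computes the effective receptive field of a stack of stride-1 3×3 convolutions; the (1, 2, 5) dilation schedule is an assumed hybrid-dilation pattern for illustration, since the patent does not disclose its rates.

```python
def receptive_field(kernel=3, dilations=(1, 2, 5)):
    """Effective receptive field of stacked stride-1 convolutions:
    each layer adds dilation * (kernel - 1) to the field."""
    rf = 1
    for d in dilations:
        rf += d * (kernel - 1)
    return rf

print(receptive_field(dilations=(1, 1, 1)))  # 7: three plain 3x3 layers
print(receptive_field())                     # 17: same depth, mixed dilation
```

At equal depth and parameter count, the mixed-dilation stack more than doubles the receptive field, which is the motivation for replacing ordinary convolutions here.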
To cope with the absence of labels, the online training in this embodiment is carried out under the guidance of a series of nonlinear loss functions; each loss function constrains the training, ensuring its reliability and quality even without labels. The specific loss functions include an intensity loss function, a contrast loss function, a hue loss function, a saturation loss function, and a structure loss function.
Wherein the intensity loss function is expressed as the following formula:

L_int = Σ_{c∈{R,G,B}} (μ_c − μ̂)² ;

wherein L_int is the intensity loss, μ_c is the average intensity of color channel c of the iterative image formed in the training, and μ̂ is the ideal intensity of the color channels.
The contrast loss function is expressed as the following formula:

L_con = Σ_{c∈{R,G,B}} (σ_c − σ̂)² ;

wherein L_con is the contrast loss, σ_c is the standard deviation of color channel c of the iterative image formed in the training, and σ̂ is the ideal value of the color channel standard deviation.
The hue loss function is expressed as the following formula:

L_hue = (1/N) Σ_{i=1}^{N} [ (P_hdr,i − P_ldr,i)² + (T_hdr,i − T_ldr,i)² ] ;

wherein L_hue is the hue loss, N is the total number of pixels, i is an index ranging from 1 to N, P is the red-green component in IPT space, and T is the yellow-blue component in IPT space (here the subscript ldr denotes the iterative image formed in training, rather than the low dynamic range image obtained via the global mapping).
The saturation loss function is expressed as the following formula:

L_sat = (1/N) Σ_{i=1}^{N} | S_hdr,i − S_ldr,i | ;

wherein L_sat is the saturation loss, S_hdr,i is the saturation of the ith pixel in the high dynamic range image, and S_ldr,i is the saturation of the ith pixel in the iterative image (similarly, the subscript ldr denotes the iterative image formed in training, rather than the low dynamic range image obtained via the global mapping).
The structural loss function is expressed as the following formulas:

L_str = 1 − MS-SSIM(x, y) ;

MS-SSIM(x, y) = [ l(x, y) ]^{γ_M} · Π_{n=1}^{M} [ cs_n(x, y) ]^{γ_n} ;

cs_n(x, y) = (2σ_xy + C_2) / (σ_x² + σ_y² + C_2) , l(x, y) = (2μ_x μ_y + C_1) / (μ_x² + μ_y² + C_1) ;

wherein L_str is the structural loss and MS-SSIM is the multi-scale structural similarity index; γ_n is the weight of the nth scale (the number of scales can generally be set to 5, but can be adjusted up or down); σ_x, σ_y, and σ_xy are respectively the local standard deviations of, and the cross-correlation between, corresponding blocks in the globally mapped LDR image and the iterative image; the luminance term l reflects the importance of the signal strength, with μ_x and μ_y the corresponding local means; and C_1, C_2 are stabilizing constants.
The multiple loss functions are summed according to certain weights, and the weighted sum gives the total nonlinear loss function that guides the training. Specifically, in this embodiment the intensity loss function is given weight 50, the contrast loss function weight 1, the hue loss function weight 100, the saturation loss function weight 1, and the structure loss function weight 1.
Finally, the enhancement factors obtained through training perform recursive pixel-level enhancement on the low dynamic range image, so that the high dynamic range image is finally converted into a high-quality low dynamic range image suitable for presenting HDR content on dynamic-range-limited devices. The specific formula for the pixel enhancement is as follows:

I_n = I_{n−1} + A · (I_{n−1} − I_{n−1}²) ;

wherein A is the enhancement factor, I_n is the output of the nth recursion step, n is the number of recursions, and the input to the first round of recursion (n = 1) is the input low dynamic range image.
In a specific implementation, referring to fig. 2 and taking one high dynamic range image as an example: the display effect is as shown in the first image in fig. 2, where, limited by the dynamic range of the display device, details cannot be clearly presented. The second image is obtained through global mapping: the dynamic range is reduced and the whole is clearer, but many defects remain in the local details. The enhancement factors extracted through training are shown in the third image; enhancing the second image at the pixel level with these factors yields the final mapping image shown in the fourth image, whose local details are full and clear, with a better display effect.
Example 2
For better understanding of the technical content of the present invention, the present embodiment illustrates the present invention by way of a system structure, as shown in fig. 4, a high dynamic range image tone mapping system based on deep learning, comprising:
and the global mapping module is used for mapping the image from the high dynamic range image to the low dynamic range image.
And the network construction module is used for constructing an enhancement network under the encoder-decoder structure, taking skip connections as its connection mode.
And the online training module is used for carrying out online training on the low dynamic range image through the enhancement network so as to extract the enhancement factors.
And the pixel enhancement module is used for carrying out pixel-level enhancement on the low dynamic range image through the enhancement factors.
Further, in the global mapping module, the global mapping is specifically to compress the brightness channel and then perform color recovery.
Further, in the pixel enhancement module, pixel-level enhancement is achieved by recursive enhancement.
From the above embodiments it can be seen that the high dynamic range image tone mapping method and system provided by the embodiments of the present invention design a set of no-reference loss functions usable for the tone mapping task, effectively guide the training of the enhancement neural network without paired labels, realize supervision of the tone mapping directly by the high dynamic range image, and finally achieve high-quality mapping from the high dynamic range image to a low dynamic range display image, effectively solving the technical problem that high dynamic range images are limited in display. On the other hand, tone mapping is decomposed into global mapping plus image enhancement; combining traditional global mapping with learning-based image enhancement reduces both the learning difficulty and the computational cost.
It should be understood that the above embodiments are merely for illustrating the technical concept and features of the present invention, and are intended to enable those skilled in the art to understand the present invention and implement the same according to the present invention without limiting the scope of the present invention. All equivalent changes or modifications made in accordance with the spirit of the present invention should be construed to be included in the scope of the present invention.
It should be noted that all directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are merely used to explain the relative positional relationship, movement, etc. between the components in a particular posture (as shown in the drawings), and if the particular posture is changed, the directional indicator is changed accordingly.
Furthermore, descriptions such as those referred to herein as "first," "second," "a," and the like are provided for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the present invention, unless specifically stated and limited otherwise, the terms "connected," "affixed," and the like are to be construed broadly, and for example, "affixed" may be a fixed connection, a removable connection, or an integral body; can be mechanically or electrically connected; either directly or indirectly, through intermediaries, or both, may be in communication with each other or in interaction with each other, unless expressly defined otherwise. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In addition, the technical solutions of the embodiments of the present invention may be combined with each other, but it is necessary to be based on the fact that those skilled in the art can implement the technical solutions, and when the technical solutions are contradictory or cannot be implemented, the combination of the technical solutions should be considered as not existing, and not falling within the scope of protection claimed by the present invention.