
CN119832029A - Implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization - Google Patents


Info

Publication number
CN119832029A
Authority
CN
China
Prior art keywords: image, low, features, optical flow, flow estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411927402.6A
Other languages
Chinese (zh)
Other versions
CN119832029B (en)
Inventor
戴玮辰
武鹤星
孔万增
翁啸洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202411927402.6A priority Critical patent/CN119832029B/en
Publication of CN119832029A publication Critical patent/CN119832029A/en
Application granted granted Critical
Publication of CN119832029B publication Critical patent/CN119832029B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract


The present invention discloses an implicit image enhancement and optical flow estimation method based on multimodal collaborative optimization. The method first obtains an RGB image in a normal lighting scene and a depth map of the corresponding viewing angle; and calculates the corresponding three-dimensional point cloud data according to the depth map and the camera intrinsic parameters. The low-light image data is synthesized using the RGB image in the normal lighting scene. The high-frequency features and low-frequency features of the low-light image are decomposed using a high- and low-frequency feature enhancement network to enhance the low-light image, and then the image features and context features of the low-light image are extracted; the 2D-3D feature fusion network is used to extract 2D image features and 3D point cloud features and align and fuse them to obtain the image features and context features of the normal RGB image, which are used to supervise the feature extraction process of the high- and low-frequency feature enhancement network. Finally, based on the image features and context features of the low-light image, a 4D correlation volume table is constructed, and the optical flow is inferred using GRU.

Description

Implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization.
Background
Optical flow estimation aims to compute pixel-level two-dimensional motion between successive frames, describing the motion field in two-dimensional space. Conventional optical flow algorithms typically rely on the photometric consistency assumption and derive motion information through optimization. The core of such methods is hand-crafted feature extraction, whereas the introduction of deep learning has significantly changed how optical flow is estimated. Early deep learning optical flow methods (e.g., FlowNet) estimate optical flow directly from image pairs via convolutional neural networks, eliminating the manual feature extraction step of conventional methods and verifying the feasibility of learning optical flow directly from images.
With the development of deep learning, the design of optical flow estimation networks becomes increasingly diverse. For example, PWC-Net adopts a spatial pyramid structure to gradually estimate the optical flow at a plurality of layers, and introduces a feature pyramid and cost volume-based matching method, so that the accuracy of optical flow estimation is improved, and meanwhile, the calculation efficiency is maintained. RAFT updates optical flow through a novel network structure by using multi-scale 4D correlation volume search and GRU iteration, while GMA enhances global motion matching capability in feature extraction and encoding through global motion information aggregation and local attention mechanism, thereby improving optical flow calculation performance.
However, existing optical flow methods are designed primarily for image data acquired under normal environmental conditions. In complex environments, image data is often affected by information loss. For example, in low light conditions, noise increases and texture feature attenuation can significantly reduce image quality, breaking the photometric consistency assumption that optical flow estimation relies on. The increase in noise and the lack of texture detail reduce inter-frame texture consistency, resulting in reduced performance of the subsequent optical flow estimation network.
For optical flow estimation in challenging environments, such as severe weather conditions or low-light environments, corresponding solutions have also emerged in the prior art.
For example, in a rainy scene RobustNet estimates optical flow by using a rainless residual channel, while RainFlow reduces the effects of rain streaks and haze on image features by generating features that are invariant to rain streaks and rain mist. In haze scenes, some methods perform image defogging and optical flow estimation simultaneously by synthesizing haze data and applying style migration techniques.
In low-light environments, Zheng et al. propose a synthetic dark-noise optical flow benchmark named the FlyingChairs Dark & Noise (FCDN) dataset, which addresses the lack of suitable datasets by adding dark-image noise to a normal-illumination dataset. They also introduced the Various Brightness Optical Flow (VBOF) dataset, containing images at different exposure levels together with optical flow pseudo-labels. In addition, CEDFlow introduces high-low frequency adaptive enhancement and edge enhancement in the feature dimension, with a structure specially designed for extracting image features in low-light environments to improve optical flow estimation performance.
However, extracting image features solely from limited information in low-light images limits optical flow estimation performance. The reduced light intake under low light conditions makes it difficult to extract the necessary information, which prevents the extraction of high quality features that are critical for subsequent optical flow calculations. Furthermore, since the network is trained using only low quality images, this reduces the performance that can typically be achieved under normal lighting conditions.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization. The method combines RGB and depth image data and remarkably improves the robustness and precision of optical flow estimation by introducing multi-mode information and a multi-mode collaborative optimization framework, so as to overcome the limited optical flow estimation performance of the prior art in difficult scenes such as low illumination, high noise, or other complex dynamic scenes.
An implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization specifically comprises the following steps:
step 1, acquiring an RGB image under a normal illumination scene and a depth map of a corresponding view angle. And calculating the three-dimensional point cloud corresponding to the pixel according to the depth map and the camera internal parameters.
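The unprojection in step 1 follows the standard pinhole camera model; a minimal sketch follows (the function name and intrinsics layout are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Unproject a depth map (H, W) into an (H*W, 3) point cloud using
    pinhole intrinsics: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

Each pixel (u, v) with depth Z maps to a 3D point in the camera frame, so the point cloud and the RGB image stay in per-pixel correspondence.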
And adding simulated low-light noise into the RGB image, adjusting the brightness of the image, and synthesizing low-light image data.
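A common way to realize such synthesis is gamma/gain darkening followed by additive Gaussian noise; the sketch below is illustrative only (the patent's actual noise model, white-balance distortion, and parameters are not specified in the text):

```python
import numpy as np

def synthesize_low_light(img, gamma=3.0, gain=0.2, noise_std=0.02, seed=0):
    """Darken a normal-light RGB image (float array in [0, 1]) with a
    gamma curve and gain, then add Gaussian noise to mimic low-light
    sensor noise. All parameters here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    dark = gain * np.power(img, gamma)                 # brightness reduction
    noisy = dark + rng.normal(0.0, noise_std, img.shape)
    return np.clip(noisy, 0.0, 1.0)
```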
Step 2, construct a high-low frequency feature enhancement network, and decompose the low-light image data obtained in step 1 into high-frequency features and low-frequency features. The high-frequency features contain detailed information of the image such as edges and textures; the low-frequency features are the background and overall outline of the image.
The high-frequency features are input into a dense convolution network for enhancement, highlighting texture details; the low-frequency features are processed through a multi-scale feature enhancement network, using an attention mechanism to capture global background information. Then, weighted fusion is performed through a channel attention mechanism and residual connection to generate the enhanced image features F_en.
Image features F_I^low and context features F_C^low of the low-light image data are extracted from the enhanced image features by an encoder.
Step 3, respectively extract two-dimensional image features F^2D and three-dimensional point cloud features F^3D from the RGB image and the three-dimensional point cloud in the normal illumination scene through encoders; after feature alignment and fusion, the fused normal illumination image features F_I^norm and context features F_C^norm are obtained.
The feature extraction process in step 2 is supervised by a prior feature loss function L_prior:

L_prior = ||F_I^low − F_I^norm||_2 + ||F_C^low − F_C^norm||_2

where ||·||_2 denotes the 2-norm.
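In code form this loss is simply a sum of two feature distances; a minimal numpy sketch (feature shapes and names are illustrative):

```python
import numpy as np

def prior_feature_loss(f_low, c_low, f_norm, c_norm):
    """Prior feature loss: 2-norm distance between the low-light image /
    context features and the fused normal-light features."""
    return np.linalg.norm(f_low - f_norm) + np.linalg.norm(c_low - c_norm)
```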
Step 4, based on the image features F_I^low and context features F_C^low of the low-light image, calculate the 4D correlation volume C. Using the multi-scale correlation volume and the recursive update operator of a GRU (gated recurrent unit), iteratively optimize the initial optical flow field, progressively refine the optical flow estimation result, and calculate the optical flow estimation loss L_flow.
Step 5, set the total loss function L as the weighted sum of the prior feature loss L_prior and the optical flow estimation loss L_flow to complete model training:

L = L_flow + λ · L_prior

where λ represents the loss weight.
Input the image pair requiring optical flow estimation into the trained high-low frequency feature enhancement network to extract the image features F_I^low and context features F_C^low, and output the optical flow estimation result by the method of step 4.
The invention has the following beneficial effects:
1. The information loss of the original low-quality image is supplemented using multi-mode data, breaking through the performance bottleneck of single-mode optical flow estimation methods in complex scenes. Through high-low frequency feature decomposition and enhancement, the extraction of image detail and global information is optimized respectively, significantly improving optical flow calculation precision. An implicit feature supervision mechanism is introduced: the enhancement network is guided during training by multi-mode features, and RGBD fusion with 3D-to-2D projection ensures that the enhanced features maintain geometric interpretability, realizing deep collaborative optimization of the task and the enhancement.
2. The method is suitable for wide computer vision tasks including robot vision navigation, automatic driving environment sensing, moving object detection and tracking and the like, and particularly has excellent performance in difficult scenes such as low-light and high-noise environments. In addition, the technology has potential value in image restoration, video analysis and other application scenes requiring high-precision dynamic scene estimation. The method provides a brand new solution for optical flow estimation in a complex environment, improves the performance of the optical flow estimation in a difficult scene through implicit multi-modal knowledge guiding and feature enhancement, and further promotes the technical progress in the optical flow calculation field.
Drawings
FIG. 1 is a flow chart of an implicit image enhancement and optical flow estimation method based on multi-modal collaborative optimization.
FIG. 2 is a qualitative comparison of the different methods on the FlyingThings3D dataset in the examples.
Detailed Description
The invention is further explained below with reference to the drawings;
As shown in FIG. 1, the implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization mainly comprises high-low frequency feature enhancement, 2D-3D feature fusion and iterative optical flow estimation. The method comprises the following specific steps:
Step 1, the embodiment uses the FlyingThings3D dataset and the VBOF (Various Brightness Optical Flow) dataset as data sources. The FlyingThings3D dataset is a synthetic dataset designed specifically for optical flow estimation in 3D scenes, containing image pairs with dynamic 3D objects and their corresponding optical flow ground truth. The images of the dynamic 3D objects come from flying objects that simulate different camera perspectives and lighting conditions. The VBOF dataset contains sets of images of the same scene under multiple exposure conditions, taken by four different cameras, and provides real optical flow data for each exposure condition.
Based on the normal illumination RGB image provided by the FlyingThings3D dataset and the corresponding view-angle depth map, the three-dimensional point cloud corresponding to the pixels is calculated according to the depth map and the camera intrinsic parameters. Then, simulated low-light noise is added to the RGB image, an uncorrected white balance effect and a noise model are introduced, and the image brightness is adjusted to generate synthetic low-light image data with low-light noise characteristics.
Step 2, decomposing the low-illumination image data into high-frequency characteristics and low-frequency characteristics through a high-low frequency characteristic enhancement network, and enhancing the original image by utilizing the double-frequency-domain characteristics, wherein the specific steps are as follows:
S2.1, firstly input the low-light image pair under the same scene into a convolution layer to obtain the low-light image feature F. Then, the low-frequency feature F_low of the low-light image is obtained from F through an average pooling operation.
Bilinear interpolation upsampling is carried out on the low-frequency feature F_low, and the upsampling result is subtracted from F to obtain the high-frequency feature F_high of the low-light image.
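The S2.1 decomposition can be sketched in a few lines; for brevity the sketch uses 2x2 average pooling and nearest-neighbor upsampling in place of bilinear interpolation (an assumption, purely illustrative):

```python
import numpy as np

def decompose(feat):
    """Split an (H, W) feature map into a low-frequency part (2x2 average
    pooling) and a high-frequency residual (feature minus the upsampled
    low-frequency part), as in step S2.1. H and W must be even."""
    H, W = feat.shape
    low = feat.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))
    up = np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)  # upsample back
    high = feat - up
    return low, high
```

By construction the two parts sum back (after upsampling) to the original features, so no information is lost by the split.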
S2.2, because the high-frequency information mainly represents the details of the image, a smaller receptive field can better focus on local image information, so that the details are more accurately enhanced. The high-frequency feature F_high is thus input into a dense convolution network:

F'_high = Dense(F_high)

where Dense(·) represents the dense convolutional network. The dense convolution network comprises a plurality of small convolution kernels and has a residual connection structure, so it can better focus on detail areas and helps explore high-frequency information.
The low-frequency information contains the background and contours of the image, from which long-range dependencies can be suitably captured. Firstly, two successive downsamplings are applied to the low-frequency feature F_low, giving the downsampled features F_low1 and F_low2. The three scales of low-frequency features are each input into channel self-attention to capture global background information, obtaining the enhanced multi-scale low-frequency features F'_low, F'_low1, F'_low2, which are finally fused through a wavelet fusion network (Wavelet Fusion Network):

F''_low = WF(F'_low, F'_low1, F'_low2)

where WF(·) represents the wavelet fusion operation.
S2.3, the enhanced high-frequency feature F'_high and the fused low-frequency feature F''_low are spliced on the channel dimension, passed through a one-dimensional convolution and a channel attention mechanism, and residually connected with the low-light image feature F to generate the enhanced image feature F_en. The high-low frequency feature enhancement network can separately process damaged image texture details and edge contours, thereby improving the feature extraction capability.
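The S2.3 fusion amounts to concatenation, a channel gate, and a residual add; a parameter-free sketch follows (the learned 1-D convolution and attention weights are omitted, so the sigmoid gate below is only a stand-in):

```python
import numpy as np

def channel_attention(feat):
    """Squeeze-and-excitation-style gate: global average pool per
    channel, sigmoid, rescale. Learned layers omitted for brevity."""
    gate = 1.0 / (1.0 + np.exp(-feat.mean(axis=(0, 1))))  # shape (C,)
    return feat * gate

def fuse_frequencies(high, low, base):
    """Concatenate enhanced high/low-frequency features on the channel
    axis, gate with channel attention, and residually add the original
    low-light features F (step S2.3 sketch)."""
    cat = np.concatenate([high, low], axis=-1)
    return channel_attention(cat) + base
```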
S2.4, the enhanced image feature F_en is input into a CNN-based encoder to extract the image features F_I^low and context features F_C^low of the low-light image.
Step 3, through the 2D-3D feature fusion network, the point cloud features and the RGB features are aligned and fused using a pre-trained encoder and a learnable-interpolation projection, implicitly supervising the feature extraction from low-quality data.
S3.1, firstly extract two-dimensional features F^2D ∈ R^(H×W×C_2D) from the RGB image pair under the same scene through a pretrained CNN, where H and W represent the height and width of F^2D and C_2D indicates the number of channels. Three-dimensional features F^3D ∈ R^(M×C_3D) are extracted from the point cloud data, and the position information of the point cloud is acquired from the depth map, where M represents the number of points and C_3D represents the number of channels of the three-dimensional features F^3D.
S3.2, because the image features are dense and the point cloud features are sparse, to realize feature alignment a learnable interpolation method is applied to convert the point cloud features F^3D into dense point cloud features F̃ of the same size as the image features. For each image pixel, neighborhood features are weighted by ScoreNet based on the coordinate offset, and the projection points of the point cloud features are found in a K-nearest-neighbor manner:

F̃_(i,j) = Σ_{k ∈ KNN(i,j)} ScoreNet(Δp_k) · F_k^3D

where Δp_k represents the offset of each pixel with respect to the point cloud 2D projection point, ScoreNet(·) represents the ScoreNet network, KNN(·) indicates the K nearest neighbors, and F̃_(i,j) represents the feature of pixel (i, j) of the dense point cloud features F̃, i = 1, 2, ..., H, j = 1, 2, ..., W.
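The S3.2 projection can be approximated with fixed softmax weights in place of the learned ScoreNet; the sketch below gathers, for each pixel, its k nearest projected points and averages their features (the names and the exp(−distance) weighting are illustrative assumptions):

```python
import numpy as np

def interpolate_to_dense(proj_2d, feats_3d, H, W, k=3):
    """Convert sparse point features (M, C), with 2D projections
    proj_2d (M, 2) in (x, y) pixel coordinates, into a dense (H, W, C)
    grid: each pixel averages its k nearest projected points with
    weights exp(-offset), a fixed stand-in for the learned ScoreNet."""
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (H*W, 2)
    d = np.linalg.norm(pix[:, None, :] - proj_2d[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]             # k nearest projections
    off = np.take_along_axis(d, idx, axis=1)       # coordinate offsets
    w = np.exp(-off)
    w /= w.sum(axis=1, keepdims=True)              # normalized weights
    dense = (w[..., None] * feats_3d[idx]).sum(axis=1)
    return dense.reshape(H, W, feats_3d.shape[1])
```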
S3.3, the dense point cloud features F̃ and the image features F^2D are fused in the channel dimension through a 1×1 convolution:

F_fuse = Conv_1×1([F̃; F^2D])
The fused normal image features F_I^norm and normal context features F_C^norm are used to implicitly provide guidance to the high-low frequency feature enhancement network and optimize its feature extraction capability. The feature extraction process of the high-low frequency feature enhancement network is supervised by the prior feature loss function L_prior:

L_prior = ||F_I^low − F_I^norm||_2 + ||F_C^low − F_C^norm||_2

where ||·||_2 denotes the 2-norm.
Step 4, construct the 4D correlation volume and infer the optical flow using the GRU.
S4.1, based on the image features F_I^low and context features F_C^low of the low-light image, the 4D correlation volume C is calculated for establishing feature correspondences between pixels.
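The all-pairs 4D correlation volume is a dot product between every pixel pair of the two feature maps, as popularized by RAFT; a numpy sketch (the 1/√C scaling follows common practice and is an assumption here):

```python
import numpy as np

def correlation_volume(f1, f2):
    """All-pairs 4D correlation volume between two (H, W, C) feature
    maps: C[i, j, k, l] = <f1[i, j], f2[k, l]> / sqrt(C)."""
    C = f1.shape[-1]
    return np.einsum("ijc,klc->ijkl", f1, f2) / np.sqrt(C)
```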
S4.2, using the multi-scale correlation volume, the optical flow estimate is gradually optimized by iteration based on the GRU update operator. Assume an initial optical flow field f_0. In the k-th iteration, the optical flow estimate is updated:

f_{k+1} = f_k + GRU(x_k, f_k)

where x_k is the correlation feature retrieved from the correlation volume C, and GRU(·) represents the gated recurrent unit.
S4.3, the optical flow estimation loss is calculated by supervising the L1 distance between the predicted optical flow f_i and the true optical flow f_gt, applying exponentially decaying weights γ over the prediction sequence {f_1, ..., f_Q}:

L_flow = Σ_{i=1}^{Q} γ^{Q−i} · ||f_gt − f_i||_1

where γ represents the exponential decay weight and Q represents the number of iterative updates of the GRU.
The exponential decay weight γ in this embodiment is set to 0.9, and 12 optical flow prediction iterations are performed to achieve a better coarse-to-fine optical flow update.
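The sequence loss above translates directly into code; a minimal sketch (the i-th of Q predictions, i counted from 1, gets weight gamma^(Q−i), so later, more refined iterations count more):

```python
import numpy as np

def flow_sequence_loss(preds, gt, gamma=0.9):
    """Exponentially weighted L1 loss over Q predicted optical flows,
    as in step S4.3: weight gamma**(Q - i) for the i-th prediction."""
    Q = len(preds)
    return sum(gamma ** (Q - i) * np.abs(f - gt).mean()
               for i, f in enumerate(preds, start=1))
```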
The final loss function L is set to the sum of the prior feature loss L_prior and the optical flow estimation loss L_flow:

L = L_flow + λ · L_prior

where λ indicates the loss weight and is set to 0.2 in this embodiment. The model was implemented using PyTorch and trained using the Adam optimizer for a total of 200,000 iterations. The initial learning rate is gradually reduced during training through a cosine annealing strategy.
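The cosine annealing schedule mentioned here interpolates the learning rate along a half cosine; a standalone sketch (lr_max and lr_min are placeholders — the patent text does not give the actual rates):

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine annealing: start at lr_max and decay to lr_min after
    total_steps, following 0.5 * (1 + cos(pi * t)) for t in [0, 1]."""
    t = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```

In PyTorch the equivalent built-in is `torch.optim.lr_scheduler.CosineAnnealingLR`.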
The original image from the FlyingThings3D dataset is denoted as C, and the processed image with low-light noise characteristics is denoted as CN.
Firstly, part of the original images C are selected for model training, and the remaining original images are used as a test set for performance testing. Endpoint error (EPE) and 1-pixel accuracy (ACC_1px) are selected as evaluation indexes to compare the performance of the method with prior art methods; the test results are shown in Table 1:
TABLE 1
As can be seen from the data in Table 1, the EPE of the present method is 2.91 and the ACC_1px is 86.54%, superior to all other methods. The present method improves EPE by about 9.3% compared to the second-ranked GMFlow and by about 19.3% compared to RAFT. The experimental results show that by introducing implicit feature supervision during training, the method can indirectly guide the model to learn effective feature extraction, thereby remarkably improving the subsequent optical flow estimation effect.
Then, model training is performed using both the original images C and the images CN with low-light noise characteristics, and the performance of different methods on the original normal images and the noise-injected images is compared, as shown in Table 2:
TABLE 2
The data in Table 2 shows that the method achieves the best performance on all the criteria. Compared with GMFlow, on the normal image and the injected noise image, the EPE index of the method is improved by 7.1% and 7.8% respectively, which shows that the method is superior to a model trained only on RGB images in solving the optical flow estimation challenge under the support of a high-low frequency characteristic enhancement network and a priori characteristic guidance.
To compare the performance of different models in low-light optical flow estimation, tests were performed on VBOF datasets, as shown in table 3:
TABLE 3
On the VBOF dataset and its SONY subset, the method achieved the best results: on the SONY subset, the EPE is 20.05, an improvement of about 1.5% over the second-ranked GMFlow. Furthermore, the method also achieves the optimal result over the entire VBOF dataset, about 1.1% higher than RAFT, indicating that by training with synthetic low-light images the method can effectively cope with low-light noise scenes in the real world.
FIG. 2 illustrates a visual comparison of optical flow estimation results for the different methods. The first row shows the output of the different methods for a clear RGB image; it can be seen that the present method can effectively track the outline of a small object under the guidance of implicit features and performs well in estimating the optical flow of moving objects. The second row shows the estimation results on RGB images injected with noise; despite the added noise, the method still exhibits strong noise immunity, can accurately recover the information in the damaged image, and maintains high optical flow estimation precision. The third row shows images in a real low-light environment; trained on the synthetic data, the method efficiently handles optical flow estimation in the face of real ambient noise. In contrast, other approaches perform poorly on noisy regions and elongated object contours, highlighting the limitations of noise-affected image features in the absence of additional prior guidance.
Finally, in order to verify the effectiveness of the high-low frequency characteristic enhancement network and the prior characteristic loss in the method, a series of ablation experiments are performed, and the results are shown in table 4:
TABLE 4
It can be seen that the prior feature loss alone improves the performance of the RAFT framework by about 7.9%, while the high-low frequency feature enhancement network improves performance by about 4.1%. Applying both simultaneously, i.e., the present method, improves performance by about 12.7%.
In summary, the present application proposes a novel multi-modal collaborative implicit image enhancement method for improving optical flow estimation performance in challenging environments. By introducing multi-mode (RGBD) collaborative training, the quality of the input image is effectively improved, so that more accurate optical flow estimation is realized under complex conditions. Collaborative learning of RGB and depth images at the feature level enables the enhancement network to implicitly acquire multi-modal knowledge with geometric consistency, thereby improving the feature extraction capability for optical flow calculation. Extensive experimental verification was carried out on synthetic and real datasets, and the results show that optical flow estimation performance can be improved by the method.

Claims (10)

1. The implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization is characterized by comprising the following steps of:
step 1, acquiring an RGB image in a normal illumination scene and a depth map of a corresponding visual angle, calculating corresponding three-dimensional point cloud data according to the depth map and camera internal parameters, and synthesizing low-illumination image data by utilizing the RGB image in the normal illumination scene;
Step 2, constructing a high-low frequency characteristic enhancement network, and inputting a low-light image;
Firstly, high-frequency features and low-frequency features are decomposed; then the high-frequency features are strengthened using a dense convolution network, and the low-frequency features are strengthened using multi-scale attention enhancement and wavelet fusion; weighted fusion is then performed through a channel attention mechanism and residual connection to generate the enhanced image features F_en; finally, the image features F_I^low and context features F_C^low of the low-light image are extracted from the enhanced image features F_en through an encoder;
Step 3, construct a 2D-3D feature fusion network, extract the two-dimensional image features F^2D in the RGB image and the three-dimensional point cloud features F^3D in the three-dimensional point cloud, and obtain the normal illumination image features F_I^norm and context features F_C^norm after feature alignment and fusion;
the feature extraction process of the high-low frequency feature enhancement network is supervised by the prior feature loss function L_prior:
L_prior = ||F_I^low − F_I^norm||_2 + ||F_C^low − F_C^norm||_2
where ||·||_2 denotes the 2-norm;
Step 4, based on the image features F_I^low and context features F_C^low of the low-light image, calculate the 4D correlation volume C; using the multi-scale correlation volume, iteratively optimize the initial optical flow field through the recursive GRU update operator, gradually refine the optical flow estimation result, and calculate the optical flow estimation loss L_flow;
Step 5, set the total loss function L as the weighted sum of the prior feature loss L_prior and the optical flow estimation loss L_flow to complete model training; input the image pair requiring optical flow estimation into the trained high-low frequency feature enhancement network to extract the image features F_I^low and context features F_C^low, and output the optical flow estimation result by the method of step 4.
2. The method for implicit image enhancement and optical flow estimation based on multi-modal collaborative optimization of claim 1, wherein simulated low-light noise is added to an RGB image in a normal light scene, uncorrected white balance effects and noise models are introduced, and image brightness is adjusted to synthesize low-light image data with low-light noise characteristics.
3. The method for implicit image enhancement and optical flow estimation based on multi-modal collaborative optimization as set forth in claim 1, wherein the high-low frequency feature enhancement network decomposes an input low-light image into high-frequency features and low-frequency features, and enhances an original image by using dual-frequency domain features, and the method comprises the specific steps of:
S2.1, input the low-light image into a convolution layer to extract the low-light image feature F, and then obtain the low-frequency feature F_low of the low-light image through an average pooling operation;
perform bilinear interpolation upsampling on the low-frequency feature F_low, and subtract the upsampling result from F to obtain the high-frequency feature F_high of the low-light image;
S2.2, input the high-frequency feature F_high into a dense convolution network to obtain the enhanced high-frequency feature F'_high;
continuously downsample the low-frequency feature F_low twice, input the features F_low1, F_low2 obtained by each downsampling into channel self-attention to capture global background information and obtain the enhanced multi-scale low-frequency features F'_low, F'_low1, F'_low2, and finally fuse F'_low, F'_low1, F'_low2 through a wavelet fusion network to obtain the enhanced low-frequency feature F''_low:
F''_low = WF(F'_low, F'_low1, F'_low2)
where WF(·) represents the wavelet fusion operation;
S2.3, splice F'_high and F''_low on the channel dimension, pass through a one-dimensional convolution and a channel attention mechanism, and perform residual connection with the low-light image feature F to generate the enhanced image feature F_en.
4. The method for implicit image enhancement and optical flow estimation based on multi-modal collaborative optimization according to claim 1, wherein the 2D-3D feature fusion network fuses the point cloud features and RGB features after alignment, specifically comprising the following steps:
S3.1, firstly extract two-dimensional features F^2D from the RGB image in the normal illumination scene through a pre-trained encoder, extract three-dimensional features F^3D from the point cloud data, and simultaneously acquire the position information of the point cloud from the depth map;
S3.2, apply a learnable interpolation method to convert the point cloud features F^3D into dense point cloud features F̃ of the same size as the image features;
S3.3, fuse the dense point cloud features F̃ and the image features F^2D in the channel dimension.
5. The method for implicit image enhancement and optical flow estimation based on multimodal collaborative optimization of claim 4, wherein for each image pixel, neighborhood features are weighted by ScoreNet based on the coordinate offset, and the projection points of the point cloud features are found in a K-nearest-neighbor manner:
F̃_(i,j) = Σ_{k ∈ KNN(i,j)} ScoreNet(Δp_k) · F_k^3D
where Δp_k represents the offset of each pixel with respect to the point cloud 2D projection point; ScoreNet(·) represents the ScoreNet network; KNN(·) represents the K nearest neighbors; F̃_(i,j) represents the feature of pixel (i, j) of the dense point cloud features F̃, i = 1, 2, ..., H, j = 1, 2, ..., W, where H and W represent the height and width of the two-dimensional features F^2D.
6. The method for implicit image enhancement and optical flow estimation based on multi-modal collaborative optimization according to any one of claims 1 or 4, wherein feature extraction is performed using encoders based on CNN network architecture.
7. The method for implicit image enhancement and optical flow estimation based on multi-modal collaborative optimization as set forth in claim 1, wherein the optical flow estimation loss is:

L_flow = Σ_{i=1}^{Q} γ^(Q−i) · ‖f_gt − f_i‖_1

where γ represents an exponential decay weight with γ < 1, f_i represents the optical flow predicted at the i-th iteration, f_gt represents the ground-truth optical flow, and Q represents the number of iterative updates of the GRU.
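The iterative loss of claim 7 follows the sequence-loss pattern of GRU-based flow estimators such as RAFT: every intermediate flow estimate is penalized, with later iterations weighted more heavily. A sketch, where `gamma=0.8` is an assumed default rather than a value stated in the claim:

```python
import numpy as np

def flow_sequence_loss(flow_preds, flow_gt, gamma=0.8):
    """Weighted sum of per-iteration L1 flow errors: the prediction from
    GRU iteration i (of Q) is weighted by gamma**(Q - i), so the final
    estimate receives weight 1 and earlier ones exponentially less."""
    q = len(flow_preds)
    loss = 0.0
    for i, pred in enumerate(flow_preds, start=1):
        weight = gamma ** (q - i)                        # latest estimate gets weight 1
        loss += weight * np.abs(flow_gt - pred).mean()   # mean L1 flow error
    return loss
```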
8. The method for implicit image enhancement and optical flow estimation based on multi-modal collaborative optimization according to claim 1 or 7, wherein the loss function L combines the optical flow estimation loss L_flow with the feature supervision loss L_sup:

L = L_flow + λ · L_sup

where λ represents the loss weight.
9. The method for implicit image enhancement and optical flow estimation based on multi-modal collaborative optimization of claim 8, wherein the loss weight λ is set, training is performed with the Adam optimizer, the number of model iterations is set to 200,000, and the initial learning rate is reduced during training by a cosine annealing strategy.
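The cosine annealing schedule of claim 9 can be sketched as follows; the function name is assumed, and the concrete initial and final learning-rate values from the claim are not reproduced here:

```python
import math

def cosine_annealed_lr(step, total_steps, lr_init, lr_min):
    """Cosine annealing: the learning rate starts at lr_init and decays to
    lr_min over total_steps (200,000 in the claim), following half a
    cosine period so decay is slow at the start and end, fast in between."""
    t = min(step, total_steps) / total_steps             # progress in [0, 1]
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * t))
```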
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a computer, causes the computer to perform the method of any one of claims 1-5 or 7.
CN202411927402.6A 2024-12-25 2024-12-25 Implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization Active CN119832029B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411927402.6A CN119832029B (en) 2024-12-25 2024-12-25 Implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization

Publications (2)

Publication Number Publication Date
CN119832029A true CN119832029A (en) 2025-04-15
CN119832029B CN119832029B (en) 2025-10-03

Family

ID=95307501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411927402.6A Active CN119832029B (en) 2024-12-25 2024-12-25 Implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization

Country Status (1)

Country Link
CN (1) CN119832029B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114332355A (en) * 2021-12-03 2022-04-12 南京航空航天大学 Weak light multi-view geometric reconstruction method based on deep learning
CN116468770A (en) * 2023-03-15 2023-07-21 中国矿业大学 A self-supervised depth estimation method in 3D reconstruction of mine safety hidden danger scenes
WO2023236445A1 (en) * 2022-06-09 2023-12-14 北京大学 Low-illumination image enhancement method using long-exposure compensation
CN117274321A (en) * 2023-09-26 2023-12-22 北京理工大学 A multi-modal optical flow estimation method based on event cameras
WO2024159082A2 (en) * 2023-01-27 2024-08-02 Google Llc Monocular depth and optical flow estimation using diffusion models

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Yuqi; Zhao Haitao: "Scene depth estimation based on stage-wise adaptive fusion of infrared and visible light images", 应用光学 (Journal of Applied Optics), no. 01, 15 January 2020 (2020-01-15) *
Wang Shuo; Wang Yafei: "Depth estimation of light field images based on multi-stream epipolar convolutional neural networks", 计算机应用与软件 (Computer Applications and Software), no. 08, 12 August 2020 (2020-08-12) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121032823A (en) * 2025-11-03 2025-11-28 西安电子科技大学杭州研究院 A Hyper-Multispectral Image Fusion Method Based on Deep Kalman Filtering
CN121032823B (en) * 2025-11-03 2026-02-10 西安电子科技大学杭州研究院 A Hyper-Multispectral Image Fusion Method Based on Deep Kalman Filtering
CN121053346A (en) * 2025-11-05 2025-12-02 杭州将古文化发展有限公司 Digital display visualization system based on XR technology

Also Published As

Publication number Publication date
CN119832029B (en) 2025-10-03

Similar Documents

Publication Publication Date Title
CN111340922B (en) Positioning and mapping method and electronic device
US12340556B2 (en) System and method for correspondence map determination
CN113963117B (en) Multi-view three-dimensional reconstruction method and device based on variable convolution depth network
US20110176722A1 (en) System and method of processing stereo images
CN113947538B (en) A multi-scale efficient convolutional self-attention single image rain removal method
EP3293700B1 (en) 3d reconstruction for vehicle
CN119832029A (en) Implicit image enhancement and optical flow estimation method based on multi-mode collaborative optimization
Fan et al. Multiscale cross-connected dehazing network with scene depth fusion
CN116703996B (en) Monocular three-dimensional target detection method based on instance-level self-adaptive depth estimation
CN113160278A (en) Scene flow estimation and training method and device of scene flow estimation model
Guo et al. Joint raindrop and haze removal from a single image
CN116523790A (en) SAR image denoising optimization method, system and storage medium
Babu et al. An efficient image dahazing using Googlenet based convolution neural networks
Haji-Esmaeili et al. Large-scale monocular depth estimation in the wild
CN117911480B (en) An attention-guided multi-view depth estimation method
CN117830533B (en) Three-dimensional reconstruction method and device based on defocusing characteristics
Du et al. A comprehensive survey: Image deraining and stereo‐matching task‐driven performance analysis
Li et al. Derainnerf: 3d scene estimation with adhesive waterdrop removal
CN116168162A (en) Three-dimensional point cloud reconstruction method for multi-view weighted aggregation
CN116385987B (en) Lane line fusion recognition method based on semantic segmentation and edge detection algorithm
CN117132503A (en) Method, system, equipment and storage medium for repairing local highlight region of image
CN119904349B (en) Fisheye camera SLAM method, fisheye camera SLAM device, fisheye camera SLAM system and storage medium
CN120863391B (en) Charging equipment, charging control method, device and medium thereof
CN119991919B (en) Cross-device rendering method, device, equipment and medium
CN118644562B (en) Model training method, three-dimensional point cloud acquisition method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant