CN119478648B - A method for underwater robot vision clarity based on multimodal fusion network - Google Patents
A method for underwater robot vision clarity based on multimodal fusion network
- Publication number
- CN119478648B, CN202411488920.2A, CN202411488920A
- Authority
- CN
- China
- Prior art keywords
- underwater
- polarization
- network
- features
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/05—Underwater scenes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an underwater robot vision sharpening method based on a multi-mode fusion network, belonging to the technical field of underwater image processing. The method mainly comprises: collecting turbid underwater images and corresponding clear underwater images and constructing an underwater polarized image dataset comprising underwater polarized images at different angles, a polarization-degree image and a polarization-angle image; constructing an underwater robot vision sharpening model based on the multi-mode fusion network, the model comprising the multi-mode fusion network and an image enhancement network; training the multi-mode fusion network on the underwater polarized image dataset, during which the RGB information and polarization information are updated by pixel-level multi-scale fusion and fusion features are generated; training the image enhancement network with the underwater polarized image dataset to obtain an image enhancement model; and acquiring the turbid fusion features to be processed and inputting them into the image enhancement model based on the network so as to obtain a clear underwater image.
Description
Technical Field
The invention relates to the technical fields of marine environment perception, digital image processing and image enhancement, in particular to an underwater robot vision sharpening method based on a multi-mode fusion network.
Background
With the advance of human exploration of the ocean, underwater robots have become an important tool for acquiring information about the sea floor. However, because the water medium strongly absorbs and scatters light and the underwater environment is complex, light undergoes energy attenuation as it propagates underwater, and impurities and suspended particles in the water scatter the light along its path, so images collected underwater are more blurred than those collected on land. These factors give the acquired images color deviation and low definition, which seriously degrades visual quality and the performance of underwater visual tasks. Underwater optical imaging is currently the core means of accomplishing underwater environment perception and detection tasks and plays an indispensable role in scientific research fields such as underwater robotics, ocean scientific investigation and many downstream visual tasks (e.g., underwater target recognition or tracking). Owing to the complexity and instability of the underwater environment, underwater images often suffer from color cast, low contrast, blurring and similar problems. Specifically, when light propagates in water it is first affected by the water depth: as depth increases, light of different wavelengths is continuously attenuated. Because red light disappears first, underwater images tend to be dominated by blue or green, producing a color shift. Meanwhile, unlike the air medium on land, water contains a large number of suspended particles that scatter light in the water, which further challenges underwater optical imaging. In addition, because of the uncertainty of the imaging equipment's motion underwater, the acquired images suffer from blurred details, low contrast and related problems, which affect visual perception and pose more serious challenges for subsequent high-level visual tasks. At the same time, owing to the lack of underwater scenes and high-quality images, underwater image enhancement (UIE) faces various challenges such as over-enhancement and blurring of detail features. These problems limit the performance of UIE methods and lead to poor downstream task performance.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides an underwater robot vision sharpening method based on a multi-mode fusion network. Inspired by multi-modal learning, the method introduces polarization information as an additional modality to strengthen the original underwater image and provides a novel detail-focusing, polarization-guided multi-mode fusion network that integrates the RGB modality and the polarization modality to enhance underwater images. Detail-focusing difference convolution is used to capture more detail and edge information, and the polarization-degree and polarization-angle information is used to enhance the contrast and texture details of different regions of the image, so that the true colors of the image are restored more accurately, serious interference with subsequent computer vision tasks is reduced, and the quality of the underwater image is significantly improved. The technical proposal is as follows:
An underwater robot vision sharpening method based on a U-Net multi-mode fusion network is characterized by comprising the following steps:
S1, acquiring turbid underwater images and corresponding clear underwater images, and constructing an underwater polarized image dataset, wherein the underwater polarized image dataset comprises underwater polarized images at different angles, a polarization-degree image and a polarization-angle image;
S2, constructing an underwater robot vision sharpening model based on a multi-mode fusion network, wherein the model comprises the multi-mode fusion network and an image enhancement U-Net network;
S3, training the multi-mode fusion network on the underwater polarized image dataset, updating the RGB information and polarization information by pixel-level multi-scale fusion during training and generating fusion features;
S4, training the image enhancement U-Net network by using the underwater polarized image data set to obtain an image enhancement model based on the U-Net network;
S5, acquiring the turbid fusion features to be processed and inputting them into the image enhancement model based on the U-Net network, so as to obtain a clear underwater image.
Further, the method for constructing the underwater polarized image dataset in step S1 comprises: preparing water bodies of different colors and different turbidity levels in a water scene; collecting, with a polarization camera, turbid underwater images of objects in the water bodies of different colors and turbidity levels; collecting clear underwater images of the same objects in purified water and taking the clear underwater images as label images; and constructing a training set and a test set from the turbid underwater images and the label images.
Further, in step S2, the network architecture of the underwater robot vision sharpening model based on the multi-mode fusion network is an end-to-end structure, and feature maps of different sizes are generated at each level so that the network can capture features at different scales. Before entering the fusion network, the turbid underwater polarized images in the training set are preprocessed to obtain RGB modal information and polarization modal information, the polarization modal information comprising the degree-of-polarization information DoLP and the angle-of-polarization information AoLP, and the obtained RGB modal information and polarization modal information are used as the input of the multi-mode fusion network.
Further, the multi-mode fusion network comprises two modules, namely a feature fusion module and a polarization-guided fusion module:
The feature fusion module performs robust fusion of the DoLP and AoLP features from the polarization-mode input domain by exploiting global and local information: for the two input features DoLP and AoLP, two spatial attention maps are generated from the two token embedding sequences provided by two Conformers, and the extracted convolution features are then weighted and fused according to the spatial attention maps to obtain the polarization-modality features;
The polarization-guided fusion module handles the modal deviation: the RGB-modality input feature X is enhanced by an attention operation and, guided by the polarization-modality feature M, is updated to generate the fusion feature X*; the polarization-modality feature and the RGB-modality feature are concatenated and projected by a multi-layer perceptron to generate the key k_x, query q_x and value v_x; the embedding height and width H, W of the query and key are reduced along the spatial dimensions and the channel statistics s_q, s_k of the query and key are learned, thereby obtaining the guided, updated channel relation M*.
Further, the polarization-modality feature and the RGB-modality feature are concatenated and projected by a multi-layer perceptron, whose projection weights are learnable parameters, to generate the key k_x, query q_x and value v_x:
X* = FC(softmax(FC([q_x; k_x])) ⊙ v_x)
where ⊙ denotes element-wise multiplication and FC denotes a fully connected layer with filtering.
Further, the channel statistics s_q, s_k of the query and the key are learned by reducing the embedding height and width H, W of the query and the key along the spatial dimensions, and the guided, updated channel relation M* is obtained as follows:
K_m, Q_m, V_m = X, M, k_x
M* = M_x + FC((s_q·Q_m + s_k·K_m) ⊙ V_m)
Further, the image enhancement model based on the U-Net network consists of three parts, namely an encoder part, a feature transformation part and a decoder part. Feature extraction blocks are deployed from the first layer to the third layer of the image enhancement U-Net network, i.e. different blocks are adopted in different layers to extract the corresponding features, and the third layer uses a detail enhancement attention block (DEAB) to capture more detail and edge features;
The detail enhancement attention block (DEAB) comprises a detail-focusing convolution block and a content-guided attention block. The detail-focusing convolution block uses difference convolution to integrate prior information and supplement the ordinary convolution layer it operates in parallel with, so as to enhance the representation capability;
The content-guided attention block adopts a dynamic fusion scheme. By assigning a unique spatial importance map to each channel, it emphasizes the more useful information encoded in the features; it fuses the low-level features from the encoder part with the corresponding high-level features from the decoder part, modulating the features with the learned spatial weights so that the low-level features of the encoder part are adaptively fused with the corresponding high-level features of the decoder part; the input features are added through a skip connection to alleviate the gradient vanishing problem and simplify learning; and the fused features are finally mapped through a 3×3 convolution layer to obtain the final clear result.
Further, dimensional consistency between different layers is ensured by two downsampling operations and two upsampling operations. The downsampling operation halves the spatial dimensions and doubles the number of channels; it is implemented by a convolution layer with the stride set to 2 and the number of output channels set to twice the number of input channels. The upsampling operation is regarded as the inverse of the downsampling operation and is implemented by a deconvolution layer. The sizes of the first, second and third layers are therefore C×H×W, 2C×(H/2)×(W/2) and 4C×(H/4)×(W/4), respectively.
Compared with the prior art, the invention has the following advantages:
The invention provides an underwater robot vision sharpening method based on a multi-mode fusion network. The method effectively avoids the imaging defects that existing underwater image sharpening methods suffer from under high-turbidity conditions, improves the quality of the underwater image while removing the turbidity, and is both effective and robust, thereby laying a theoretical and technical foundation for subsequent visual tasks such as seabed panoramic observation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of an underwater robot vision sharpening method based on a multi-mode fusion network in an embodiment of the invention.
Fig. 2 is a network architecture diagram of the underwater robot vision sharpening model based on the multi-mode fusion network in an embodiment of the invention.
Fig. 3 is a turbid image, a polarization angle image and a polarization degree image obtained by an underwater robot vision sharpening network based on a multi-mode fusion network in an embodiment of the invention.
Fig. 4 is a clear underwater image output by an underwater robot vision-sharpening network based on a multi-mode fusion network in an embodiment of the invention.
Fig. 5 is a table comparing the underwater robot vision-sharpening network based on the multi-mode fusion network with other existing networks on five common underwater image quality evaluation indexes in an embodiment of the invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms in the description of the present invention and the claims and the above-described drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1, the invention provides an underwater robot vision sharpening method based on a multi-mode fusion network, which mainly comprises the following steps:
S1, constructing an underwater polarized image dataset, wherein the training dataset comprises underwater polarized images at four different angles, a polarization-degree image and a polarization-angle image. Specifically, water bodies of different colors and different turbidity levels are prepared in an artificially constructed indoor water scene, turbid underwater images of objects in these water bodies are collected, clear underwater images of the same objects in purified water are collected, and the clear underwater images are used as label images.
For example, an indoor turbid underwater polarized image acquisition platform is constructed, comprising a glass water tank, a polarization camera, a computer, an illumination system and a camera tripod. The glass water tank measures 150 cm × 35 cm × 50 cm, and the illumination system consists of lamps of three different colors, namely blue, green and white. The turbid underwater images are acquired as follows:
First, object images are acquired in water bodies of different turbidity levels and different color scenes: for each group, object images (i.e. turbid underwater images) are acquired at 5 different turbidity levels in 3 different color scenes, and clear label images are also acquired. During shooting, the polarization camera and the objects in the water tank are physically fixed so that their relative positions do not change. Second, various objects such as corals, starfish, conches and shells are prepared and divided into two groups during data collection: in one group, bottom sand and gravel are laid on the bottom of the water tank to simulate the seabed or a riverbed and the objects are fixed on this bottom, giving underwater images of a complex scene; in the other group, the objects are fixed on a pure white backboard, giving underwater images of a simple scene. In total, 1600 turbid underwater images and their corresponding label images are acquired with the turbid underwater image acquisition platform, 800 each for the simple and complex scenes, with an image resolution of 1024 × 1224.
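As an illustrative sketch only, the paired dataset described above could be organised as follows in PyTorch; the directory layout, the file-naming scheme and the 4:1 train/test split used here are assumptions for readability, not part of the acquisition platform itself.

```python
import os
from PIL import Image
from torch.utils.data import Dataset, random_split
import torchvision.transforms as T

class UnderwaterPolarDataset(Dataset):
    """Pairs the four-angle turbid polarized captures with the clear label image.

    Assumed (hypothetical) layout: root/turbid/<id>_{0,45,90,135}.png and root/clear/<id>.png.
    """
    def __init__(self, root):
        self.root = root
        self.ids = sorted(f[:-4] for f in os.listdir(os.path.join(root, "clear")))
        self.to_tensor = T.ToTensor()

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, i):
        sid = self.ids[i]
        turbid = [self.to_tensor(Image.open(os.path.join(
            self.root, "turbid", f"{sid}_{a}.png"))) for a in (0, 45, 90, 135)]
        clear = self.to_tensor(Image.open(os.path.join(self.root, "clear", f"{sid}.png")))
        return turbid, clear

# Assumed 4:1 split of the 1600 collected pairs into training and test sets.
ds = UnderwaterPolarDataset("underwater_polar")
n_train = int(0.8 * len(ds))
train_set, test_set = random_split(ds, [n_train, len(ds) - n_train])
```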
S2, constructing an underwater robot vision sharpening model based on a multi-mode fusion network, wherein the model comprises the multi-mode fusion network and an image enhancement U-Net network. The underwater robot vision sharpening model based on the multi-mode fusion network is an end-to-end structure, and feature maps of different sizes are generated at each level so that the network can capture features at different scales.
Before entering the fusion network, the turbid underwater polarized images in the training set are preprocessed to obtain RGB information and polarization information. Through division-of-focal-plane imaging, the polarization camera measures the light intensity I_pol passing through a linear polarizer at polarization angle φ_pol, calculated as follows:
I_pol = I_un · (1 + ρ · cos(2φ − 2φ_pol))
S_0 = I_0° + I_90° = I_45° + I_135°
S_1 = I_0° − I_90°
S_2 = I_45° − I_135°
where I_un is the total incident light entering the camera (generally unpolarized light), ρ is the degree of linear polarization, φ is the linear polarization angle, and S_0, S_1, S_2 are the Stokes parameters; S_0 is also used to represent the RGB modal information. The polarization modal information comprises DoLP and AoLP, where DoLP denotes the degree-of-polarization information and AoLP denotes the angle-of-polarization information. I_0°, I_45°, I_90°, I_135° denote the images captured by the polarization camera for light in the four linear polarization states at angles of 0°, 45°, 90° and 135°, respectively. The obtained RGB modal information and polarization modal information are then used as the inputs of the multi-mode fusion network.
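For reference, this preprocessing can be written compactly in NumPy as below; the DoLP and AoLP expressions follow the standard Stokes definitions (DoLP = sqrt(S1² + S2²)/S0, AoLP = ½·arctan2(S2, S1)), which the text does not spell out, and any normalisation or clipping used in the actual implementation is not shown.

```python
import numpy as np

def stokes_from_polarized(i0, i45, i90, i135, eps=1e-6):
    """Compute S0, S1, S2, DoLP and AoLP from the four linear-polarization captures.

    i0..i135 are float arrays of the intensities recorded at 0°, 45°, 90° and 135°.
    DoLP and AoLP follow the standard Stokes definitions (not spelled out in the text).
    """
    s0 = i0 + i90                                    # total intensity (also ≈ i45 + i135)
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)   # degree of linear polarization
    aolp = 0.5 * np.arctan2(s2, s1)                  # angle of linear polarization
    return s0, s1, s2, dolp, aolp
```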
S3, the multi-mode fusion network comprises a feature fusion module and a polarization-guided fusion module. The multi-mode fusion network is trained on the underwater polarized image dataset; during training, the RGB modal information and polarization modal information are updated by the feature fusion module and the polarization-guided fusion module, and the fusion features are generated.
S301, the feature fusion module performs robust fusion of the DoLP and AoLP features from the polarization-mode input domain using global and local information. For the two input features DoLP and AoLP, two spatial attention maps are generated from the two token embedding sequences provided by two Conformers, and the extracted convolution features are then weighted by the attention maps and fused together:
M_φ, M_ρ = softmax(Ω(T_φ), Ω(T_ρ))
where C and T are the convolution features and token embeddings generated by the conv and trans branches of the Conformer, respectively, and ⊙ denotes element-wise multiplication. M_φ and M_ρ are the attention maps generated for φ (AoLP) and ρ (DoLP), respectively, and Ω is a function that first reduces the dimension of each token embedding to 1 through a fully connected layer and then reshapes the resulting embedding into a two-dimensional map. Features extracted from the Conformers at different layers are used so that DoLP and AoLP capture more detail and edge information and enhance the contrast and texture details of different regions of the image, thereby restoring the true colors of the image more accurately.
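A minimal PyTorch sketch of this fusion step is given below, assuming token embeddings of shape (B, h·w, D) and convolution features of shape (B, Ch, h, w) from the two Conformers; the interpretation of the softmax as acting across the two maps and the final weighted-sum rule are assumptions, since the text only states that the convolution features are weighted by the attention maps and fused.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolarFeatureFusion(nn.Module):
    """Sketch of the DoLP/AoLP fusion: token embeddings -> spatial attention maps -> weighted sum.

    t_phi, t_rho: Conformer token embeddings of shape (B, h*w, D) for AoLP and DoLP.
    c_phi, c_rho: conv-branch features of shape (B, Ch, h, w).
    """
    def __init__(self, embed_dim):
        super().__init__()
        self.omega_phi = nn.Linear(embed_dim, 1)   # Ω: reduce each token embedding to dim 1
        self.omega_rho = nn.Linear(embed_dim, 1)

    def forward(self, t_phi, t_rho, c_phi, c_rho):
        b, n, _ = t_phi.shape
        h = w = int(n ** 0.5)                      # assumes a square token grid
        a_phi = self.omega_phi(t_phi).view(b, 1, h, w)   # reshape each map back to 2-D
        a_rho = self.omega_rho(t_rho).view(b, 1, h, w)
        m = F.softmax(torch.cat([a_phi, a_rho], dim=1), dim=1)  # per-pixel weights summing to 1
        m_phi, m_rho = m[:, :1], m[:, 1:]
        return m_phi * c_phi + m_rho * c_rho       # fused polarization-modality feature M
```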
S302, because the polarization modality and the RGB modality deviate considerably from each other, the importance of the cues gathered from the RGB modality and the polarization modality is scene-dependent; simply combining the cues may dilute strong cues with weak signals and may even amplify the adverse influence of mixed cues. To handle this modal deviation, a polarization-guided fusion module is designed: the RGB-modality input feature X is enhanced by an attention operation and, guided by the polarization-modality input feature M, is updated to generate the fusion feature X*. The polarization-modality feature and the RGB-modality feature are concatenated and projected by a multi-layer perceptron, whose weights are learnable parameters, to generate the key k_x, query q_x and value v_x:
X* = FC(softmax(FC([q_x; k_x])) ⊙ v_x)
where ⊙ denotes element-wise multiplication and FC denotes a fully connected layer with filtering.
The channel statistics s_q, s_k of the query and the key are learned by reducing the embedding height and width H, W of the query and the key along the spatial dimensions, so as to obtain the channel relation for guided updating:
K_m, Q_m, V_m = X, M, k_x
M* = M_x + FC((s_q·Q_m + s_k·K_m) ⊙ V_m)
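The following PyTorch sketch shows one possible reading of these equations, not the disclosed implementation: the projection layers, the pooling used to reduce H, W, and the exact roles of Q_m, K_m, V_m are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class PolarGuidedFusion(nn.Module):
    """One possible reading of the two equations above (a sketch, not the disclosed design).

    q_x, k_x, v_x are projected from the concatenated RGB feature X and polarization feature M;
    X* re-weights v_x with an attention map built from [q_x; k_x]; s_q, s_k are per-channel
    statistics obtained by pooling the query/key over H, W; M* follows
    M* = M_x + FC((s_q·Q_m + s_k·K_m) ⊙ V_m) with Q_m = M, K_m = X, V_m = k_x.
    """
    def __init__(self, ch):
        super().__init__()
        self.to_qkv = nn.Conv2d(2 * ch, 3 * ch, kernel_size=1)   # stands in for the MLP projection
        self.fc_attn = nn.Conv2d(2 * ch, ch, kernel_size=1)
        self.fc_x = nn.Conv2d(ch, ch, kernel_size=1)
        self.fc_m = nn.Conv2d(ch, ch, kernel_size=1)
        self.stat_q = nn.Linear(ch, ch)                           # learns s_q from the pooled query
        self.stat_k = nn.Linear(ch, ch)                           # learns s_k from the pooled key

    def forward(self, x, m):
        q, k, v = self.to_qkv(torch.cat([x, m], dim=1)).chunk(3, dim=1)
        attn = torch.softmax(self.fc_attn(torch.cat([q, k], dim=1)), dim=1)
        x_star = self.fc_x(attn * v)                              # X* = FC(softmax(FC([q_x;k_x])) ⊙ v_x)
        s_q = self.stat_q(q.mean(dim=(2, 3)))[..., None, None]    # reduce H, W -> channel statistics
        s_k = self.stat_k(k.mean(dim=(2, 3)))[..., None, None]
        m_star = m + self.fc_m((s_q * m + s_k * x) * k)           # guided channel-relation update M*
        return x_star, m_star
```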
S4, the enhancement network is trained with the underwater polarized image dataset: the output of the multi-mode fusion network, namely the turbid fusion feature to be processed, is input into the image enhancement network based on the U-Net network so as to obtain a clear underwater image. The image enhancement network based on the U-Net network comprises three parts, namely an encoder part, a feature transformation part and a decoder part.
For the turbid fusion feature to be processed, the goal of the U-Net is to restore the corresponding clear image. However, for a detail-sensitive task such as turbidity removal, performing feature transformation only in a low-resolution space leads to information loss. Feature extraction blocks are therefore deployed from the first layer to the third layer of the U-Net, with different blocks used in different layers to extract the corresponding features: the first and second layers use conventional feature extraction blocks (DEBs), while the third layer uses detail enhancement attention blocks (DEABs) to capture more detail and edge features. Meanwhile, the U-Net adopts two downsampling operations and two upsampling operations. The downsampling operation halves the spatial dimensions and doubles the number of channels; it is implemented by an ordinary convolution layer with the stride set to 2 and the number of output channels set to twice the number of input channels. The upsampling operation can be regarded as the inverse of the downsampling operation and is implemented by a deconvolution layer. The sizes of the first, second and third layers are therefore C×H×W, 2C×(H/2)×(W/2) and 4C×(H/4)×(W/4) respectively, as sketched below.
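A minimal sketch of the downsampling and upsampling layers is given here, assuming 3×3 and 4×4 kernels respectively (the kernel sizes are not specified in the text):

```python
import torch
import torch.nn as nn

class Downsample(nn.Module):
    """Stride-2 convolution: halves H and W, doubles the channel count (kernel size assumed)."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, 2 * ch, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.conv(x)

class Upsample(nn.Module):
    """Transposed convolution as the inverse: doubles H and W, halves the channel count."""
    def __init__(self, ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(ch, ch // 2, kernel_size=4, stride=2, padding=1)

    def forward(self, x):
        return self.deconv(x)

# Shape check for the three levels: C×H×W -> 2C×H/2×W/2 -> 4C×H/4×W/4.
x = torch.randn(1, 32, 256, 256)
d1, d2 = Downsample(32), Downsample(64)
print(d2(d1(x)).shape)   # torch.Size([1, 128, 64, 64])
```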
Further, the detail enhancement attention block consists of a detail-focusing convolution block and a content-guided attention block, which are used to enhance feature learning and thereby improve the dehazing (turbidity removal) performance.
The detail-focusing convolution block uses difference convolution to integrate prior information and supplement the information of the ordinary convolution, which enhances the representation capability; it can be equivalently converted into an ordinary convolution by a re-parameterization technique, thereby reducing the number of parameters and the computational cost.
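As an illustration, the sketch below pairs an ordinary 3×3 convolution with a central-difference convolution, one common form of difference convolution; the particular difference convolutions used by the patent and the re-parameterization step are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetailFocusConv(nn.Module):
    """Sketch of a detail-focusing block: an ordinary 3x3 convolution in parallel with a
    central-difference convolution. At inference the difference branch could be folded into
    an ordinary convolution by re-parameterization; that step is omitted here."""
    def __init__(self, ch):
        super().__init__()
        self.vanilla = nn.Conv2d(ch, ch, 3, padding=1)
        self.diff = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        out = self.vanilla(x)
        # Central-difference convolution: subtract the kernel-sum-weighted centre response,
        # so this branch reacts to local gradients (edges, fine detail) rather than intensity.
        w_sum = self.diff.weight.sum(dim=(2, 3), keepdim=True)    # (C_out, C_in, 1, 1)
        cdc = self.diff(x) - F.conv2d(x, w_sum, self.diff.bias)
        return out + cdc
```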
The content-guided attention block consists of channel attention and spatial attention, which compute the attention weights along the channel and spatial dimensions, respectively. The channel attention calculates a channel vector W_c (one weight per channel) to recalibrate the features, and the spatial attention calculates a spatial importance map W_s (one weight per pixel) to adaptively indicate the informative regions. The content-guided attention block treats different channels and pixels non-uniformly, thereby improving the denoising performance.
In the formulas defining W_c and W_s, max(0, x) denotes the ReLU activation function, Conv_k×k denotes a convolution layer with kernel size k×k, and [·] denotes the channel concatenation operation; the pooled inputs are obtained by global average pooling over the spatial dimensions, global average pooling over the channel dimension, and global max pooling over the channel dimension, respectively. To reduce the number of parameters and limit the model complexity, the first 1×1 convolution reduces the channel dimension from C to a smaller value and the second 1×1 convolution expands it back to C.
W_coa = W_c + W_s
The content-guided attention block acquires a dedicated spatial importance map for each individual channel of the input features in a coarse-to-fine manner, while fully mixing the channel attention weights and spatial attention weights to ensure information interaction. Following the broadcasting rule, W_c and W_s are fused by a simple addition operation to obtain the coarse spatial importance map W_coa; since W_c is channel-based, W_coa and X are consistent across channels. To obtain the final refined spatial importance map W, each channel of W_coa is adjusted according to the corresponding input features, i.e. the final channel-specific spatial importance map W is generated with the content of the input features as guidance. In particular, the channels of W_coa and X are rearranged in an alternating fashion by a channel shuffling operation, and the number of parameters is greatly reduced in combination with the subsequent group convolution layer.
In the formula defining W, σ denotes the sigmoid operation, CS(·) denotes the channel shuffling operation, and GConv_k×k denotes a group convolution layer with kernel size k×k, whose number of groups is set to C in the implementation. The content-guided attention mechanism assigns each channel a unique spatial importance map and guides the model to focus on the important regions of each channel. Thus, more useful information encoded in the features can be emphasized, effectively improving the dehazing performance.
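A sketch of how these attention weights could be computed in PyTorch is given below; the reduction ratio r, the 7×7 spatial kernel and the interleaving used to emulate the channel shuffle are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ContentGuidedAttention(nn.Module):
    """Sketch of the content-guided attention weights W.

    W_c comes from channel attention, W_s from spatial attention, W_coa = W_c + W_s by
    broadcasting, and a channel-shuffle-style interleaving with X followed by a group
    convolution (groups = C) refines W_coa into the final map W.
    """
    def __init__(self, ch, r=8, k=7):
        super().__init__()
        self.channel_att = nn.Sequential(                          # W_c: one weight per channel
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1))
        self.spatial_att = nn.Conv2d(2, 1, k, padding=k // 2)      # W_s: one weight per pixel
        self.refine = nn.Conv2d(2 * ch, ch, k, padding=k // 2, groups=ch)

    def forward(self, x):
        w_c = self.channel_att(x)                                  # (B, C, 1, 1)
        w_s = self.spatial_att(torch.cat([x.mean(1, keepdim=True),
                                          x.max(1, keepdim=True).values], dim=1))  # (B, 1, H, W)
        w_coa = w_c + w_s                                          # coarse map, broadcast to (B, C, H, W)
        b, c, h, w = x.shape
        # Interleave W_coa with X channel by channel (emulating the channel shuffle), then
        # refine each (W_coa_i, X_i) pair with its own group of the group convolution.
        mixed = torch.stack([w_coa, x], dim=2).reshape(b, 2 * c, h, w)
        return torch.sigmoid(self.refine(mixed))                   # final spatial importance map W
```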
Secondly, the dynamic fusion scheme based on the content-guided attention block can effectively fuse features and help gradient flow: in an encoder-decoder-like architecture, the features after the downsampling operation are fused with the corresponding features before the upsampling operation. Fusing the feature F_low from the encoder part with the feature F_high from the decoder part to obtain F_fuse is an effective technique in dehazing and other low-level visual tasks. Low-level features (such as edges and contours) have a non-negligible effect on restoring a sharp image, but gradually lose their influence after passing through many intermediate layers. Feature fusion enhances the information flow from shallow layers to deep layers and benefits feature preservation and gradient back-propagation. Using the dynamic fusion scheme based on the content-guided attention block, the features are modulated by the learned spatial weights, thereby adaptively fusing the low-level features of the encoder part with the corresponding high-level features. The core idea is to compute the spatial weights for feature modulation with the content-guided attention mechanism: the low-level features of the encoder part and the corresponding high-level features are input to the content-guided attention mechanism to calculate the weights and are then combined by weighted summation, the input features are added through a skip connection to alleviate the gradient vanishing problem and simplify learning, and finally the fused features are mapped through a 3×3 convolution layer to obtain the final clear result:
F_fuse = C_1×1(F_low · W + F_high · (1 − W) + F_low + F_high)
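A minimal sketch of this fusion step follows, with a random stand-in for the weight map W that would come from the content-guided attention applied to F_low and F_high:

```python
import torch
import torch.nn as nn

def cga_fusion(f_low, f_high, w, conv1x1):
    """Mix encoder features F_low and decoder features F_high with the weight map W,
    keep the skip connection of both inputs, and project with a 1x1 convolution."""
    return conv1x1(f_low * w + f_high * (1 - w) + f_low + f_high)

# Minimal usage example (channel count and spatial size are arbitrary for illustration).
c = 64
conv1x1 = nn.Conv2d(c, c, kernel_size=1)
f_low, f_high = torch.randn(1, c, 64, 64), torch.randn(1, c, 64, 64)
w = torch.sigmoid(torch.randn(1, c, 64, 64))         # stands in for the CGA output W
print(cga_fusion(f_low, f_high, w, conv1x1).shape)    # torch.Size([1, 64, 64, 64])
```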
It should be noted that the above embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present invention.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411488920.2A CN119478648B (en) | 2024-10-24 | 2024-10-24 | A method for underwater robot vision clarity based on multimodal fusion network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411488920.2A CN119478648B (en) | 2024-10-24 | 2024-10-24 | A method for underwater robot vision clarity based on multimodal fusion network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN119478648A CN119478648A (en) | 2025-02-18 |
CN119478648B true CN119478648B (en) | 2025-07-18 |
Family
ID=94596171
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411488920.2A Active CN119478648B (en) | 2024-10-24 | 2024-10-24 | A method for underwater robot vision clarity based on multimodal fusion network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119478648B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN120355594A (en) * | 2025-06-24 | 2025-07-22 | 苏州城市学院 | Sparse aperture optical system polarization image fusion method based on deep learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116740515A (en) * | 2023-05-19 | 2023-09-12 | 中北大学 | CNN-based intensity image and polarization image fusion enhancement method |
CN117048814A (en) * | 2023-09-15 | 2023-11-14 | 南通奇致智能科技有限公司 | High-flexibility underwater intelligent robot |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7420675B2 (en) * | 2003-06-25 | 2008-09-02 | The University Of Akron | Multi-wavelength imaging system |
CN114549548B (en) * | 2022-01-28 | 2024-09-13 | 大连理工大学 | Glass image segmentation method based on polarization clues |
CN117291832A (en) * | 2023-08-25 | 2023-12-26 | 天津市天开海洋科技有限公司 | Underwater polarized image polarization information restoration method based on deep neural network |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116740515A (en) * | 2023-05-19 | 2023-09-12 | 中北大学 | CNN-based intensity image and polarization image fusion enhancement method |
CN117048814A (en) * | 2023-09-15 | 2023-11-14 | 南通奇致智能科技有限公司 | High-flexibility underwater intelligent robot |
Also Published As
Publication number | Publication date |
---|---|
CN119478648A (en) | 2025-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ikoma et al. | Depth from defocus with learned optics for imaging and occlusion-aware depth estimation | |
CN101422035B (en) | Light source estimation device, light source estimation system, and light source estimation method, and image high resolution device and image high resolution method | |
CN119478648B (en) | A method for underwater robot vision clarity based on multimodal fusion network | |
CN106997581A (en) | A kind of method that utilization deep learning rebuilds high spectrum image | |
Agrafiotis et al. | Underwater photogrammetry in very shallow waters: main challenges and caustics effect removal | |
Singh et al. | Low-light image enhancement for UAVs with multi-feature fusion deep neural networks | |
CN113160053B (en) | An underwater video image restoration and stitching method based on pose information | |
CN113160085B (en) | A method for collecting water splash occlusion image dataset based on generative adversarial network | |
CN115035010A (en) | Underwater image enhancement method based on convolutional network guided model mapping | |
CN112906675A (en) | Unsupervised human body key point detection method and system in fixed scene | |
CN118469842B (en) | A remote sensing image dehazing method based on generative adversarial network | |
CN113592755B (en) | Image reflection elimination method based on panoramic camera | |
CN119784943A (en) | An underwater 3D measurement method based on the fusion of vision and line structured light | |
CN112950481A (en) | Water bloom shielding image data collection method based on image mosaic network | |
CN117094895B (en) | Image panorama stitching method and system | |
Vijayalakshmi et al. | Variants of generative adversarial networks for underwater image enhancement | |
CN115439376B (en) | Compound eye camera multi-focal-length image fusion model, method and device | |
Huang et al. | AFNet: Asymmetric fusion network for monocular panorama depth estimation | |
CN117115038A (en) | An image glare removal system and method based on glare degree estimation | |
Xu et al. | Real-time panoramic map modeling method based on multisource image fusion and three-dimensional rendering | |
CN115471397A (en) | Multimodal Image Registration Method Based on Disparity Estimation | |
CN115034974A (en) | Method, device and storage medium for natural color restoration of visible light and infrared fusion images | |
Tandekar et al. | Underwater Image Enhancement through Deep Learning and Advanced Convolutional Encoders | |
Li et al. | Context convolution dehazing network with channel attention | |
CN107038706A (en) | Infrared image confidence level estimation device and method based on adaptive mesh |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |