
WO2020019761A1 - Monocular image depth estimation method and device, electronic device, program, and storage medium - Google Patents


Info

Publication number
WO2020019761A1
Authority
WO
WIPO (PCT)
Prior art keywords
monocular image
depth map
preset
depth
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2019/082314
Other languages
English (en)
French (fr)
Inventor
甘宇康
许翔宇
孙文秀
林倞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to SG11202003878TA priority Critical patent/SG11202003878TA/en
Priority to JP2020542490A priority patent/JP6963695B2/ja
Priority to KR1020207009304A priority patent/KR102292559B1/ko
Publication of WO2020019761A1 publication Critical patent/WO2020019761A1/zh
Priority to US16/830,363 priority patent/US11443445B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G06N 3/09 Supervised learning
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Definitions

  • the present application relates to computer vision technology, and in particular, to a monocular image depth estimation method and device, an electronic device, a computer program, and a storage medium.
  • Depth estimation of images is an important issue in the field of computer vision.
  • the depth estimation of images mainly includes: monocular image depth estimation and binocular image depth estimation.
  • Monocular image depth estimation estimates the depth information of an image from a single monocular image. Because monocular image depth estimation is a very challenging problem, the accuracy of the depth information predicted by many current monocular image depth estimation methods is still poor.
  • the embodiments of the present application provide a monocular image depth estimation technical solution.
  • a method for estimating a monocular image depth including:
  • a predicted depth map of the monocular image is obtained according to the global feature, the absolute feature of each preset region in the monocular image, and the relative feature between the preset regions.
  • the relative features between the preset areas in the monocular image are obtained.
  • obtaining the relative features between the preset regions in the monocular image according to the absolute features of the preset regions in the monocular image includes:
  • a vector operation is performed on an absolute feature of each preset area in the monocular image through a correlation degree layer to obtain a relative feature between the preset areas in the monocular image.
  • before performing feature extraction on the monocular image through the first neural network, the method further includes:
  • obtaining the global feature of the monocular image according to the absolute feature of each preset region in the monocular image and the relative feature between the preset regions includes:
  • the global feature of the monocular image is obtained by combining the absolute features of each preset region in the monocular image and the relative features between the preset regions through a fully connected layer.
  • obtaining the predicted depth map of the monocular image according to the global feature, the absolute feature of each preset region in the monocular image, and the relative feature between the preset regions includes:
  • a depth estimation is performed by a depth estimator to obtain a predicted depth map of the monocular image.
  • after obtaining the predicted depth map of the monocular image according to the global feature, the absolute feature of each preset region in the monocular image, and the relative feature between the preset regions, the method further includes:
  • the predicted depth map is optimized according to a vertical change rule of the depth information of the monocular image to obtain a target depth map of the monocular image.
  • the optimizing the predicted depth map according to a vertical change rule of the depth information of the monocular image to obtain a target depth map of the monocular image includes:
  • the predicted depth map is optimized according to the residual map to obtain a target depth map of the monocular image.
  • the performing residual estimation on the predicted depth map according to a vertical change rule of the depth information of the monocular image to obtain a residual map of the predicted depth map includes:
  • the optimizing the predicted depth map according to the residual map to obtain a target depth map of the monocular image includes:
  • before the predicted depth map is optimized according to a vertical change rule of the depth information of the monocular image to obtain the target depth map of the monocular image, the method further includes:
  • a vertical change rule of the depth information of the monocular image is obtained according to the predicted depth map.
  • the obtaining a vertical change rule of the depth information of the monocular image according to the predicted depth map includes:
  • the predicted depth map is processed by a vertical pooling layer to obtain a vertical change rule of the depth information of the monocular image.
  • the optimizing the predicted depth map according to a vertical change rule of the depth information of the monocular image includes:
  • up-sampling the predicted depth map a preset number of times, wherein the dimension obtained by each upsampling increases by a multiple in sequence; obtaining a vertical change rule of the depth information according to the predicted depth map obtained by each upsampling, and optimizing that predicted depth map according to the vertical change rule to obtain an optimized target depth map;
  • the optimized target depth map obtained by each upsampling is used as the predicted depth map for the next upsampling, and the optimized target depth map obtained from the last upsampling is used as the target depth map of the monocular image.
  • the depth estimation neural network includes: a correlation degree layer, a fully connected layer, and a depth estimator, and is trained using a sparse depth map and a dense depth map obtained through binocular image stereo matching as labeled data.
  • a monocular image depth estimation device including:
  • a depth estimation neural network configured to obtain global features of the monocular image based on absolute features of each preset region in the monocular image and relative features between the preset regions; and according to the global features, the monocular image An absolute feature of each preset region in the target image and a relative feature between the preset regions are used to obtain a predicted depth map of the monocular image.
  • a first neural network is configured to perform feature extraction on the monocular image to obtain the features of each preset area in the monocular image, and use the features of each preset area as the absolute features of each preset area in the monocular image;
  • the depth estimation neural network is further configured to obtain the relative features between the preset regions in the monocular image according to the absolute features of the preset regions in the monocular image.
  • the depth estimation neural network includes:
  • the correlation degree layer is configured to perform a vector operation on the absolute features of each preset area in the monocular image to obtain the relative features between the preset areas in the monocular image.
  • in any one of the foregoing device embodiments of the present application, the device further includes:
  • the down-sampling layer is configured to down-sample the monocular image to obtain a monocular image having a preset dimension before performing feature extraction on the monocular image, wherein the dimension of the monocular image is the Multiples of preset dimensions.
  • the depth estimation neural network includes:
  • the fully connected layer is configured to combine the absolute features of each preset area in the monocular image and the relative features between the preset areas to obtain global features of the monocular image.
  • the depth estimation neural network includes:
  • a depth estimator configured to perform a depth estimation according to the global feature, the absolute feature of each preset region in the monocular image, and the relative feature between the preset regions, to obtain a predicted depth map of the monocular image .
  • in any one of the foregoing device embodiments of the present application, the device further includes:
  • a second neural network is configured to optimize the predicted depth map according to a vertical change rule of the depth information of the monocular image to obtain a target depth map of the monocular image.
  • the second neural network is configured to perform residual estimation on the predicted depth map according to a vertical change rule of the depth information of the monocular image to obtain The residual map of the predicted depth map; and optimizing the predicted depth map according to the residual map to obtain a target depth map of the monocular image.
  • the second neural network includes:
  • a residual estimation network configured to perform residual estimation on the predicted depth map according to a vertical change rule of the depth information of the monocular image to obtain a residual map of the predicted depth map;
  • the addition operation unit is configured to perform a pixel-by-pixel superposition operation on the residual map and the predicted depth map to obtain a target depth map of the monocular image.
  • the second neural network is further configured to obtain a vertical change rule of the depth information of the monocular image according to the predicted depth map.
  • the second neural network includes:
  • the vertical pooling layer is configured to process the predicted depth map and obtain a vertical change rule of the depth information of the monocular image.
  • in any one of the foregoing device embodiments of the present application, the device further includes:
  • An upsampling layer configured to perform upsampling on the predicted depth map a preset number of times
  • the vertical pooling layer is configured to obtain a vertical change rule of depth information according to a predicted depth map in which the dimensions obtained by each upsampling are successively increased in multiples;
  • the second neural network is configured to optimize, according to the vertical change rule of the depth information of the predicted depth map whose dimension increases by a multiple with each upsampling, that predicted depth map to obtain an optimized target depth map;
  • the optimized target depth map obtained by each upsampling is used as the predicted depth map for the next upsampling, and the optimized target depth map obtained from the last upsampling is used as the target depth map of the monocular image.
  • the depth estimation neural network includes: a correlation degree layer, a fully connected layer, and a depth estimator, and is trained using a sparse depth map and a dense depth map obtained through stereo matching of binocular images as labeled data.
  • an electronic device including the device described in any one of the foregoing embodiments.
  • an electronic device including:
  • Memory configured to store executable instructions
  • a processor configured to execute the executable instructions to complete the method according to any one of the foregoing embodiments.
  • a computer program including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the method described in any one of the foregoing embodiments.
  • a computer storage medium configured to store computer-readable instructions, and when the instructions are executed, the method according to any one of the foregoing embodiments is implemented.
  • Based on the monocular image depth estimation method and device, electronic device, computer program, and storage medium provided by the foregoing embodiments of this application, the global features of the monocular image are obtained, based on the depth estimation neural network, according to the absolute features of each preset region in the monocular image and the relative features between the preset regions, and the predicted depth map of the monocular image is obtained based on the global features, the absolute features of each preset area in the monocular image, and the relative features between the preset areas.
  • the relative and absolute features of each preset region in the image are used to complement each other, which improves the accuracy of the relative distance prediction in the depth estimation, thereby improving the accuracy of the monocular image depth estimation.
  • FIG. 1 is a flowchart of a monocular image depth estimation method according to some embodiments of the present application.
  • FIG. 2 is a flowchart of a monocular image depth estimation method according to another embodiment of the present application.
  • FIG. 3 is a flowchart of the optimization at each scale when multi-scale learning is used for optimization according to an embodiment of the present application.
  • FIGS. 4A to 4C are schematic diagrams of a network structure for implementing a monocular image depth estimation method according to some embodiments of the present application.
  • FIG. 5 is a schematic structural diagram of a monocular image depth estimation device according to some embodiments of the present application.
  • FIG. 6 is a schematic structural diagram of a monocular image depth estimation device according to another embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a monocular image depth estimation device according to still another embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by some embodiments of the present application.
  • the embodiments of the present application can be applied to a computer system / server, which can operate with many other general or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and so on.
  • a computer system / server may be described in the general context of computer system executable instructions, such as program modules, executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types.
  • the computer system / server can be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on a local or remote computing system storage medium including a storage device.
  • FIG. 1 is a flowchart of a monocular image depth estimation method according to some embodiments of the present application.
  • the method includes:
  • the global features of the monocular image are obtained according to the absolute features of each preset region and the relative features between the preset regions in the monocular image.
  • the monocular image may be an image acquired from an image acquisition device or an image acquired from a storage device.
  • the image acquisition device may be a camera, a video camera, a scanner, etc.
  • the storage device may be a USB flash drive, optical disc, hard disk, etc.; this embodiment does not limit the manner of acquiring the monocular image.
  • the absolute features of each preset area in the monocular image can be used to represent the local appearance of each preset area in the monocular image. For example, it can include texture features, geometric features, and so on.
  • the relative features between the preset areas in the monocular image can be used to represent the differences between the local appearances of the preset areas in the monocular image.
  • each preset area in the monocular image can be set according to the characteristics of the image.
  • the depth map in this embodiment refers to an image in which the distance between each pixel in the image and the image acquisition device is represented by the pixel value of each pixel in the image.
  • the global features of the monocular image can be obtained by combining the absolute features of each preset region in the monocular image and the relative features between the preset regions through a fully connected layer.
  • depth estimation can be performed by a depth estimator according to the global features of the monocular image, the absolute features of each preset region in the monocular image, and the relative features between the preset regions, to obtain the predicted depth map of the monocular image.
  • the depth estimator may use a fully convolutional network mainly composed of convolutional layers and deconvolutional layers; based on the geometric distribution information of the image, that is, the global features of the image, together with the absolute features of each preset region and the relative features between the preset regions, it regresses the depth value of each pixel in the image to obtain a predicted depth map.
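The feature combination described above can be illustrated with a minimal numpy sketch. All sizes, weights, and layer stand-ins below are hypothetical (the patent does not specify them); a learned fully connected layer and a learned fully convolutional estimator are approximated here by random linear maps, only to show how the absolute, relative, and broadcast global features would be assembled into the depth estimator's input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 4x4 grid of preset regions, 16-dim absolute
# features and 8-dim relative features per region, 32-dim global feature.
H, W, C_abs, C_rel, C_glob = 4, 4, 16, 8, 32

abs_feat = rng.standard_normal((H, W, C_abs))   # absolute features per region
rel_feat = rng.standard_normal((H, W, C_rel))   # relative features per region

# "Fully connected layer": flatten the combined features of all regions
# and project them to a single global feature vector.
combined = np.concatenate([abs_feat, rel_feat], axis=-1).reshape(-1)
W_fc = rng.standard_normal((C_glob, combined.size)) * 0.01
global_feat = np.tanh(W_fc @ combined)          # shape (C_glob,)

# The depth estimator consumes, at every region, the concatenation of the
# broadcast global feature with the absolute and relative features.
global_map = np.broadcast_to(global_feat, (H, W, C_glob))
estimator_input = np.concatenate([abs_feat, rel_feat, global_map], axis=-1)

# A per-region linear regression (a 1x1 "convolution") stands in for the
# fully convolutional depth estimator.
w_out = rng.standard_normal(estimator_input.shape[-1]) * 0.01
pred_depth = estimator_input @ w_out            # (H, W) predicted depth map
print(pred_depth.shape)
```

In the actual network, `W_fc` and `w_out` would be learned parameters, and the estimator would be a multi-layer convolutional/deconvolutional network rather than a single linear map.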
  • the monocular image depth estimation method provided in this embodiment is based on the depth estimation neural network: it obtains the global features of the monocular image according to the absolute features of each preset area in the monocular image and the relative features between the preset areas, and obtains the predicted depth map of the monocular image according to the global features, the absolute features of each preset area, and the relative features between the preset areas.
  • the relative features and absolute features of the preset areas in the image complement each other, which improves the accuracy of the relative distance prediction in the depth estimation, thereby improving the accuracy of the monocular image depth estimation.
  • in some embodiments, feature extraction may further be performed on the monocular image through the first neural network to obtain the features of each preset area in the monocular image, the features of each preset area are used as the absolute features of each preset area in the monocular image, and the relative features between the preset areas are then obtained according to the absolute features of each preset area in the monocular image.
  • the first neural network may use an encoder network composed of a convolution layer and a pooling layer. The monocular image is subjected to feature extraction through the encoder network to obtain high-dimensional features of the image.
  • the absolute feature of each preset area in the monocular image may be subjected to vector operation through the correlation degree layer to obtain the relative features between the preset areas in the monocular image.
  • the relative features between the preset areas in the image can be the relative features between the preset areas in the image and the preset areas in the surrounding preset range.
  • for example, a dot product operation may be performed between the features of a preset region in the monocular image and the features of the preset regions within a preset range around that region, to obtain the relative features between the preset regions in the monocular image.
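A correlation-degree layer of this kind can be sketched in numpy as follows. The grid size, feature dimension, and neighborhood displacement `d` are illustrative assumptions; a real implementation would be vectorized, but the naive loops make the dot-product operation explicit.

```python
import numpy as np

def correlation_layer(feat, d=1):
    """For each cell of a grid of region features (H, W, C), compute the
    dot product with every neighbour within displacement d, yielding an
    (H, W, (2d+1)**2) map of relative features. Neighbours that fall
    outside the grid contribute a zero relative feature."""
    H, W, C = feat.shape
    out = np.zeros((H, W, (2 * d + 1) ** 2))
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            for y in range(H):
                for x in range(W):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        out[y, x, k] = feat[y, x] @ feat[ny, nx]
            k += 1
    return out

feat = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
rel = correlation_layer(feat, d=1)
print(rel.shape)  # (2, 2, 9): 9 relative features per region for d=1
```

Channel 4 (displacement (0, 0)) is each region's dot product with itself; the other channels compare the region with its neighbours.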
  • in some embodiments, the monocular image may also be down-sampled to obtain a monocular image having a preset dimension, and the monocular image having the preset dimension is used as the monocular image on which the depth estimation neural network performs depth estimation, to reduce the amount of calculation and increase the speed of data processing.
  • the dimension of the monocular image is a multiple of the preset dimension, for example, the dimension of the monocular image is 8 times the preset dimension.
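As a sketch of such integer-factor downsampling, a simple average-pooling reduction works when the image dimensions are multiples of the factor (8 in the example above). The choice of average pooling here is an assumption; the patent only requires that the output dimension be the input dimension divided by the factor.

```python
import numpy as np

def downsample(img, factor=8):
    """Average-pool a 2D image by an integer factor; height and width
    are assumed to be multiples of the factor."""
    H, W = img.shape
    assert H % factor == 0 and W % factor == 0
    return img.reshape(H // factor, factor, W // factor, factor).mean(axis=(1, 3))

img = np.ones((64, 80))   # monocular image whose dimensions are multiples of 8
small = downsample(img, 8)
print(small.shape)        # (8, 10): 1/8 of the original dimension
```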
  • in general, the depth change of an image in the vertical direction is greater than the depth change in the horizontal direction; for example, a road in an image often extends vertically toward places farther from the camera.
  • it can be seen that the vertical change of image depth information helps to estimate the absolute distance of the image. Therefore, the vertical change rule of the monocular image depth information can be used for the depth estimation of the monocular image; for example, the predicted depth map can be optimized according to the vertical change rule of the monocular image depth information.
  • after a predicted depth map of the monocular image is obtained according to the global feature, the absolute feature of each preset region in the monocular image, and the relative feature between the preset regions, the method may further include:
  • residual estimation is performed on the predicted depth map to obtain a residual map of the predicted depth map, and the predicted depth map is then optimized according to the residual map to obtain a target depth map of the monocular image.
  • for example, residual estimation can be performed on the predicted depth map through the residual estimation network to obtain the residual map of the predicted depth map, and the residual map and the predicted depth map are then superimposed pixel by pixel to obtain the target depth map of the monocular image.
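The residual-then-superpose step can be sketched as follows. The `estimate_residual` function below is a hypothetical stand-in for the learned residual estimation network: it simply pulls each pixel toward its column's vertical depth profile, which is enough to show the pixel-by-pixel addition that produces the target depth map.

```python
import numpy as np

def estimate_residual(pred_depth):
    """Stand-in for the residual estimation network: the residual nudges
    each pixel toward the column-wise mean (a crude vertical depth cue)."""
    column_profile = pred_depth.mean(axis=0, keepdims=True)
    return 0.1 * (column_profile - pred_depth)

pred_depth = np.array([[1.0, 2.0],
                       [3.0, 4.0]])
residual = estimate_residual(pred_depth)
target_depth = pred_depth + residual  # pixel-by-pixel superposition
print(target_depth)
```

In the real network the residual map comes from learned layers conditioned on the vertical change rule, but the final step is exactly this element-wise addition.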
  • the vertical change rule of the depth information of the monocular image may also be obtained according to the predicted depth map.
  • the predicted depth map may be processed by the vertical pooling layer to obtain the vertical change rule of the depth information of the monocular image.
  • the vertical pooling layer can use a column vector as a pooling kernel to pool the predicted depth map.
  • for example, the vertical pooling layer can use a pooling kernel of size H × 1 to perform average pooling on the predicted depth map, where H is an integer greater than 1.
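An H × 1 average pooling of this kind can be sketched in numpy (the stride-H non-overlapping variant below is an assumption; the patent does not fix the stride):

```python
import numpy as np

def vertical_pool(depth, H_k=4):
    """Average pooling with an H_k x 1 kernel and vertical stride H_k:
    each output row summarizes H_k consecutive rows of the depth map,
    capturing how depth changes in the vertical direction."""
    H, W = depth.shape
    assert H % H_k == 0
    return depth.reshape(H // H_k, H_k, W).mean(axis=1)

# Depth that grows with the row index, as a road receding from the camera might.
depth = np.tile(np.arange(8.0)[:, None], (1, 3))  # shape (8, 3)
pooled = vertical_pool(depth, H_k=4)
print(pooled)  # row 0: mean of rows 0-3 (1.5); row 1: mean of rows 4-7 (5.5)
```

The pooled rows summarize the vertical depth profile while leaving each column independent, which is why a column-shaped (H × 1) kernel is used rather than a square one.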
  • the monocular image depth estimation method provided in this embodiment is based on the depth estimation neural network: it obtains the global features of the monocular image according to the absolute features of each preset area and the relative features between the preset areas, obtains the predicted depth map of the monocular image according to the global features, the absolute features of each preset region, and the relative features between the preset regions, and optimizes the predicted depth map according to the vertical change rule of the depth information of the monocular image.
  • this not only improves the accuracy of the relative distance prediction in depth estimation; the optimization based on the vertical change rule of image depth information also improves the accuracy of the absolute distance prediction, so that the accuracy of depth estimation of monocular images can be improved comprehensively.
  • in some embodiments, before the monocular image is subjected to feature extraction through the first neural network, the monocular image is down-sampled to obtain a monocular image having a preset dimension, the monocular image having the preset dimension is used for feature extraction, and the predicted depth map is then optimized according to the vertical change rule of the depth information of the monocular image.
  • a multi-scale learning method can be used to improve the accuracy of the monocular image depth estimation.
  • specifically, upsampling can be performed on the predicted depth map a preset number of times, where the dimension obtained by each upsampling increases by a multiple in sequence.
  • the vertical change rule of depth information is obtained according to the predicted depth map obtained by each upsampling, and that predicted depth map is optimized according to the vertical change rule to obtain an optimized target depth map.
  • the optimized target depth map obtained by each upsampling is used as the predicted depth map for the next upsampling, and the optimized target depth map obtained by the last upsampling is used as the target depth map of the monocular image, which has the same dimensions as the monocular image.
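The multi-scale loop above can be sketched as follows. The nearest-neighbour upsampling and the `refine` stand-in (column-profile residual) are illustrative assumptions; the real networks at each scale are learned, but the control flow of upsample-then-optimize, feeding each scale's result to the next, is as shown.

```python
import numpy as np

def upsample2x(depth):
    """Nearest-neighbour 2x upsampling of a 2D depth map."""
    return depth.repeat(2, axis=0).repeat(2, axis=1)

def refine(depth):
    """Stand-in for one scale's optimization: a residual toward the
    column-wise vertical depth profile, added pixel by pixel."""
    profile = depth.mean(axis=0, keepdims=True)
    return depth + 0.1 * (profile - depth)

# Start from a 1/8-dimension predicted depth map and upsample three times
# (1/8 -> 1/4 -> 1/2 -> full size), optimizing at every scale; the last
# optimized map has the same dimensions as the monocular image.
pred = np.full((4, 5), 2.0)  # 1/8-dimension predicted depth map
for _ in range(3):
    pred = refine(upsample2x(pred))
print(pred.shape)  # (32, 40): 8x the starting dimensions
```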
  • the method includes:
  • the predicted depth map having the first preset dimension may be a predicted depth map obtained from a depth estimation neural network, or may be an optimized target depth map obtained from a previous scale optimization process.
  • the second preset dimension is a multiple of the first preset dimension, and the sizes of the first preset dimension and the second preset dimension may be determined according to the number of upsampling operations, the upsampling factor, and the size of the monocular image.
  • the predicted depth map having the first preset dimension may be up-sampled by the upsampling layer to obtain the predicted depth map having the second preset dimension.
  • the predicted depth map having the second preset dimension may be processed by the vertical pooling layer to obtain the vertical change rule of the corresponding depth information.
  • a residual estimation may be performed on a predicted depth map having a second preset dimension through a residual estimation network according to a vertical change rule of corresponding depth information to obtain a corresponding residual map.
  • a corresponding target depth map having a second preset dimension may be obtained by performing a pixel-by-pixel superposition operation on a corresponding residual map and a predicted depth map having a second preset dimension.
  • 4A to 4C are schematic diagrams of a network structure for implementing a monocular image depth estimation method according to some embodiments of the present application.
  • the network that implements the monocular image depth estimation method of the embodiment of the present application includes a convolutional neural network, a depth estimation neural network, and a depth optimization neural network.
  • the convolutional neural network includes a downsampling layer and a first neural network.
  • the downsampling layer performs 8× downsampling of the monocular image to obtain a monocular image with a dimension of 1/8 of the original, and feature extraction is then performed on the 1/8-dimension monocular image through the first neural network to obtain the absolute features of each preset region in the 1/8-dimension monocular image.
  • the depth estimation neural network includes a correlation degree layer, a fully connected layer, and a depth estimator. The correlation degree layer can obtain the relative features between the preset regions in the 1/8-dimension monocular image according to the absolute features of each preset region; the fully connected layer can obtain the global features of the 1/8-dimension monocular image based on the absolute features of the preset regions and the relative features between them; and the depth estimator can obtain a 1/8-dimension predicted depth map according to the global features, the absolute features of each preset region, and the relative features between the preset regions.
  • the deep optimization neural network includes a first-scale optimization network, a second-scale optimization network, and a third-scale optimization network.
  • the structure of each scale optimization network includes: upsampling Layer, vertical pooling layer, residual estimation network, and addition unit.
  • the upsampling layer of the first-scale optimized network can up-sample the 1 / 8th-dimensional predicted depth map to obtain a 1 / 4-dimensional predicted depth map
  • the vertical pooling layer of the first-scale optimized network can be based on 1
  • the predicted depth map of the / 4 dimension obtains the vertical change rule of the corresponding depth information.
  • the residual estimation network of the first-scale optimized network can follow the vertical change rule of the depth information corresponding to the predicted depth map of the 1/4 dimension.
  • the 4-dimensional predicted depth map is used to estimate the residuals to obtain the corresponding residual map.
  • the addition unit of the first-scale optimized network can perform a pixel-by-pixel superposition operation on the corresponding residual map and the 1 / 4-dimensional predicted depth map to obtain
  • the optimized depth map of the 1/4 dimension can be used as the predicted depth map of the second-scale optimized network.
  • the up-sampling layer of the second-scale optimized network can up-sample the 1 / 4-dimensional target depth map after optimization to obtain a 1 / 2-dimensional predicted depth map.
  • the second-scale optimized network's vertical pooling layer can be based on 1
  • the / 2-dimensional predicted depth map obtains the vertical change rule of the corresponding depth information.
  • the residual estimation network of the second-scale optimized network can calculate the vertical change rule of the depth information corresponding to the 1 / 2-dimensional predicted depth map.
  • the 2-dimensional prediction depth map is used to estimate the residual error to obtain the corresponding residual map.
  • the addition unit of the second-scale optimized network can perform pixel-by-pixel superposition operation on the corresponding residual map and 1 / 2-dimensional prediction depth map to obtain
  • the target depth map of the optimized 1/2 dimension can be used as the predicted depth map of the third-scale optimized network.
  • the upsampling layer of the third-scale optimized network can up-sample the target depth map with 1/2 dimension after optimization to obtain the predicted depth map with the same dimensions as the monocular image.
  • the vertical pooling of the third-scale optimized network The layer can obtain the vertical change rule of the corresponding depth information according to the predicted depth map with the same dimensions as the monocular image.
  • the residual estimation network of the third scale optimization network can predict the depth map with the same dimensions as the monocular image.
  • the vertical change rule of the corresponding depth information, the residual depth estimation is performed on the predicted depth map with the same dimensions as the monocular image, and the corresponding residual map is obtained.
  • the addition unit of the third-scale optimized network can perform the corresponding residual map.
  • the predicted depth map with the same dimensions as the monocular image is superimposed pixel by pixel to obtain a target depth map with the same optimized dimension as that of the monocular image, and the optimized depth map is used as the target depth map of the monocular image. .
• the depth estimation neural network of each of the foregoing embodiments may be obtained through semi-supervised training, using sparse depth maps and dense depth maps obtained through stereo matching of binocular images as labeled data; that is, the depth maps obtained by binocular stereo matching serve as the labeled data of the training data.
• the monocular image depth estimation method provided in the embodiments of the present application can be used in fields such as scene geometric structure analysis, automatic driving, assisted driving, target tracking, and robot autonomous obstacle avoidance.
• for example, the monocular image depth estimation method provided in the embodiments of the present application may be used to predict the distance of a preceding vehicle or a pedestrian.
• the depth information predicted by the monocular image depth estimation method provided in the embodiments of the present application may be used for monocular blurring operations; using the prediction result of the method may help improve object tracking algorithms.
  • FIG. 5 is a schematic structural diagram of a monocular image depth estimation device according to some embodiments of the present application.
• the apparatus includes: a depth estimation neural network 510.
• the depth estimation neural network 510 is configured to obtain the global features of the monocular image based on the absolute features of each preset region in the monocular image and the relative features between the preset regions, and to obtain the predicted depth map of the monocular image according to the global features, the absolute features of each preset region, and the relative features between the preset regions.
• the monocular image may be an image acquired from an image acquisition device or an image acquired from a storage device.
• for example, the image acquisition device may be a camera, a video camera, a scanner, etc., and the storage device may be a USB flash drive, an optical disc, a hard disk, etc.; this embodiment does not limit the manner of acquiring the monocular image.
  • the absolute features of each preset area in the monocular image can be used to represent the local appearance of each preset area in the monocular image. For example, it can include texture features, geometric features, and so on.
  • the relative features between the preset areas in the monocular image can be used to represent the differences between the local appearances of the preset areas in the monocular image.
  • each preset area in the monocular image can be set according to the characteristics of the image.
  • the depth map in this embodiment refers to an image in which the distance between each pixel in the image and the image acquisition device is represented by the pixel value of each pixel in the image.
• the depth estimation neural network 510 may include: a fully connected layer 511 configured to combine the absolute features of each preset region in the monocular image and the relative features between the preset regions, to obtain the global features of the monocular image.
• the depth estimation neural network 510 may further include a depth estimator 512 configured to perform depth estimation based on the global features of the monocular image, the absolute features of each preset region in the monocular image, and the relative features between the preset regions, to obtain the predicted depth map of the monocular image.
• the depth estimator can use a fully convolutional network, which is mainly composed of convolutional layers and deconvolutional layers; based on the geometric distribution information of the image, that is, the global features of the image, the absolute features of each preset region, and the relative features between the preset regions, it regresses the depth value of each pixel in the image to obtain the predicted depth map.
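For intuition, the depth estimator's inputs can be combined as sketched below: the global feature vector is broadcast to every location, concatenated with the absolute and relative features, and mapped to one depth value per pixel. The single linear map (the equivalent of a 1×1 convolution) stands in for the convolutional and deconvolutional layers of the actual fully convolutional network; all shapes and weights here are illustrative assumptions.

```python
import numpy as np

def depth_estimator(absolute, relative, global_feat, w, b):
    # absolute: (H, W, Ca), relative: (H, W, Cr), global_feat: (Cg,).
    # Broadcast the global feature to every location, concatenate all cues,
    # and regress one depth value per pixel with an (untrained) linear map.
    H, W, _ = absolute.shape
    g = np.broadcast_to(global_feat, (H, W, global_feat.shape[-1]))
    x = np.concatenate([absolute, relative, g], axis=-1)
    return x @ w + b  # (H, W) predicted depth map

H, W = 4, 5
absolute = np.random.rand(H, W, 8)     # absolute features per preset region
relative = np.random.rand(H, W, 9)     # relative features per preset region
global_feat = np.random.rand(16)       # one global feature vector
w = np.random.rand(8 + 9 + 16)         # one weight per concatenated channel
pred = depth_estimator(absolute, relative, global_feat, w, b=0.5)
print(pred.shape)                      # (4, 5): one depth value per pixel
```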
• the monocular image depth estimation device of this embodiment obtains, via the depth estimation neural network, the global features of a monocular image based on the absolute features of each preset region in the monocular image and the relative features between the preset regions, and obtains the predicted depth map of the monocular image according to the global features, the absolute features of each preset region, and the relative features between the preset regions.
• in monocular image depth estimation, the relative features and absolute features of the preset regions in the image complement each other, which improves the accuracy of relative distance prediction in depth estimation and thereby the accuracy of monocular image depth estimation.
  • FIG. 6 is a schematic structural diagram of a monocular image depth estimation device according to another embodiment of the present application.
• the device further includes: a first neural network 620.
• the first neural network 620 is configured to perform feature extraction on the monocular image, obtain the features of each preset region in the monocular image, and use the features of each preset region as the absolute features of each preset region in the monocular image.
• the first neural network may use an encoder network composed of convolutional layers and pooling layers; feature extraction is performed on the monocular image through the encoder network to obtain high-dimensional features of the image.
• the depth estimation neural network 610 is further configured to obtain the relative features between the preset regions in the monocular image according to the absolute features of the preset regions in the monocular image.
• the depth estimation neural network 610 may further include a correlation layer 613 configured to perform a vector operation on the absolute features of each preset region in the monocular image, to obtain the relative features between the preset regions.
• the relative features between the preset regions in the image may be the relative features between each preset region in the image and the preset regions within a preset surrounding range; for example, a dot product operation may be performed on the feature vectors of each preset region in the monocular image and the preset regions within its surrounding preset range, to obtain the relative features between the preset regions in the monocular image.
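A minimal sketch, in NumPy, of the dot-product operation described above: for every preset region (here, every cell of a feature map), correlate its feature vector with those of the regions in a surrounding window. The 3×3 window (`d=1`) and the feature dimension are illustrative assumptions, not values fixed by the application.

```python
import numpy as np

def relative_features(feat, d=1):
    # feat: (H, W, C) absolute features of each preset region.
    # Returns (H, W, (2d+1)**2): dot products between each region's feature
    # vector and those of the regions in the (2d+1) x (2d+1) window around it
    # (zero padding at the borders).
    H, W, C = feat.shape
    padded = np.pad(feat, ((d, d), (d, d), (0, 0)))
    out = np.zeros((H, W, (2 * d + 1) ** 2))
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            shifted = padded[d + dy:d + dy + H, d + dx:d + dx + W]
            out[..., k] = (feat * shifted).sum(axis=-1)  # dot product
            k += 1
    return out

feat = np.random.rand(6, 8, 16)          # toy 6x8 map of 16-dim features
rel = relative_features(feat, d=1)
print(rel.shape)                         # (6, 8, 9)
```

The centre channel is each region's dot product with itself, so `rel[..., 4]` equals the squared norm of each feature vector.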
• the device may further include a downsampling layer configured to downsample the monocular image before feature extraction is performed on it, to obtain a monocular image with a preset dimension; the depth estimation neural network 610 then performs depth estimation on the monocular image with the preset dimension, which reduces the amount of calculation and improves the speed of data processing.
• the dimension of the monocular image is a multiple of the preset dimension; for example, the dimension of the monocular image is 8 times the preset dimension.
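To make the dimension relationship concrete, the sketch below performs 8-times block-average downsampling. The real downsampling layer is part of a convolutional neural network and need not be a plain block average, so this is an assumption for illustration.

```python
import numpy as np

def downsample(img, factor=8):
    # Block-average downsampling: the monocular image's dimension must be
    # a multiple of the preset dimension (here, factor 8 as in the example).
    H, W = img.shape[:2]
    assert H % factor == 0 and W % factor == 0
    blocks = img.reshape(H // factor, factor, W // factor, factor, -1)
    return blocks.mean(axis=(1, 3))

img = np.random.rand(64, 96, 3)      # toy monocular image
small = downsample(img)              # 1/8-dimension image for the network
print(small.shape)                   # (8, 12, 3)
```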
  • FIG. 7 is a schematic structural diagram of a monocular image depth estimation apparatus according to another embodiment of the present application.
• the device further includes a second neural network 730.
• the second neural network 730 is configured to optimize the predicted depth map according to the vertical change rule of the depth information of the monocular image to obtain a target depth map of the monocular image.
• the second neural network 730 is configured to perform residual estimation on the predicted depth map according to the vertical change rule of the depth information of the monocular image to obtain a residual map of the predicted depth map, and then optimize the predicted depth map according to the residual map to obtain the target depth map of the monocular image.
• the second neural network 730 may include: a residual estimation network 731 configured to perform residual estimation on the predicted depth map according to the vertical change rule of the depth information of the monocular image, to obtain a residual map of the predicted depth map.
• the addition operation unit 732 is configured to perform a pixel-by-pixel superposition operation on the residual map and the predicted depth map to obtain the target depth map of the monocular image.
• the second neural network 730 is further configured to obtain the vertical change rule of the depth information of the monocular image according to the predicted depth map.
• the second neural network 730 may further include a vertical pooling layer 733 configured to process the predicted depth map to obtain the vertical change rule of the depth information of the monocular image.
• the vertical pooling layer can use a column vector as a pooling kernel to pool the predicted depth map; for example, it can use a pooling kernel of size H×1 to perform average pooling on the predicted depth map, where H is an integer greater than 1.
• the monocular image depth estimation device of this embodiment obtains, via the depth estimation neural network, the global features of a monocular image based on the absolute features of each preset region in the monocular image and the relative features between the preset regions; obtains the predicted depth map of the monocular image according to the global features, the absolute features of each preset region, and the relative features between the preset regions; and optimizes the predicted depth map according to the vertical change rule of the depth information of the monocular image.
• in addition to improving the accuracy of relative distance prediction by letting the relative and absolute features of the preset regions complement each other, optimization based on the vertical change rule of the image depth information improves the accuracy of absolute distance prediction, so that the accuracy of monocular image depth estimation can be improved comprehensively.
• when, before feature extraction by the first neural network, the monocular image is down-sampled through a downsampling layer to obtain a monocular image having a preset dimension, and the preset-dimension monocular image is used as the monocular image for depth estimation by the depth estimation neural network, the predicted depth map may be optimized according to the vertical change rule of the depth information of the monocular image using a multi-scale learning method, to improve the accuracy of monocular image depth estimation.
• the apparatus may further include: an upsampling layer configured to perform a preset number of upsampling operations on the predicted depth map; and a vertical pooling layer configured to obtain the vertical change rule of depth information from the predicted depth maps whose dimensions increase by multiples after each upsampling.
• the second neural network is configured to optimize, according to the vertical change rule of the depth information of the predicted depth map whose dimension increases by multiples after each upsampling, the predicted depth map obtained by each upsampling, to obtain an optimized target depth map.
• except for the last upsampling, the optimized target depth map obtained by each upsampling serves as the predicted depth map for the next upsampling, and the optimized target depth map obtained by the last upsampling serves as the target depth map of the monocular image, with the same dimensions as the monocular image.
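The multi-scale chaining just described can be sketched as the loop below, where each pass doubles the map's dimension and the optimized map of one scale becomes the predicted map of the next. The nearest-neighbour upsampling and the toy column-statistics refinement are placeholders for the trained layers; three doublings take a 1/8-dimension map back to full size.

```python
import numpy as np

def upsample2x(depth):
    # Nearest-neighbour upsampling: doubles both spatial dimensions.
    return depth.repeat(2, axis=0).repeat(2, axis=1)

def refine(pred):
    # Placeholder for one scale's vertical pooling + residual estimation +
    # pixel-by-pixel addition; a trained network would produce the residual.
    col_mean = pred.mean(axis=0, keepdims=True)  # per-column vertical statistic
    residual = 0.05 * (col_mean - pred)
    return pred + residual

def multi_scale_optimize(pred, n_scales=3):
    # The optimized target depth map of each scale is the predicted depth
    # map of the next; the last one is the monocular image's target depth map.
    for _ in range(n_scales):
        pred = refine(upsample2x(pred))
    return pred

d = np.random.rand(4, 4)             # e.g. a 1/8-dimension predicted depth map
target = multi_scale_optimize(d)     # three 2x upsamplings: 4 -> 32
print(target.shape)                  # (32, 32)
```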
• the depth estimation neural network of each of the foregoing embodiments may be obtained through semi-supervised training, using sparse depth maps and dense depth maps obtained through stereo matching of binocular images as labeled data; the depth maps obtained by stereo matching of the binocular images serve as the labeled data of the training data.
  • An embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
  • FIG. 8 illustrates a schematic structural diagram of an electronic device 800 suitable for implementing a terminal device or a server in the embodiment of the present application.
  • the electronic device 800 includes one or more processors and a communication unit.
• the one or more processors are, for example, one or more central processing units (CPUs) 801 and/or one or more graphics processors (GPUs) 813, etc.
• the processor may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 802 or loaded from the storage section 808 into a random access memory (RAM) 803.
• the communication unit 812 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB (Infiniband) network card.
• the processor may communicate with the read-only memory 802 and/or the RAM 803 to execute executable instructions, is connected to the communication unit 812 through the bus 804, and communicates with other target devices via the communication unit 812, thereby completing operations corresponding to any of the methods provided in the embodiments of the present application, for example: based on a depth estimation neural network, obtaining the global features of a monocular image according to the absolute features of each preset region in the monocular image and the relative features between the preset regions; and obtaining a predicted depth map of the monocular image according to the global features, the absolute features of each preset region, and the relative features between the preset regions.
  • RAM 803 can also store various programs and data required for device operation.
  • the CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
• the ROM 802 is an optional module.
• the RAM 803 stores executable instructions, or executable instructions are written into the ROM 802 at runtime, and the executable instructions cause the central processing unit 801 to perform the operations corresponding to the foregoing method.
  • An input / output (I / O) interface 805 is also connected to the bus 804.
  • the communication unit 812 may be provided in an integrated manner, or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) and connected on a bus link.
• the following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. the communication section 809 performs communication processing via a network such as the Internet.
• the drive 810 is also connected to the I/O interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 810 as needed, so that a computer program read out therefrom is installed into the storage section 808 as needed.
• FIG. 8 is only an optional implementation manner. In specific practice, the number and types of components in FIG. 8 may be selected, deleted, added, or replaced according to actual needs, and different functional components may be provided separately or in an integrated manner. For example, the GPU 813 and the CPU 801 may be provided separately, or the GPU 813 may be integrated on the CPU 801; the communication unit may be provided separately, or may be integrated on the CPU 801 or the GPU 813; and so on. These alternative embodiments all fall within the protection scope disclosed in the present application.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present application include a computer program product including a computer program tangibly embodied on a machine-readable medium.
• the computer program includes program code for performing the method shown in the flowchart; the program code may include instructions corresponding to the method steps provided in the embodiments of the present application, for example: based on a depth estimation neural network, obtaining the global features of a monocular image according to the absolute features of each preset region in the monocular image and the relative features between the preset regions; and obtaining a predicted depth map of the monocular image according to the global features, the absolute features of each preset region, and the relative features between the preset regions.
  • the computer program may be downloaded and installed from a network through the communication section 809, and / or installed from a removable medium 811.
• when the computer program is executed by the central processing unit (CPU) 801, the above-mentioned functions defined in the method of the present application are executed.
• an embodiment of the present application further provides a computer program product configured to store computer-readable instructions.
• when the computer-readable instructions are executed, the computer performs the monocular image depth estimation method in any of the foregoing possible implementation manners.
  • the computer program product may be specifically implemented by hardware, software, or a combination thereof.
  • the computer program product is embodied as a computer storage medium.
  • the computer program product is embodied as a software product, such as a Software Development Kit (SDK) and the like.
• an embodiment of the present application further provides another monocular image depth estimation method and a corresponding apparatus, electronic device, computer storage medium, computer program, and computer program product.
• the method includes: a first device sends a monocular image depth estimation instruction to a second device, the instruction causing the second device to execute the monocular image depth estimation method in any of the foregoing possible embodiments; and the first device receives the monocular image depth estimation result sent by the second device.
• the monocular image depth estimation instruction may specifically be a calling instruction.
• the first device may instruct the second device to perform monocular image depth estimation by way of calling; accordingly, in response to receiving the calling instruction, the second device may perform the steps and/or processes in any of the embodiments of the above monocular image depth estimation method.
  • a plurality may refer to two or more, and “at least one” may refer to one, two, or more.
  • the methods and apparatus of the present application may be implemented in many ways.
  • the methods and devices of the present application can be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-mentioned order of the steps of the method is for illustration only, and the steps of the method of the present application are not limited to the order specifically described above, unless otherwise specifically stated.
  • the present application may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present application.
  • the present application also covers a recording medium storing a program for executing a method according to the present application.


Abstract

Embodiments of the present application disclose a monocular image depth estimation method and apparatus, a device, a computer program, and a storage medium. The method includes: based on a depth estimation neural network, obtaining global features of a monocular image according to absolute features of each preset region in the monocular image and relative features between the preset regions; and obtaining a predicted depth map of the monocular image according to the global features, the absolute features of each preset region in the monocular image, and the relative features between the preset regions. Embodiments of the present application can improve the accuracy of monocular image depth estimation.

Description

Monocular image depth estimation method and apparatus, device, program, and storage medium

Cross-reference to related applications

This application is based on and claims priority to Chinese patent application No. 201810845040.4, filed on July 27, 2018, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to computer vision technology, and in particular to a monocular image depth estimation method and apparatus, an electronic device, a computer program, and a storage medium.

Background

Depth estimation of images is an important problem in the field of computer vision. At present, image depth estimation mainly includes monocular image depth estimation and binocular image depth estimation. Monocular image depth estimation estimates the depth information of an image based on a single monocular image. Since monocular image depth estimation is a very challenging problem, the accuracy of the depth information predicted by many existing monocular image depth estimation methods is still poor.
Summary

Embodiments of the present application provide a technical solution for monocular image depth estimation.

According to one aspect of the embodiments of the present application, a monocular image depth estimation method is provided, including:

based on a depth estimation neural network, obtaining global features of a monocular image according to absolute features of each preset region in the monocular image and relative features between the preset regions;

obtaining a predicted depth map of the monocular image according to the global features, the absolute features of each preset region in the monocular image, and the relative features between the preset regions.

Optionally, in the foregoing method embodiment of the present application, before obtaining the global features of the monocular image according to the absolute features of each preset region in the monocular image and the relative features between the preset regions, the method further includes:

performing feature extraction on the monocular image through a first neural network to obtain the features of each preset region in the monocular image, and taking the features of each preset region as the absolute features of each preset region in the monocular image;

obtaining the relative features between the preset regions in the monocular image according to the absolute features of each preset region in the monocular image.

Optionally, in any of the foregoing method embodiments of the present application, obtaining the relative features between the preset regions in the monocular image according to the absolute features of each preset region in the monocular image includes:

performing a vector operation on the absolute features of each preset region in the monocular image through a correlation layer, to obtain the relative features between the preset regions in the monocular image.

Optionally, in any of the foregoing method embodiments of the present application, before performing feature extraction on the monocular image through the first neural network, the method further includes:

downsampling the monocular image to obtain a monocular image having a preset dimension, where the dimension of the monocular image is a multiple of the preset dimension.

Optionally, in any of the foregoing method embodiments of the present application, obtaining the global features of the monocular image according to the absolute features of each preset region in the monocular image and the relative features between the preset regions includes:

combining, through a fully connected layer, the absolute features of each preset region in the monocular image and the relative features between the preset regions, to obtain the global features of the monocular image.

Optionally, in any of the foregoing method embodiments of the present application, obtaining the predicted depth map of the monocular image according to the global features, the absolute features of each preset region in the monocular image, and the relative features between the preset regions includes:

performing depth estimation through a depth estimator according to the global features, the absolute features of each preset region in the monocular image, and the relative features between the preset regions, to obtain the predicted depth map of the monocular image.

Optionally, in any of the foregoing method embodiments of the present application, after obtaining the predicted depth map of the monocular image according to the global features, the absolute features of each preset region in the monocular image, and the relative features between the preset regions, the method further includes:

optimizing the predicted depth map according to the vertical change rule of the depth information of the monocular image, to obtain a target depth map of the monocular image.

Optionally, in any of the foregoing method embodiments of the present application, optimizing the predicted depth map according to the vertical change rule of the depth information of the monocular image to obtain the target depth map of the monocular image includes:

performing residual estimation on the predicted depth map according to the vertical change rule of the depth information of the monocular image, to obtain a residual map of the predicted depth map;

optimizing the predicted depth map according to the residual map, to obtain the target depth map of the monocular image.

Optionally, in any of the foregoing method embodiments of the present application, performing residual estimation on the predicted depth map according to the vertical change rule of the depth information of the monocular image to obtain the residual map of the predicted depth map includes:

performing residual estimation on the predicted depth map through a residual estimation network according to the vertical change rule of the depth information of the monocular image, to obtain the residual map of the predicted depth map;

and optimizing the predicted depth map according to the residual map to obtain the target depth map of the monocular image includes:

performing a pixel-by-pixel superposition operation on the residual map and the predicted depth map, to obtain the target depth map of the monocular image.

Optionally, in any of the foregoing method embodiments of the present application, before optimizing the predicted depth map according to the vertical change rule of the depth information of the monocular image to obtain the target depth map of the monocular image, the method further includes:

obtaining the vertical change rule of the depth information of the monocular image according to the predicted depth map.

Optionally, in any of the foregoing method embodiments of the present application, obtaining the vertical change rule of the depth information of the monocular image according to the predicted depth map includes:

processing the predicted depth map through a vertical pooling layer, to obtain the vertical change rule of the depth information of the monocular image.

Optionally, in any of the foregoing method embodiments of the present application, optimizing the predicted depth map according to the vertical change rule of the depth information of the monocular image includes:

performing a preset number of upsampling operations on the predicted depth map, obtaining the vertical change rule of depth information from the predicted depth map whose dimension increases by multiples after each upsampling, and optimizing, according to that vertical change rule, the predicted depth map whose dimension increases by multiples after each upsampling, to obtain an optimized target depth map;

where, except for the last upsampling, the optimized target depth map obtained by each upsampling serves as the predicted depth map for the next upsampling, and the optimized target depth map obtained by the last upsampling serves as the target depth map of the monocular image, the dimension of the target depth map being the same as the dimension of the monocular image.

Optionally, in any of the foregoing method embodiments of the present application, the depth estimation neural network includes a correlation layer, a fully connected layer, and a depth estimator, and is obtained by training the depth estimation neural network using sparse depth maps and dense depth maps obtained through stereo matching of binocular images as labeled data.
According to another aspect of the embodiments of the present application, a monocular image depth estimation apparatus is provided, including:

a depth estimation neural network configured to obtain global features of a monocular image according to absolute features of each preset region in the monocular image and relative features between the preset regions, and to obtain a predicted depth map of the monocular image according to the global features, the absolute features of each preset region in the monocular image, and the relative features between the preset regions.

Optionally, in the foregoing apparatus embodiment of the present application, the apparatus further includes:

a first neural network configured to perform feature extraction on the monocular image, obtain the features of each preset region in the monocular image, and take the features of each preset region as the absolute features of each preset region in the monocular image;

the depth estimation neural network is further configured to obtain the relative features between the preset regions in the monocular image according to the absolute features of each preset region in the monocular image.

Optionally, in any of the foregoing apparatus embodiments of the present application, the depth estimation neural network includes:

a correlation layer configured to perform a vector operation on the absolute features of each preset region in the monocular image, to obtain the relative features between the preset regions in the monocular image.

Optionally, in any of the foregoing apparatus embodiments of the present application, the apparatus further includes:

a downsampling layer configured to downsample the monocular image before feature extraction is performed on the monocular image, to obtain a monocular image having a preset dimension, where the dimension of the monocular image is a multiple of the preset dimension.

Optionally, in any of the foregoing apparatus embodiments of the present application, the depth estimation neural network includes:

a fully connected layer configured to combine the absolute features of each preset region in the monocular image and the relative features between the preset regions, to obtain the global features of the monocular image.

Optionally, in any of the foregoing apparatus embodiments of the present application, the depth estimation neural network includes:

a depth estimator configured to perform depth estimation according to the global features, the absolute features of each preset region in the monocular image, and the relative features between the preset regions, to obtain the predicted depth map of the monocular image.

Optionally, in any of the foregoing apparatus embodiments of the present application, the apparatus further includes:

a second neural network configured to optimize the predicted depth map according to the vertical change rule of the depth information of the monocular image, to obtain a target depth map of the monocular image.

Optionally, in any of the foregoing apparatus embodiments of the present application, the second neural network is configured to perform residual estimation on the predicted depth map according to the vertical change rule of the depth information of the monocular image to obtain a residual map of the predicted depth map, and to optimize the predicted depth map according to the residual map to obtain the target depth map of the monocular image.

Optionally, in any of the foregoing apparatus embodiments of the present application, the second neural network includes:

a residual estimation network configured to perform residual estimation on the predicted depth map according to the vertical change rule of the depth information of the monocular image, to obtain the residual map of the predicted depth map;

an addition operation unit configured to perform a pixel-by-pixel superposition operation on the residual map and the predicted depth map, to obtain the target depth map of the monocular image.

Optionally, in any of the foregoing apparatus embodiments of the present application, the second neural network is further configured to obtain the vertical change rule of the depth information of the monocular image according to the predicted depth map.

Optionally, in any of the foregoing apparatus embodiments of the present application, the second neural network includes:

a vertical pooling layer configured to process the predicted depth map, to obtain the vertical change rule of the depth information of the monocular image.

Optionally, in any of the foregoing apparatus embodiments of the present application, the apparatus further includes:

an upsampling layer configured to perform a preset number of upsampling operations on the predicted depth map;

a vertical pooling layer configured to obtain the vertical change rule of depth information from the predicted depth map whose dimension increases by multiples after each upsampling;

the second neural network is configured to optimize, according to the vertical change rule of the depth information of the predicted depth map whose dimension increases by multiples after each upsampling, the predicted depth map obtained by each upsampling, to obtain an optimized target depth map;

where, except for the last upsampling, the optimized target depth map obtained by each upsampling serves as the predicted depth map for the next upsampling, and the optimized target depth map obtained by the last upsampling serves as the target depth map of the monocular image, the dimension of the target depth map being the same as the dimension of the monocular image.

Optionally, in any of the foregoing apparatus embodiments of the present application, the depth estimation neural network includes a correlation layer, a fully connected layer, and a depth estimator, and is obtained by training the depth estimation neural network using sparse depth maps and dense depth maps obtained through stereo matching of binocular images as labeled data.

According to yet another aspect of the embodiments of the present application, an electronic device is provided, including the apparatus of any of the foregoing embodiments.

According to still another aspect of the embodiments of the present application, an electronic device is provided, including:

a memory configured to store executable instructions; and

a processor configured to execute the executable instructions to complete the method of any of the foregoing embodiments.

According to still another aspect of the embodiments of the present application, a computer program is provided, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the method of any of the foregoing embodiments.

According to still another aspect of the embodiments of the present application, a computer storage medium is provided, configured to store computer-readable instructions which, when executed, implement the method of any of the foregoing embodiments.

Based on the monocular image depth estimation method and apparatus, electronic device, computer program, and storage medium provided by the foregoing embodiments of the present application, global features of a monocular image are obtained, based on a depth estimation neural network, according to absolute features of each preset region in the monocular image and relative features between the preset regions, and a predicted depth map of the monocular image is obtained according to the global features, the absolute features of each preset region, and the relative features between the preset regions. In monocular image depth estimation, the relative features and absolute features of the preset regions in the image complement each other, which improves the accuracy of relative distance prediction in depth estimation and can thereby improve the accuracy of monocular image depth estimation.

The technical solutions of the present application are further described in detail below through the accompanying drawings and embodiments.
Brief description of the drawings

The accompanying drawings, which constitute a part of the specification, describe embodiments of the present application and, together with the description, serve to explain the principles of the present application.

With reference to the accompanying drawings, the present application can be understood more clearly from the following detailed description, in which:

FIG. 1 is a flowchart of a monocular image depth estimation method according to some embodiments of the present application;

FIG. 2 is a flowchart of a monocular image depth estimation method according to other embodiments of the present application;

FIG. 3 is a flowchart of the optimization at each scale when multi-scale learning is used for optimization according to an embodiment of the present application;

FIGS. 4A to 4C are schematic diagrams of a network structure for implementing a monocular image depth estimation method according to some embodiments of the present application;

FIG. 5 is a schematic structural diagram of a monocular image depth estimation apparatus according to some embodiments of the present application;

FIG. 6 is a schematic structural diagram of a monocular image depth estimation apparatus according to other embodiments of the present application;

FIG. 7 is a schematic structural diagram of a monocular image depth estimation apparatus according to still other embodiments of the present application;

FIG. 8 is a schematic structural diagram of an electronic device provided by some embodiments of the present application.
Detailed description

Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.

Meanwhile, it should be understood that, for ease of description, the sizes of the parts shown in the accompanying drawings are not drawn according to actual proportional relationships.

The following description of at least one exemplary embodiment is in fact merely illustrative and in no way serves as any limitation on the present application or its application or use.

Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.

It should be noted that similar reference numerals and letters denote similar items in the following accompanying drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.

Embodiments of the present application may be applied to a computer system/server, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing technology environments including any of the above systems, and so on.

The computer system/server may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
FIG. 1 is a flowchart of a monocular image depth estimation method according to some embodiments of the present application.

As shown in FIG. 1, the method includes:

102: based on a depth estimation neural network, obtaining global features of a monocular image according to absolute features of each preset region in the monocular image and relative features between the preset regions.

In this embodiment, the monocular image may be an image acquired from an image acquisition device or an image acquired from a storage device. For example, the image acquisition device may be a camera, a video camera, a scanner, etc., and the storage device may be a USB flash drive, an optical disc, a hard disk, etc. This embodiment does not limit the manner of acquiring the monocular image. The absolute features of each preset region in the monocular image can be used to represent the local appearance of each preset region in the monocular image; for example, they may include texture features, geometric features, etc. The relative features between the preset regions in the monocular image can be used to represent the differences between the local appearances of the preset regions in the monocular image; for example, they may include texture differences, geometric differences, etc. The preset regions in the monocular image can be set according to the characteristics of the image. The depth map in this embodiment refers to an image in which the pixel value of each pixel represents the distance between that pixel and the image acquisition device.

In an optional example, the global features of the monocular image may be obtained by combining, through a fully connected layer, the absolute features of each preset region in the monocular image and the relative features between the preset regions.

104: obtaining a predicted depth map of the monocular image according to the global features, the absolute features of each preset region in the monocular image, and the relative features between the preset regions.

In an optional example, depth estimation may be performed through a depth estimator according to the global features of the monocular image, the absolute features of each preset region in the monocular image, and the relative features between the preset regions, to obtain the predicted depth map of the monocular image. For example, the depth estimator may use a fully convolutional network, which is mainly composed of convolutional layers and deconvolutional layers; according to the geometric distribution information of the image, that is, the global features of the image, the absolute features of each preset region in the image, and the relative features between the preset regions, it regresses the depth value of each pixel in the image to obtain the predicted depth map.

In the monocular image depth estimation method provided in this embodiment, based on a depth estimation neural network, global features of a monocular image are obtained according to absolute features of each preset region in the monocular image and relative features between the preset regions, and a predicted depth map of the monocular image is obtained according to the global features, the absolute features of each preset region, and the relative features between the preset regions. In monocular image depth estimation, the relative features and absolute features of the preset regions in the image complement each other, which improves the accuracy of relative distance prediction in depth estimation and can thereby improve the accuracy of monocular image depth estimation.
Optionally, before obtaining the global features of the monocular image according to the absolute features of each preset region in the monocular image and the relative features between the preset regions, feature extraction may be performed on the monocular image through a first neural network to obtain the features of each preset region in the monocular image, and the features of each preset region are taken as the absolute features of each preset region in the monocular image; then, the relative features between the preset regions in the monocular image are obtained according to the absolute features of each preset region. For example, the first neural network may use an encoder network composed of convolutional layers and pooling layers; feature extraction is performed on the monocular image through the encoder network to obtain high-dimensional features of the image.

In an optional example, a vector operation may be performed on the absolute features of each preset region in the monocular image through a correlation layer, to obtain the relative features between the preset regions in the monocular image. The relative features between the preset regions in the image may be the relative features between each preset region in the image and the preset regions within a preset surrounding range; for example, a dot product operation may be performed on the feature vectors of each preset region in the monocular image and the preset regions within its surrounding preset range, to obtain the relative features between the preset regions in the monocular image.

Optionally, before feature extraction is performed on the monocular image through the first neural network, the monocular image may be downsampled to obtain a monocular image having a preset dimension, and the monocular image having the preset dimension is used as the monocular image for depth estimation by the depth estimation neural network, so as to reduce the amount of calculation and improve the speed of data processing. The dimension of the monocular image is a multiple of the preset dimension; for example, the dimension of the monocular image is 8 times the preset dimension.

Generally, the depth of an image changes more in the vertical direction than in the horizontal direction. For example, in a driving scene, the road in the image tends to extend along the vertical direction to places farther from the camera. It can be seen that the vertical change rule of the image depth information helps the estimation of the absolute distance of the image. Therefore, the vertical change rule of the depth information of the monocular image can be used for depth estimation of the monocular image; for example, the predicted depth map can be optimized according to the vertical change rule of the depth information of the monocular image.
在一些实施例中,如图2所示,在操作204在根据全局特征、单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得单目图像的预测深度图之后,还可以包括:
206,根据单目图像深度信息的纵向变化规律对预测深度图进行优化,获得单目图像的目标深度图。
可选地,可以根据单目图像深度信息的纵向变化规律,对预测深度图进行残差估计,获得预测深度图的残差图,然后根据残差图对预测深度图进行优化,获得单目图像的目标深度图。
在一个可选的例子中,可以根据单目图像深度信息的纵向变化规律,通过残差估计网络对预测深度图进行残差估计,获得预测深度图的残差图,然后对残差图和预测深度图进行逐像素叠加运算,获得单目图像的目标深度图。
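残差图与预测深度图的逐像素叠加运算可以简单示意为(数值均为假设的示例):

```python
def refine(pred_depth, residual):
    # 目标深度图 = 预测深度图 + 残差图(逐像素叠加)
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred_depth, residual)]

pred = [[1.0, 2.0], [3.0, 4.0]]
res = [[0.1, -0.2], [0.0, 0.5]]
target = refine(pred, res)  # 约为 [[1.1, 1.8], [3.0, 4.5]]
```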
可选地,在根据单目图像深度信息的纵向变化规律对预测深度图进行优化,获得单目图像的目标深度图之前,还可以根据预测深度图获取单目图像深度信息的纵向变化规律。
在一个可选的例子中,可以通过纵向池化层对预测深度图进行处理,获取单目图像深度信息的纵向变化规律。其中,纵向池化层可以使用一个列向量作为池化核,对预测深度图进行池化处理,例如:纵向池化层可以使用大小为H×1的池化核,对预测深度图进行平均池化处理,其中H为大于1的整数。
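H×1平均池化的一个极简示意如下(此处取H=2、步长为1且不做填充,均为假设的简化设定,用于说明沿列方向聚合深度信息的方式):

```python
def vertical_avg_pool(depth, H):
    # 纵向池化:用 H×1 的池化核沿垂直方向做平均池化,
    # 提取深度图中深度信息的纵向变化规律
    rows, cols = len(depth), len(depth[0])
    out = []
    for r in range(rows - H + 1):
        out.append([sum(depth[r + k][c] for k in range(H)) / H
                    for c in range(cols)])
    return out

d = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = vertical_avg_pool(d, H=2)  # 3×2 → 2×2
```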
本实施例提供的单目图像深度估计方法,基于深度估计神经网络,根据单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取单目图像的全局特征,根据全局特征、单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得单目图像的预测深度图,根据单目图像深度信息的纵向变化规律对预测深度图进行优化,获得单目图像的目标深度图,通过在单目图像深度估计中,除了利用图像中各预设区域的相对特征与绝对特征相互补充,提高了深度估计中相对距离预测的准确度,还利用图像深度信息的纵向变化规律进行优化,提高了深度估计中绝对距离预测的准确度,从而可以全面提高单目图像深度估计的准确度。
在一些实施例中,当在将单目图像经第一神经网络进行特征提取之前,对单目图像进行下采样,获得具有预设维度的单目图像,并以具有预设维度的单目图像作为深度估计神经网络进行深度估计的单目图像时,根据单目图像深度信息的纵向变化规律对预测深度图进行优化,可以采用多尺度学习的方法,以提高单目图像深度估计的准确度。
可选地,可以对预测深度图进行预设次数的上采样,根据每一次上采样获得的维度依次成倍数增大的预测深度图获取深度信息的纵向变化规律,根据每一次上采样获得的维度依次成倍数增大的预测深度图的深度信息的纵向变化规律,对每一次上采样获得的维度依次成倍数增大的预测深度图进行优化,获得优化后的目标深度图。其中,除最末一次上采样外,其余每一次上采样获得的优化后的目标深度图,作为下一次上采样的预测深度图,最末一次上采样获得的优化后的目标深度图,作为单目图像的目标深度图,该目标深度图的维度与单目图像的维度相同。
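上述多尺度优化的整体循环结构可以示意如下(其中residual_fn代表假设的"纵向池化+残差估计网络"接口,此处用恒零残差代替,仅用于说明逐次上采样与逐像素修正的流程,并非本申请的具体实现):

```python
def upsample2x(depth):
    # 2 倍最近邻上采样:每个像素在行、列方向各复制一次
    out = []
    for row in depth:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def multi_scale_refine(pred, residual_fn, times):
    # 每一次上采样后,用残差估计结果逐像素修正;
    # 除最末一次外,优化后的目标深度图作为下一次上采样的输入
    for _ in range(times):
        pred = upsample2x(pred)
        res = residual_fn(pred)
        pred = [[p + r for p, r in zip(prow, rrow)]
                for prow, rrow in zip(pred, res)]
    return pred

zero_residual = lambda d: [[0.0] * len(d[0]) for _ in d]
out = multi_scale_refine([[1.0]], zero_residual, times=3)  # 1×1 → 8×8
```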
下面将结合图3,详细描述采用多尺度学习进行优化时每一个尺度优化的流程。
如图3所示,该方法包括:
302,对具有第一预设维度的预测深度图进行上采样,获得具有第二预设维度的预测深度图。
在本实施例中,具有第一预设维度的预测深度图可以是获取自深度估计神经网络的预测深度图,也可以是获取自上一个尺度优化流程的优化后的目标深度图。第二预设维度为第一预设维度的倍数,其中第一预设维度和第二预设维度的大小可以根据上采样的次数、频率以及单目图像的尺寸等确定。
在一个可选的例子中,可以通过上采样层对具有第一预设维度的预测深度图进行上采样,获得具有第二预设维度的预测深度图。
304,根据具有第二预设维度的预测深度图,获取对应的深度信息的纵向变化规律。
在一个可选的例子中,可以通过纵向池化层对具有第二预设维度的预测深度图进行处理,获取对应的深度信息的纵向变化规律。
306,根据对应的深度信息的纵向变化规律,对具有第二预设维度的预测深度图进行残差估计,获得对应的残差图。
在一个可选的例子中,可以根据对应的深度信息的纵向变化规律,通过残差估计网络对具有第二预设维度的预测深度图进行残差估计,获得对应的残差图。
308,根据对应的残差图对具有第二预设维度的预测深度图进行优化,获得优化后具有第二预设维度的目标深度图。
在一个可选的例子中,可以通过对对应的残差图和具有第二预设维度的预测深度图进行逐像素叠加运算,获得优化后具有第二预设维度的目标深度图。
图4A至图4C为实现本申请一些实施例的单目图像深度估计方法的网络结构的示意图。
在本实施例中,如图4A所示,实现本申请实施例单目图像深度估计方法的网络包括:卷积神经网络、深度估计神经网络和深度优化神经网络。其中,卷积神经网络包括下采样层和第一神经网络,通过下采样层对单目图像进行8倍下采样,获得维度为单目图像1/8的单目图像,然后通过第一神经网络对1/8维度的单目图像进行特征提取,获得1/8维度的单目图像中各预设区域的绝对特征。
如图4B所示,深度估计神经网络包括:关联度层、全连接层和深度估计器,其中,关联度层可以根据1/8维度的单目图像中各预设区域的绝对特征,获得1/8维度的单目图像中各预设区域之间的相对特征,全连接层可以根据1/8维度的单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取1/8维度的单目图像的全局特征,深度估计器可以根据1/8维度的图像的全局特征、1/8维度的单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得1/8维度的预测深度图。
在本实施例中,深度优化神经网络包括第一尺度优化网络、第二尺度优化网络和第三尺度优化网络,其中,每一个尺度优化网络的结构,如图4C所示,均包括:上采样层、纵向池化层、残差估计网络和加法运算单元。
其中,第一尺度优化网络的上采样层可以对1/8维度的预测深度图进行2倍上采样,获得1/4维度的预测深度图,第一尺度优化网络的纵向池化层可以根据1/4维度的预测深度图,获取对应的深度信息的纵向变化规律,第一尺度优化网络的残差估计网络可以根据1/4维度的预测深度图对应的深度信息的纵向变化规律,对1/4维度的预测深度图进行残差估计,获得对应的残差图,第一尺度优化网络的加法运算单元可以对对应的残差图和1/4维度的预测深度图进行逐像素叠加运算,获得优化后1/4维度的目标深度图,可以将该优化后1/4维度的目标深度图作为第二尺度优化网络的预测深度图。
第二尺度优化网络的上采样层可以对优化后1/4维度的目标深度图进行2倍上采样,获得1/2维度的预测深度图,第二尺度优化网络的纵向池化层可以根据1/2维度的预测深度图,获取对应的深度信息的纵向变化规律,第二尺度优化网络的残差估计网络可以根据1/2维度的预测深度图对应的深度信息的纵向变化规律,对1/2维度的预测深度图进行残差估计,获得对应的残差图,第二尺度优化网络的加法运算单元可以对对应的残差图和1/2维度的预测深度图进行逐像素叠加运算,获得优化后1/2维度的目标深度图,可以将该优化后1/2维度的目标深度图作为第三尺度优化网络的预测深度图。
第三尺度优化网络的上采样层可以对优化后1/2维度的目标深度图进行2倍上采样,获得维度与单目图像的维度相同的预测深度图,第三尺度优化网络的纵向池化层可以根据维度与单目图像的维度相同的预测深度图,获取对应的深度信息的纵向变化规律,第三尺度优化网络的残差估计网络可以根据维度与单目图像的维度相同的预测深度图对应的深度信息的纵向变化规律,对维度与单目图像的维度相同的预测深度图进行残差估计,获得对应的残差图,第三尺度优化网络的加法运算单元可以对对应的残差图和维度与单目图像的维度相同的预测深度图进行逐像素叠加,获得优化后维度与单目图像的维度相同的目标深度图,并将该优化后的深度图作为单目图像的目标深度图。
在一个可选的例子中,上述各实施例的深度估计神经网络,可以通过双目图像立体匹配获得的稠密深度图和稀疏深度图作为标注数据,进行半监督的训练获得。
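半监督训练时,稀疏深度图中只有部分像素带有效标注,损失通常只在有效像素上计算;下面给出一个假设的带掩码L1损失示意(以0表示无标注像素;具体损失形式和掩码约定均为示例,并非本申请限定):

```python
def masked_l1_loss(pred, label, invalid=0.0):
    # 只在标注深度图中有效(非 invalid)的像素上计算 L1 损失
    total, count = 0.0, 0
    for prow, lrow in zip(pred, label):
        for p, l in zip(prow, lrow):
            if l != invalid:
                total += abs(p - l)
                count += 1
    return total / count if count else 0.0

pred = [[1.0, 2.0], [3.0, 4.0]]
sparse_label = [[1.5, 0.0], [0.0, 3.0]]  # 0 表示该像素无标注
loss = masked_l1_loss(pred, sparse_label)  # (0.5 + 1.0) / 2 = 0.75
```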
在本实施例中,由于采用其它方法获得的训练数据的“标注数据”比较稀疏,即深度图中有效的像素值比较少,因此采用双目图像立体匹配获得的深度图作为训练数据的“标注数据”。
本申请实施例提供的单目图像深度估计方法可以用于场景几何结构分析、自动驾驶、辅助驾驶、目标跟踪以及机器人自主避障等领域。例如:在驾驶场景中,可以利用本申请实施例提供的单目图像深度估计方法对前车或者行人的距离进行预测。在手机拍照时,可以利用本申请实施例提供的单目图像深度估计方法预测的深度信息进行单目虚化操作;利用本申请实施例提供的单目图像深度估计方法的预测结果,可以帮助改善物体跟踪算法。
图5为本申请一些实施例的单目图像深度估计装置的结构示意图。
如图5所示,该装置包括:深度估计神经网络510。其中,
深度估计神经网络510,配置为根据单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取单目图像的全局特征;以及根据全局特征、单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得单目图像的预测深度图。
在本实施例中,单目图像可以是从图像采集设备获取的图像,也可以是从存储装置获取的图像,例如:图像采集设备可以为照相机、摄像机、扫描仪等,存储装置可以为U盘、光盘、硬盘等,本实施例对单目图像的获取方式不作限定。其中,单目图像中各预设区域的绝对特征可以用来表示单目图像中各预设区域的局部外观,例如:它可以包括纹理特征、几何特征等。单目图像中各预设区域之间的相对特征可以用来表示单目图像中各预设区域局部外观之间的差异性,例如:它可以包括纹理差异、几何差异等。单目图像中的各预设区域可以根据图像的特征设定。本实施例的深度图是指以图像中各像素的像素值表征图像中的各像素到图像采集设备之间的距离的图像。
在一个可选的例子中,如图5所示,深度估计神经网络510可以包括:全连接层511,配置为结合单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取单目图像的全局特征。深度估计神经网络510还可以包括:深度估计器512,配置为根据单目图像的全局特征、单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,进行深度估计,获得单目图像的预测深度图。例如:深度估计器可以采用全卷积网络,全卷积网络主要由卷积层和反卷积层组成,它可以根据图像的几何分布信息,即图像的全局特征、图像中各预设区域的绝对特征和各预设区域之间的相对特征,回归出图像中各像素的深度值,从而获得预测深度图。
本实施例提供的单目图像深度估计装置,基于深度估计神经网络,根据单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取单目图像的全局特征,根据全局特征、单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得单目图像的预测深度图,通过在单目图像深度估计中,利用图像中各预设区域的相对特征与绝对特征相互补充,提高了深度估计中相对距离预测的准确度,从而可以提高单目图像深度估计的准确度。
图6为本申请另一些实施例的单目图像深度估计装置的结构示意图。
如图6所示,与图5的实施例相比,两者的不同之处在于,该装置还包括:第一神经网络620。其中,
第一神经网络620,配置为对单目图像进行特征提取,获取单目图像中各预设区域的特征,并将各预设区域的特征作为单目图像中各预设区域的绝对特征。例如:第一神经网络可以采用由卷积层和池化层组成的编码器网络,单目图像经编码器网络进行特征提取,可以获得图像的高维特征。
深度估计神经网络610,还用于根据单目图像中各预设区域的绝对特征,获取单目图像中各预设区域之间的相对特征。
在一个可选的例子中,如图6所示,深度估计神经网络610还可以包括:关联度层613,配置为对单目图像中各预设区域的绝对特征进行矢量运算,获得单目图像中各预设区域之间的相对特征。其中,图像中各预设区域之间的相对特征,可以为图像中各预设区域与其周边预设范围内的预设区域之间的相对特征,例如:可以通过对单目图像中各预设区域与其周边预设范围内的预设区域之间的特征向量,进行点积运算,获得单目图像中各预设区域之间的相对特征。
可选地,该装置还可以包括:下采样层,配置为在对单目图像进行特征提取之前,对单目图像进行下采样,获得具有预设维度的单目图像,此时深度估计神经网络610是对具有预设维度的单目图像进行深度估计,以减少计算量,提高数据处理的速度。其中,单目图像的维度为预设维度的倍数,例如:单目图像的维度为预设维度的8倍。
图7为本申请另一些实施例的单目图像深度估计装置的结构示意图。
如图7所示,与图5的实施例相比,两者的不同之处在于,该装置还包括:第二神经网络730。其中,
第二神经网络730,配置为根据单目图像深度信息的纵向变化规律对预测深度图进行优化,获得单目图像的目标深度图。
可选地,第二神经网络730,配置为根据单目图像深度信息的纵向变化规律,对预测深度图进行残差估计,获得预测深度图的残差图,然后根据残差图对预测深度图进行优化,获得单目图像的目标深度图。
在一个可选的例子中,如图7所示,第二神经网络730可以包括:残差估计网络731,配置为可以根据单目图像深度信息的纵向变化规律,通过残差估计网络对预测深度图进行残差估计,获得预测深度图的残差图;加法运算单元732,配置为对残差图和预测深度图进行逐像素叠加运算,获得单目图像的目标深度图。
可选地,第二神经网络730还用于根据预测深度图获取单目图像深度信息的纵向变化规律。
在一个可选的例子中,如图7所示,第二神经网络730还可以包括:纵向池化层733,配置为对预测深度图进行处理,获取单目图像深度信息的纵向变化规律。其中,纵向池化层可以使用一个列向量作为池化核,对预测深度图进行池化处理,例如:纵向池化层可以使用大小为H×1的池化核,对预测深度图进行平均池化处理,其中H为大于1的整数。
本实施例提供的单目图像深度估计装置,基于深度估计神经网络,根据单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取单目图像的全局特征,根据全局特征、单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得单目图像的预测深度图,根据单目图像深度信息的纵向变化规律对预测深度图进行优化,获得单目图像的目标深度图,通过在单目图像深度估计中,除了利用图像中各预设区域的相对特征与绝对特征相互补充,提高了深度估计中相对距离预测的准确度,还利用图像深度信息的纵向变化规律进行优化,提高了深度估计中绝对距离预测的准确度,从而可以全面提高单目图像深度估计的准确度。
在一个可选的例子中,当在将单目图像经第一神经网络进行特征提取之前,通过下采样层对单目图像进行下采样,获得具有预设维度的单目图像,并以具有预设维度的单目图像作为深度估计神经网络进行深度估计的单目图像时,根据单目图像深度信息的纵向变化规律对预测深度图进行优化,可以采用多尺度学习的方法,以提高单目图像深度估计的准确度。
可选地,该装置还可以包括:上采样层,配置为对预测深度图进行预设次数的上采样;纵向池化层,配置为根据每一次上采样获得的维度依次成倍数增大的预测深度图获取深度信息的纵向变化规律;第二神经网络,配置为根据每一次上采样获得的维度依次成倍数增大的预测深度图的深度信息的纵向变化规律,对每一次上采样获得的维度依次成倍数增大的预测深度图进行优化,获得优化后的目标深度图。其中,除最末一次上采样外,其余每一次上采样获得的优化后的目标深度图,作为下一次上采样的预测深度图,最末一次上采样获得的优化后的目标深度图,作为单目图像的目标深度图,该目标深度图的维度与单目图像的维度相同。
在一个可选的例子中,上述各实施例的深度估计神经网络,可以通过双目图像立体匹配获得的稠密深度图和稀疏深度图作为标注数据,进行半监督的训练获得。
在一个可选的例子中,由于采用其它方法获得的训练数据的“标注数据”比较稀疏,即深度图中有效的像素值比较少,因此采用双目图像立体匹配获得的深度图作为训练数据的“标注数据”。

本申请实施例还提供了一种电子设备,例如可以是移动终端、个人计算机(PC)、平板电脑、服务器等。下面参考图8,其示出了适于用来实现本申请实施例的终端设备或服务器的电子设备800的结构示意图:如图8所示,电子设备800包括一个或多个处理器、通信部等,所述一个或多个处理器例如:一个或多个中央处理单元(CPU)801,和/或一个或多个图像处理器(GPU)813等,处理器可以根据存储在只读存储器(ROM)802中的可执行指令或者从存储部分808加载到随机访问存储器(RAM)803中的可执行指令而执行各种适当的动作和处理。通信部812可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡,处理器可与只读存储器802和/或RAM 803通信以执行可执行指令,通过总线804与通信部812相连、并经通信部812与其他目标设备通信,从而完成本申请实施例提供的任一项方法对应的操作,例如,基于深度估计神经网络,根据单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取所述单目图像的全局特征;根据所述全局特征、所述单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得所述单目图像的预测深度图。
此外,在RAM 803中,还可存储有装置操作所需的各种程序和数据。CPU801、ROM802以及RAM803通过总线804彼此相连。在有RAM803的情况下,ROM802为可选模块。RAM803存储可执行指令,或在运行时向ROM802中写入可执行指令,可执行指令使中央处理单元801执行上述方法对应的操作。输入/输出(I/O)接口805也连接至总线804。通信部812可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。
以下部件连接至I/O接口805:包括键盘、鼠标等的输入部分806;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分807;包括硬盘等的存储部分808;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分809。通信部分809经由诸如因特网的网络执行通信处理。驱动器810也根据需要连接至I/O接口805。可拆卸介质811,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器810上,以便于从其上读出的计算机程序根据需要被安装入存储部分808。
需要说明的,如图8所示的架构仅为一种可选实现方式,在具体实践过程中,可根据实际需要对上述图8的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU813和CPU801可分离设置或者可将GPU813集成在CPU801上,通信部可分离设置,也可集成设置在CPU801或GPU813上,等等。这些可替换的实施方式均落入本申请公开的保护范围。
特别地,根据本申请的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令,例如,基于深度估计神经网络,根据单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取所述单目图像的全局特征;根据所述全局特征、所述单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得所述单目图像的预测深度图。在这样的实施例中,该计算机程序可以通过通信部分809从网络上被下载和安装,和/或从可拆卸介质811被安装。在该计算机程序被中央处理单元(CPU)801执行时,执行本申请的方法中限定的上述功能。
在一个或多个可选实施方式中,本申请实施例还提供了一种计算机程序产品,配置为存储计算机可读指令,该指令被执行时使得计算机执行上述任一可能的实现方式中的单目图像深度估计方法。
该计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选例子中,该计算机程序产品具体体现为计算机存储介质,在另一个可选例子中,该计算机程序产品具体体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
在一个或多个可选实施方式中,本申请实施例还提供了一种单目图像深度估计方法及其对应的装置、电子设备、计算机存储介质、计算机程序以及计算机程序产品,其中,该方法包括:第一装置向第二装置发送单目图像深度估计指示,该指示使得第二装置执行上述任一可能的实施例中的单目图像深度估计方法;第一装置接收第二装置发送的单目图像深度估计的结果。
在一些实施例中,该单目图像深度估计指示可以具体为调用指令,第一装置可以通过调用的方式指示第二装置执行单目图像深度估计,相应地,响应于接收到调用指令,第二装置可以执行上述单目图像深度估计方法中的任意实施例中的步骤和/或流程。
应理解,本申请实施例中的“第一”、“第二”等术语仅仅是为了区分,而不应理解成对本申请实施例的限定。
还应理解,在本申请中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。
还应理解,对于本申请中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。
还应理解,本申请对各个实施例的描述着重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。
可能以许多方式来实现本申请的方法和装置。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本申请的方法和装置。用于所述方法的步骤的上述顺序仅是为了进行说明,本申请的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本申请实施为记录在记录介质中的程序,这些程序包括用于实现根据本申请的方法的机器可读指令。因而,本申请还覆盖存储用于执行根据本申请的方法的程序的记录介质。
本申请的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本申请限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本申请的原理和实际应用,并且使本领域的普通技术人员能够理解本申请从而设计适于特定用途的带有各种修改的各种实施例。

Claims (30)

  1. 一种单目图像深度估计方法,包括:
    基于深度估计神经网络,根据单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取所述单目图像的全局特征;
    根据所述全局特征、所述单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得所述单目图像的预测深度图。
  2. 根据权利要求1所述的方法,其中,在所述根据单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取所述单目图像的全局特征之前,还包括:
    将所述单目图像经第一神经网络进行特征提取,获取所述单目图像中各预设区域的特征,并将所述各预设区域的特征作为所述单目图像中各预设区域的绝对特征;
    根据所述单目图像中各预设区域的绝对特征,获取所述单目图像中各预设区域之间的相对特征。
  3. 根据权利要求2所述的方法,其中,所述根据所述单目图像中各预设区域的绝对特征,获取所述单目图像中各预设区域之间的相对特征,包括:
    对所述单目图像中各预设区域的绝对特征经关联度层进行矢量运算,获得所述单目图像中各预设区域之间的相对特征。
  4. 根据权利要求2或3所述的方法,其中,在将所述单目图像经第一神经网络进行特征提取之前,还包括:
    对所述单目图像进行下采样,获得具有预设维度的单目图像;其中,所述单目图像的维度为所述预设维度的倍数。
  5. 根据权利要求1至4中任意一项所述的方法,其中,所述根据单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取所述单目图像的全局特征,包括:
    通过全连接层结合所述单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取所述单目图像的全局特征。
  6. 根据权利要求1至5中任意一项所述的方法,其中,所述根据所述全局特征、所述单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得所述单目图像的预测深度图,包括:
    根据所述全局特征、所述单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,通过深度估计器进行深度估计,获得所述单目图像的预测深度图。
  7. 根据权利要求1至6中任意一项所述的方法,其中,所述根据所述全局特征、所述单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得所述单目图像的预测深度图之后,还包括:
    根据所述单目图像深度信息的纵向变化规律对所述预测深度图进行优化,获得所述单目图像的目标深度图。
  8. 根据权利要求7所述的方法,其中,所述根据所述单目图像深度信息的纵向变化规律对所述预测深度图进行优化,获得所述单目图像的目标深度图,包括:
    根据所述单目图像深度信息的纵向变化规律,对所述预测深度图进行残差估计,获得所述预测深度图的残差图;
    根据所述残差图对所述预测深度图进行优化,获得所述单目图像的目标深度图。
  9. 根据权利要求8所述的方法,其中,所述根据所述单目图像深度信息的纵向变化规律,对所述预测深度图进行残差估计,获得所述预测深度图的残差图,包括:
    根据所述单目图像深度信息的纵向变化规律,通过残差估计网络对所述预测深度图进行残差估计,获得所述预测深度图的残差图;
    所述根据所述残差图对所述预测深度图进行优化,获得所述单目图像的目标深度图,包括:
    对所述残差图和所述预测深度图进行逐像素叠加运算,获得所述单目图像的目标深度图。
  10. 根据权利要求7至9中任意一项所述的方法,其中,所述根据所述单目图像深度信息的纵向变化规律对所述预测深度图进行优化,获得所述单目图像的目标深度图之前,还包括:
    根据所述预测深度图获取所述单目图像深度信息的纵向变化规律。
  11. 根据权利要求10所述的方法,其中,所述根据所述预测深度图获取所述单目图像深度信息的纵向变化规律,包括:
    通过纵向池化层对所述预测深度图进行处理,获取所述单目图像深度信息的纵向变化规律。
  12. 根据权利要求7所述的方法,其中,所述根据所述单目图像深度信息的纵向变化规律对所述预测深度图进行优化,包括:
    对所述预测深度图进行预设次数的上采样,根据每一次上采样获得的维度依次成倍数增大的预测深度图获取深度信息的纵向变化规律,根据每一次上采样获得的维度依次成倍数增大的预测深度图的深度信息的纵向变化规律,对每一次上采样获得的维度依次成倍数增大的预测深度图进行优化,获得优化后的目标深度图;
    其中,除最末一次上采样外,其余每一次上采样获得的优化后的目标深度图作为下一次上采样的预测深度图,最末一次上采样获得的优化后的目标深度图作为所述单目图像的目标深度图,所述目标深度图的维度与所述单目图像的维度相同。
  13. 根据权利要求1至12中任意一项所述的方法,其中,所述深度估计神经网络包括:关联度层、全连接层和深度估计器,利用稀疏深度图和通过双目图像立体匹配获得的稠密深度图作为标注数据对所述深度估计神经网络进行训练获得。
  14. 一种单目图像深度估计装置,包括:
    深度估计神经网络,配置为根据单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取所述单目图像的全局特征;以及根据所述全局特征、所述单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获得所述单目图像的预测深度图。
  15. 根据权利要求14所述的装置,其中,还包括:
    第一神经网络,配置为对所述单目图像进行特征提取,获取所述单目图像中各预设区域的特征,并将所述各预设区域的特征作为所述单目图像中各预设区域的绝对特征;
    所述深度估计神经网络,还用于根据所述单目图像中各预设区域的绝对特征,获取所述单目图像中各预设区域之间的相对特征。
  16. 根据权利要求15所述的装置,其中,所述深度估计神经网络,包括:
    关联度层,配置为对所述单目图像中各预设区域的绝对特征进行矢量运算,获得所述单目图像中各预设区域之间的相对特征。
  17. 根据权利要求15或16所述的装置,其中,还包括:
    下采样层,配置为在对所述单目图像进行特征提取之前,对所述单目图像进行下采样,获得具有预设维度的单目图像;其中,所述单目图像的维度为所述预设维度的倍数。
  18. 根据权利要求14至17中任意一项所述的装置,其中,所述深度估计神经网络,包括:
    全连接层,配置为结合所述单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,获取所述单目图像的全局特征。
  19. 根据权利要求14至18中任意一项所述的装置,其中,所述深度估计神经网络,包括:
    深度估计器,配置为根据所述全局特征、所述单目图像中各预设区域的绝对特征和各预设区域之间的相对特征,进行深度估计,获得所述单目图像的预测深度图。
  20. 根据权利要求14至19中任意一项所述的装置,其中,还包括:
    第二神经网络,配置为根据所述单目图像深度信息的纵向变化规律对所述预测深度图进行优化,获得所述单目图像的目标深度图。
  21. 根据权利要求20所述的装置,其中,所述第二神经网络,配置为根据所述单目图像深度信息的纵向变化规律,对所述预测深度图进行残差估计,获得所述预测深度图的残差图;以及根据所述残差图对所述预测深度图进行优化,获得所述单目图像的目标深度图。
  22. 根据权利要求21所述的装置,其中,所述第二神经网络,包括:
    残差估计网络,配置为根据所述单目图像深度信息的纵向变化规律,对所述预测深度图进行残差估计,获得所述预测深度图的残差图;
    加法运算单元,配置为对所述残差图和所述预测深度图进行逐像素叠加运算,获得所述单目图像的目标深度图。
  23. 根据权利要求20至22中任意一项所述的装置,其中,所述第二神经网络,还用于根据所述预测深度图获取所述单目图像深度信息的纵向变化规律。
  24. 根据权利要求23所述的装置,其中,所述第二神经网络,包括:
    纵向池化层,配置为对所述预测深度图进行处理,获取所述单目图像深度信息的纵向变化规律。
  25. 根据权利要求20所述的装置,其中,还包括:
    上采样层,配置为对所述预测深度图进行预设次数的上采样;
    纵向池化层,配置为根据每一次上采样获得的维度依次成倍数增大的预测深度图获取深度信息的纵向变化规律;
    所述第二神经网络,配置为根据每一次上采样获得的维度依次成倍数增大的预测深度图的深度信息的纵向变化规律,对每一次上采样获得的维度依次成倍数增大的预测深度图进行优化,获得优化后的目标深度图;
    其中,除最末一次上采样外,其余每一次上采样获得的优化后的目标深度图作为下一次上采样的预测深度图,最末一次上采样获得的优化后的目标深度图作为所述单目图像的目标深度图,所述目标深度图的维度与所述单目图像的维度相同。
  26. 根据权利要求14至25中任意一项所述的装置,其中,所述深度估计神经网络包括:关联度层、全连接层和深度估计器,利用稀疏深度图和通过双目图像立体匹配获得的稠密深度图作为标注数据对所述深度估计神经网络进行训练获得。
  27. 一种电子设备,包括权利要求14至26中任意一项所述的装置。
  28. 一种电子设备,包括:
    存储器,配置为存储可执行指令;以及
    处理器,配置为执行所述可执行指令从而完成权利要求1至13中任意一项所述的方法。
  29. 一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现权利要求1至13中任意一项所述方法的指令。
  30. 一种计算机存储介质,配置为存储计算机可读取的指令,所述指令被执行时实现权利要求1至13中任意一项所述的方法。
PCT/CN2019/082314 2018-07-27 2019-04-11 单目图像深度估计方法及装置、设备、程序及存储介质 Ceased WO2020019761A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
SG11202003878TA SG11202003878TA (en) 2018-07-27 2019-04-11 Monocular image depth estimation method and apparatus, device, program and storage medium
JP2020542490A JP6963695B2 (ja) 2018-07-27 2019-04-11 単眼画像深度推定方法及び装置、機器、プログラム及び記憶媒体
KR1020207009304A KR102292559B1 (ko) 2018-07-27 2019-04-11 단안 이미지 깊이 추정 방법 및 장치, 기기, 프로그램 및 저장 매체
US16/830,363 US11443445B2 (en) 2018-07-27 2020-03-26 Method and apparatus for depth estimation of monocular image, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810845040.4 2018-07-27
CN201810845040.4A CN109035319B (zh) 2018-07-27 2018-07-27 单目图像深度估计方法及装置、设备、程序及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/830,363 Continuation US11443445B2 (en) 2018-07-27 2020-03-26 Method and apparatus for depth estimation of monocular image, and storage medium

Publications (1)

Publication Number Publication Date
WO2020019761A1 true WO2020019761A1 (zh) 2020-01-30

Family

ID=64647384

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/082314 Ceased WO2020019761A1 (zh) 2018-07-27 2019-04-11 单目图像深度估计方法及装置、设备、程序及存储介质

Country Status (7)

Country Link
US (1) US11443445B2 (zh)
JP (1) JP6963695B2 (zh)
KR (1) KR102292559B1 (zh)
CN (1) CN109035319B (zh)
SG (1) SG11202003878TA (zh)
TW (1) TWI766175B (zh)
WO (1) WO2020019761A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784659A (zh) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 图像检测的方法、装置、电子设备以及存储介质
CN113344998A (zh) * 2021-06-25 2021-09-03 北京市商汤科技开发有限公司 深度检测方法、装置、计算机设备及存储介质
CN115457153A (zh) * 2022-10-24 2022-12-09 深圳博升光电科技有限公司 单目结构光相机识别玻璃的识别方法及单目结构光相机

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035319B (zh) * 2018-07-27 2021-04-30 深圳市商汤科技有限公司 单目图像深度估计方法及装置、设备、程序及存储介质
US11589031B2 (en) * 2018-09-26 2023-02-21 Google Llc Active stereo depth prediction based on coarse matching
GB201900839D0 (en) * 2019-01-21 2019-03-13 Or3D Ltd Improvements in and relating to range-finding
US12008740B2 (en) * 2020-08-12 2024-06-11 Niantic, Inc. Feature matching using features extracted from perspective corrected image
CN112070817B (zh) * 2020-08-25 2024-05-28 中国科学院深圳先进技术研究院 一种图像深度估计方法、终端设备及计算机可读存储介质
CN112446328B (zh) * 2020-11-27 2023-11-17 汇纳科技股份有限公司 单目深度的估计系统、方法、设备及计算机可读存储介质
CN112183537B (zh) * 2020-11-30 2021-03-19 北京易真学思教育科技有限公司 模型训练方法及装置、文本区域检测方法及装置
CN112819874B (zh) * 2021-01-07 2024-05-28 北京百度网讯科技有限公司 深度信息处理方法、装置、设备、存储介质以及程序产品
CN112837361B (zh) * 2021-03-05 2024-07-16 浙江商汤科技开发有限公司 一种深度估计方法及装置、电子设备和存储介质
CN116745813A (zh) * 2021-03-18 2023-09-12 创峰科技 室内环境的自监督式深度估计框架
US12430778B2 (en) * 2021-04-19 2025-09-30 Google Llc Depth estimation using a neural network
CN113379813B (zh) * 2021-06-08 2024-04-30 北京百度网讯科技有限公司 深度估计模型的训练方法、装置、电子设备及存储介质
CN113344997B (zh) * 2021-06-11 2022-07-26 方天圣华(北京)数字科技有限公司 快速获取只含有目标对象的高清前景图的方法及系统
CN113313757B (zh) * 2021-07-27 2024-07-12 广州市勤思网络科技有限公司 一种基于单目测距的船舱乘客安全预警算法
KR20230047759A (ko) 2021-10-01 2023-04-10 삼성전자주식회사 깊이맵을 개선하는 방법 및 장치
KR102704310B1 (ko) * 2021-11-03 2024-09-05 네이버랩스 주식회사 단안 거리 추정 모델 학습 방법 및 시스템
CN114612544B (zh) * 2022-03-11 2024-01-02 北京百度网讯科技有限公司 图像处理方法、装置、设备和存储介质
CN117152223B (zh) * 2022-05-24 2025-12-12 鸿海精密工业股份有限公司 深度图像生成方法、系统、电子设备及可读存储介质
US12456212B2 (en) 2022-05-26 2025-10-28 Samsung Electronics Co., Ltd. Electronic apparatus and image processing method thereof
WO2025151008A1 (ko) * 2024-01-12 2025-07-17 주식회사 타이로스코프 안구 돌출 값을 추정하는 방법 및 이를 수행하는 시스템
KR20250139666A (ko) * 2024-03-15 2025-09-23 삼성전자주식회사 3차원 영상의 깊이맵을 획득하는 방법 및 장치
CN120107330A (zh) * 2025-01-21 2025-06-06 河北师范大学 单目遥感图像高度数据估计方法、装置、设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780588A (zh) * 2016-12-09 2017-05-31 浙江大学 一种基于稀疏激光观测的图像深度估计方法
CN107204010A (zh) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 一种单目图像深度估计方法与系统
CN107578436A (zh) * 2017-08-02 2018-01-12 南京邮电大学 一种基于全卷积神经网络fcn的单目图像深度估计方法
CN109035319A (zh) * 2018-07-27 2018-12-18 深圳市商汤科技有限公司 单目图像深度估计方法及装置、设备、程序及存储介质

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002222419A (ja) * 2001-01-29 2002-08-09 Olympus Optical Co Ltd 画像領域分割装置及びその方法ならびに処理プログラムが記録された記録媒体
US8472699B2 (en) * 2006-11-22 2013-06-25 Board Of Trustees Of The Leland Stanford Junior University Arrangement and method for three-dimensional depth image construction
US9471988B2 (en) * 2011-11-02 2016-10-18 Google Inc. Depth-map generation for an input image using an example approximate depth-map associated with an example similar image
CN102750702B (zh) * 2012-06-21 2014-10-15 东华大学 基于优化bp神经网络模型的单目红外图像深度估计方法
EP2854104A1 (en) * 2013-09-25 2015-04-01 Technische Universität München Semi-dense simultaneous localization and mapping
CN106157307B (zh) * 2016-06-27 2018-09-11 浙江工商大学 一种基于多尺度cnn和连续crf的单目图像深度估计方法
CN106599805B (zh) * 2016-12-01 2019-05-21 华中科技大学 一种基于有监督数据驱动的单目视频深度估计方法
CN106952222A (zh) * 2017-03-17 2017-07-14 成都通甲优博科技有限责任公司 一种交互式图像虚化方法及装置
CN107230014B (zh) * 2017-05-15 2020-11-03 浙江仟和网络科技有限公司 一种末端即时物流的智能调度系统
CN108229478B (zh) * 2017-06-30 2020-12-29 深圳市商汤科技有限公司 图像语义分割及训练方法和装置、电子设备、存储介质和程序
CN107553490A (zh) * 2017-09-08 2018-01-09 深圳市唯特视科技有限公司 一种基于深度学习的单目视觉避障方法
CN107767413B (zh) * 2017-09-20 2020-02-18 华南理工大学 一种基于卷积神经网络的图像深度估计方法
CN107945265B (zh) 2017-11-29 2019-09-20 华中科技大学 基于在线学习深度预测网络的实时稠密单目slam方法与系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780588A (zh) * 2016-12-09 2017-05-31 浙江大学 一种基于稀疏激光观测的图像深度估计方法
CN107204010A (zh) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 一种单目图像深度估计方法与系统
CN107578436A (zh) * 2017-08-02 2018-01-12 南京邮电大学 一种基于全卷积神经网络fcn的单目图像深度估计方法
CN109035319A (zh) * 2018-07-27 2018-12-18 深圳市商汤科技有限公司 单目图像深度估计方法及装置、设备、程序及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIAN, HU: "Depth Estimation on Monocular Image", INFORMATION & TECHNOLOGY, CHINA DOCTORAL DISSERTATONS FULL-TEXT DATABESE, no. 3, 15 March 2016 (2016-03-15), ISSN: 1674-022X *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784659A (zh) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 图像检测的方法、装置、电子设备以及存储介质
CN113344998A (zh) * 2021-06-25 2021-09-03 北京市商汤科技开发有限公司 深度检测方法、装置、计算机设备及存储介质
CN113344998B (zh) * 2021-06-25 2022-04-29 北京市商汤科技开发有限公司 深度检测方法、装置、计算机设备及存储介质
CN115457153A (zh) * 2022-10-24 2022-12-09 深圳博升光电科技有限公司 单目结构光相机识别玻璃的识别方法及单目结构光相机

Also Published As

Publication number Publication date
CN109035319B (zh) 2021-04-30
US11443445B2 (en) 2022-09-13
US20200226773A1 (en) 2020-07-16
TWI766175B (zh) 2022-06-01
TW202008308A (zh) 2020-02-16
KR20200044108A (ko) 2020-04-28
CN109035319A (zh) 2018-12-18
JP6963695B2 (ja) 2021-11-10
KR102292559B1 (ko) 2021-08-24
JP2021500689A (ja) 2021-01-07
SG11202003878TA (en) 2020-05-28

Similar Documents

Publication Publication Date Title
WO2020019761A1 (zh) 单目图像深度估计方法及装置、设备、程序及存储介质
CN109325972B (zh) 激光雷达稀疏深度图的处理方法、装置、设备及介质
US11270158B2 (en) Instance segmentation methods and apparatuses, electronic devices, programs, and media
KR102295403B1 (ko) 깊이 추정 방법 및 장치, 전자 기기, 프로그램 및 매체
US11380017B2 (en) Dual-view angle image calibration method and apparatus, storage medium and electronic device
US11321593B2 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
CN108229497B (zh) 图像处理方法、装置、存储介质、计算机程序和电子设备
US10846870B2 (en) Joint training technique for depth map generation
CN108460411B (zh) 实例分割方法和装置、电子设备、程序和介质
CN108427927B (zh) 目标再识别方法和装置、电子设备、程序和存储介质
WO2019223382A1 (zh) 单目深度估计方法及其装置、设备和存储介质
WO2018166438A1 (zh) 图像处理方法、装置及电子设备
CN109300151B (zh) 图像处理方法和装置、电子设备
US11625846B2 (en) Systems and methods for training a machine-learning-based monocular depth estimator
CN113129352A (zh) 一种稀疏光场重建方法及装置
CN118365523A (zh) 任意尺度图像的表示方法、系统、电子设备及存储介质
JP2024521816A (ja) 無制約画像手ぶれ補正
Maslov et al. Fast depth reconstruction using deep convolutional neural networks
CN113592706A (zh) 调整单应性矩阵参数的方法和装置
CN116503460A (zh) 深度图获取方法、装置、电子设备及存储介质
US20230177722A1 (en) Apparatus and method with object posture estimating
WO2025230521A1 (en) Machine-learning fusion of dual-resolution image streams generated by a multi-camera system
CN121120906A (zh) 图像渲染方法、装置及设备
CN115908879A (zh) 基于点引导注意力机制的自适应局部图像特征匹配方法
CN121329799A (zh) 光流场的生成方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19840314

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20207009304

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2020542490

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11.05.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19840314

Country of ref document: EP

Kind code of ref document: A1