Disclosure of Invention
An object of an embodiment of the present application is to provide a depth image generating method based on a camera module, a depth image generating device based on a camera module, and a machine-readable storage medium, so as to solve the above-mentioned problems.
In order to achieve the above object, a first aspect of the present application provides a depth image generating method based on a camera module, wherein the camera module includes at least a first camera and a second camera, and the method includes:
acquiring a first image through the first camera and acquiring a second image through the second camera, wherein the fields of view of the first camera and the second camera overlap in a target area;
performing feature point matching on the first image and the second image to determine a first relative pose between the first camera and the second camera;
acquiring an initial relative pose of the first camera and the second camera, and adjusting the initial relative pose to obtain a second relative pose when the difference between the first relative pose and the initial relative pose is greater than a preset pose change threshold; and
determining a depth map of the target area based on the second relative pose.
Optionally, adjusting the initial relative pose to obtain the second relative pose when the difference between the first relative pose and the initial relative pose is greater than the preset pose change threshold includes:
adjusting the initial relative pose by a preset pose adjustment step size to obtain a second relative pose of the first camera and the second camera when the difference between the first relative pose and the initial relative pose is greater than the pose change threshold; and
if the difference between the second relative pose and the first relative pose is greater than the pose change threshold, continuing to adjust the second relative pose by the pose adjustment step size until the difference between the second relative pose and the first relative pose is not greater than the pose change threshold, and taking the resulting second relative pose as a new initial relative pose.
Optionally, performing feature point matching on the first image and the second image to determine the first relative pose between the first camera and the second camera includes:
determining a target to be detected comprising two parallel lines extending along the shooting direction;
extracting feature points of the target to be detected in the first image, and determining a first vanishing point of the target to be detected in the first image by feature point fitting; extracting feature points of the target to be detected in the second image, and determining a second vanishing point of the target to be detected in the second image by feature point fitting; and
determining the relative pose between the first vanishing point and the second vanishing point, and taking it as the first relative pose between the first camera and the second camera.
Optionally, determining the first vanishing point of the target to be detected in the first image by feature point fitting includes:
fitting feature points to obtain feature lines of the two parallel lines of the target to be detected in the first image, and determining the first vanishing point of the target to be detected in the first image based on the two feature lines in the first image.
Determining the second vanishing point of the target to be detected in the second image by feature point fitting includes:
fitting feature points to obtain feature lines of the two parallel lines of the target to be detected in the second image, and determining the second vanishing point of the target to be detected in the second image based on the two feature lines in the second image.
Optionally, the method further comprises:
extracting first semantic information of the first image and second semantic information of the second image;
determining the depth map of the target area based on the second relative pose includes:
determining the depth map of the target area based on the second relative pose, the first semantic information, and the second semantic information.
Optionally, determining the depth map of the target area based on the second relative pose, the first semantic information, and the second semantic information includes:
generating an initial depth map of the target area according to the feature points of the first image, the feature points of the second image, and the second relative pose;
establishing a geometric model corresponding to the first semantic information and a geometric model corresponding to the second semantic information; and
filling the hole areas of the initial depth map using the geometric models corresponding to the first and second semantic information, to obtain the depth map of the target area.
Optionally, before the first image is acquired by the first camera and the second image is acquired by the second camera, the method further comprises:
camera times of the first camera and the second camera are synchronized in response to a clock synchronization signal.
In a second aspect of the present application, there is provided a depth image generating apparatus that generates a depth image by applying the above depth image generating method, the apparatus comprising:
an image acquisition module configured to acquire a first image through a first camera and a second image through a second camera, wherein the fields of view of the first camera and the second camera overlap in a target area;
a pose determination module configured to perform feature point matching on the first image and the second image to determine a first relative pose between the first camera and the second camera;
a depth map generation module configured to acquire an initial relative pose of the first camera and the second camera, adjust the initial relative pose to obtain a second relative pose when the difference between the first relative pose and the initial relative pose is greater than a preset pose change threshold, and determine a depth map of the target area based on the second relative pose.
In a third aspect of the application, there is provided a machine-readable storage medium having stored thereon instructions that, when executed by a processor, cause the processor to be configured to perform the depth image generation method described above.
In a fourth aspect of the present application, an electronic device is provided, connected to a first camera and a second camera, where the electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method for generating a depth image based on a camera module when executing the computer program.
In a fifth aspect of the application, a vehicle is provided, the vehicle comprising an electronic device as described above.
According to the above technical solution, the relative pose of the cameras is monitored in real time, so that small changes in the camera state can be estimated in real time and the relative pose parameters of each camera corrected accordingly, improving the accuracy of the relative camera pose. This solves the problem that, when vibration during driving changes the camera pose, real depth information cannot be accurately obtained using only offline-calibrated camera parameters.
Additional features and advantages of embodiments of the application will be set forth in the detailed description which follows.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the detailed description described herein is merely for illustrating and explaining the embodiments of the present application, and is not intended to limit the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that, in the technical solution of the present application, the acquisition, storage, use, and processing of data all comply with the relevant national laws and regulations. The technical solutions of the embodiments of the present application may be combined with one another, provided that the combination can be implemented by those skilled in the art; when technical solutions are contradictory or cannot be implemented, the combination should be considered nonexistent and outside the scope of protection claimed by the present application.
As described in the background art, the relative pose between the cameras on an autonomous vehicle is usually calibrated offline in advance. However, this relative pose can change over the service life of the vehicle; for example, the vehicle may shake during driving, changing the relative pose between the cameras. Such changes reduce the accuracy of the vehicle's perception of the external environment, or cause incorrect perception, so that the autonomous vehicle presents a potential safety hazard.
In order to solve the above-mentioned problems, as shown in fig. 1, an embodiment of the present application provides a depth image generating method based on a camera module, wherein the distributed camera module includes at least a first camera and a second camera, and the method includes:
during the driving of the driving device, acquiring a first image through the first camera and a second image through the second camera, wherein the fields of view of the first camera and the second camera overlap in a target area;
performing feature point matching on the first image and the second image to determine a first relative pose between the first camera and the second camera;
acquiring an initial relative pose of the first camera and the second camera, and adjusting the initial relative pose to obtain a second relative pose when the difference between the first relative pose and the initial relative pose is greater than a preset pose change threshold; and
determining a depth map of the target area based on the second relative pose.
Thus, through the above technical scheme, this embodiment monitors the relative pose of the cameras in real time, estimates small changes in the camera state in real time, and corrects the relative pose parameters of each camera using these changes, improving the accuracy of the relative camera pose. This solves the problem that, when vehicle vibration during driving changes the camera pose, real depth information cannot be accurately obtained using only offline-calibrated camera parameters.
As shown in fig. 2, in this embodiment, the driving device may be, but is not limited to, an autonomous vehicle, and the distributed camera modules DP1, DP2, DP3, and DP4 may be mounted on the front, right, rear, and left sides of the vehicle, respectively; it is understood that distributed camera modules may also be mounted at other positions on the vehicle. Each distributed camera module comprises a depth image acquisition device and a distributed processor connected to it. The depth image acquisition device may be a monocular, binocular, trinocular, or higher-order camera, or a fisheye camera, wide-angle camera, depth camera, or other device capable of directly acquiring an image or indirectly acquiring the depth information of an image. The distributed camera module of this embodiment comprises a binocular camera and a distributed processor connected to it; the binocular camera collects environmental images around the vehicle. In this embodiment, the first camera is the left-eye camera of the binocular camera in the distributed camera module, and the second camera is the right-eye camera. The distributed processor calibrates the relative pose of the left-eye and right-eye cameras based on the environmental images acquired by the binocular camera, generates a depth image of the target area from the acquired environmental images, and transmits the depth image of the target area together with the environmental images to the central processing unit (CPU) for further processing.
To further improve the accuracy of environmental perception, before the first image is acquired by the first camera and the second image is acquired by the second camera, the method further comprises: synchronizing the camera times of the first camera and the second camera in response to a clock synchronization signal. After the vehicle system is started, the central processing unit sends a clock synchronization signal to all the distributed camera modules; after receiving the signal, each distributed camera module collects the clock data of its cameras, so as to synchronize time between the left-eye and right-eye cameras of its binocular camera and to synchronize camera time across the binocular cameras of different modules.
Specifically, while the vehicle is driving, the left-eye and right-eye cameras acquire in real time a first image and a second image representing the environment outside the vehicle. It can be understood that the first image and the second image are acquired in parallel, that is, at the same moment, and that both are RGB images. After the distributed processor receives the first image and the second image, it extracts feature points from the two images in parallel; the extraction may be sparse or dense. For example, the change of the binocular camera pose relative to the offline calibration data can be verified by extracting feature information of known structures on the road, such as marking lines and sign boards, and computing epipolar lines, homography transformation parameters, and the like; or by extracting feature information of parallel lanes and road surfaces and computing the vanishing point of two lane lines in the distance. It can be understood that, to improve the accuracy of feature matching, mismatches can be filtered with the RANSAC algorithm when matching the feature points of the first image and the second image. In this way, road-surface prior information such as marking lines and lane lines is exploited: the feature points of the two images are computed and compared with several epipolar-geometry methods, the method with the smallest error is selected to compute the relative pose of the two images in real time, and this relative pose is compared with the offline-calibrated relative pose.
It will be appreciated that sparse feature matching or dense feature matching is known in the art, and the matching process and the calculation process are not limited in this embodiment.
In this embodiment, the relative pose between the cameras is represented by the extrinsic rotation matrix R and displacement matrix t of the camera. Solving for the relative pose between the left-eye and right-eye cameras, that is, for R and t, is known in the art; for example, it can be done by the following steps:
performing sparse feature extraction on the first image and the second image to obtain first feature points of the first image and second feature points of the second image, and matching the first feature points with the second feature points to obtain an initial feature pair set S;
screening the feature point pairs in the set S by matching geometric feature points with the RANSAC algorithm and filtering wrong matches to obtain a matching point set S1 for each feature point, and then computing the rotation matrix R and displacement matrix t with the SVD (singular value decomposition) algorithm. However, the errors of R and t computed directly in this way cannot effectively guarantee the accuracy of the pose calculation. Because roads carry much structured information, such as parallel lane lines and sign boards, this embodiment further improves the pose accuracy by computing the relative pose between the left and right cameras from the deviation of the vanishing points of the same structured information in the first image and the second image. In this embodiment, the first relative pose between the left-eye and right-eye cameras is determined as follows: determining a target to be detected comprising two parallel lines extending along the shooting direction; extracting feature points of the target to be detected in the first image, and determining a first vanishing point of the target in the first image by feature point fitting; extracting feature points of the target in the second image, and determining a second vanishing point of the target in the second image by feature point fitting; and determining the relative pose between the first vanishing point and the second vanishing point, and taking it as the first relative pose between the left-eye camera and the right-eye camera.
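The SVD step mentioned above can be sketched as follows, assuming matched 3-D feature points and a Kabsch-style solver; this is illustrative only, since the embodiment does not spell out the decomposition, and the function name is an assumption:

```python
import numpy as np

def pose_from_matches_svd(P, Q):
    """Recover rotation R and displacement t such that P ~ Q @ R.T + t
    from matched 3-D feature points (Kabsch-style SVD).
    P, Q: (N, 3) arrays of corresponding points."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (Q - cq).T @ (P - cp)               # 3x3 cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cp - R @ cq
    return R, t
```

With noiseless correspondences the rotation and displacement are recovered exactly; with noisy matches the result is the least-squares optimum, which is why the embodiment adds the vanishing-point check on top.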
Determining the first vanishing point of the target to be detected in the first image by feature point fitting includes: fitting feature points to obtain feature lines of the two parallel lines of the target in the first image, and determining the first vanishing point of the target in the first image based on the two feature lines. Determining the second vanishing point of the target in the second image by feature point fitting likewise includes: fitting feature points to obtain feature lines of the two parallel lines of the target in the second image, and determining the second vanishing point based on the two feature lines in the second image.
In a specific example of this embodiment, the target to be detected is a lane. First, the feature points of the lane markings in the first image are extracted with an image segmentation algorithm, and two parallel lane lines are then obtained with a feature point fitting algorithm, so that the first vanishing point of the lane lines in the first image can be determined from the fitted lane lines. The second vanishing point of the lane lines in the second image is obtained in the same way. It can be understood that the first and second vanishing points are different representations, in the first and second images respectively, of the same vanishing point of the same target to be detected.
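The lane-line fitting and vanishing-point computation can be sketched in NumPy as follows; a simple least-squares line fit stands in for the embodiment's unspecified fitting algorithm, and homogeneous line coordinates are used so that the intersection is a cross product:

```python
import numpy as np

def fit_line(points):
    """Least-squares fit of a line through lane feature points,
    returned in homogeneous form (a, b, c) with a*x + b*y + c = 0."""
    m, b = np.polyfit(points[:, 0], points[:, 1], 1)
    return np.array([m, -1.0, b])       # y = m*x + b  ->  m*x - y + b = 0

def vanishing_point(line1, line2):
    """Intersection of the two fitted feature lines, i.e. the candidate
    vanishing point (cross product of homogeneous lines, dehomogenized)."""
    p = np.cross(line1, line2)
    return p[:2] / p[2]
```

A vertical line would need a different parameterization than the slope form; for lane lines receding toward the horizon in image space the slope form usually suffices.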
In order to improve detection accuracy, before determining the vanishing point of the target to be detected, this embodiment further screens the extracted feature points as follows:
acquiring the homogeneous coordinates of the feature points in the first image or the second image, determining the homography matrix of the feature points from these homogeneous coordinates, and constructing a projection model that projects the feature points onto a preset projection plane from the homogeneous coordinates and the homography matrix; and screening out all feature points satisfying the constraint condition through the projection model, with the minimum projection distance as the constraint condition.
In this embodiment, the feature points are screened with the RANSAC algorithm. Specifically, n feature point pairs (X1, X2) to be matched are randomly sampled, where n may be, for example, 4, and the projection model X1 = H·X2 is constructed. Here X1 is the homogeneous coordinate of a feature point in the original image, X2 is the homogeneous coordinate of that feature point on the projection plane, and H is the homography matrix, that is, the transformation matrix mapping X2 to X1. With the minimum distance between X1 and the projection plane as the constraint condition, all feature points are tested against the projection model X1 = H·X2: feature points satisfying the model are marked as inliers, and the rest are filtered out as outliers. This effectively improves the fitting precision of the feature points, so that targets to be detected such as lane lines can be identified more accurately.
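A minimal sketch of the inlier test above, assuming the homography H has already been estimated from the sampled pairs; the function name and pixel threshold are illustrative, not part of the embodiment:

```python
import numpy as np

def screen_feature_points(X1, X2, H, dist_thresh=3.0):
    """Test candidate feature points against the projection model
    X1 = H @ X2 (RANSAC-style inlier check). X1, X2: (N, 3) homogeneous
    coordinates; pairs whose projection distance falls below the
    threshold are kept as inliers, the rest filtered as outliers."""
    proj = (H @ X2.T).T
    proj = proj / proj[:, 2:3]          # dehomogenize projected points
    X1n = X1 / X1[:, 2:3]
    dist = np.linalg.norm(proj[:, :2] - X1n[:, :2], axis=1)
    return dist < dist_thresh
```

In a full RANSAC loop this test runs once per sampled model, and the model with the most inliers (or smallest total distance) is retained.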
As shown in fig. 3, after the rotation matrix R and displacement matrix t of the left-eye and right-eye cameras are obtained, adjusting the initial relative pose to obtain the second relative pose when the difference between the first relative pose and the initial relative pose is greater than the preset pose change threshold includes:
adjusting the initial relative pose by a preset pose adjustment step size to obtain a second relative pose of the first camera and the second camera when the difference between the first relative pose and the initial relative pose is greater than the pose change threshold; and if the difference between the second relative pose and the first relative pose is greater than the pose change threshold, continuing to adjust the second relative pose by the pose adjustment step size until the difference is not greater than the pose change threshold, and taking the resulting second relative pose as the new initial relative pose.
Let the first relative pose between the left-eye and right-eye cameras computed in real time by the above steps be the rotation matrix R11 and displacement matrix t11, and let the offline calibration parameters between the two cameras, that is, the initial relative pose, be R12 and t12, where R12 is a rotation matrix and t12 a displacement matrix. The difference δT between the real-time values R11, t11 and the offline parameters R12, t12 is determined. If δT is below the preset pose change threshold, the shake between the left-eye and right-eye cameras is considered negligible. If δT is above the threshold, a camera perturbation ΔR, Δt of the set step size is applied to R12, t12 to obtain the second relative pose, whose rotation and displacement matrices are R'12 = R12·ΔR and t'12 = t12 + R12·Δt. The difference between R'12, t'12 and R11, t11 is then checked against the threshold; if it is still above the threshold, a further perturbation ΔR, Δt of the set step size is applied to R'12, t'12, and the check is repeated. This process continues until the difference between the updated R'12, t'12 and R11, t11 is no greater than the preset pose change threshold, and the updated second relative pose R'12, t'12 is taken as the new initial relative pose.
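The iterative update above can be sketched as follows. The scalar pose-difference metric (rotation angle plus translation gap) is an assumption, since the embodiment does not fix how δT is measured, and the function name is illustrative:

```python
import numpy as np

def refine_relative_pose(R11, t11, R12, t12, dR, dt, threshold, max_iters=1000):
    """Iteratively perturb the offline pose (R12, t12) by the fixed step
    (dR, dt) until it agrees with the real-time pose (R11, t11) within
    the pose-change threshold."""
    R, t = R12.copy(), t12.astype(float).copy()
    for _ in range(max_iters):
        R_err = R11.T @ R                # residual rotation vs. real-time pose
        angle = np.arccos(np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0))
        if angle + np.linalg.norm(t11 - t) <= threshold:
            break                        # within the pose-change threshold
        t = t + R @ dt                   # t'12 = t12 + R12 * delta-t
        R = R @ dR                       # R'12 = R12 * delta-R
    return R, t
```

For the loop to terminate, the perturbation ΔR, Δt must be oriented so that each step reduces the difference toward R11, t11; in practice the step direction would be derived from the sign of the residual.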
It can be understood that, after the online calibration of the binocular camera, epipolar rectification, distortion correction, and the like can further be performed. Moreover, the method of this embodiment is suitable not only for correcting the extrinsic parameters between the left-eye and right-eye cameras, but also for correcting the extrinsic parameters between different binocular cameras, and, in the same way, for correcting the intrinsic parameters of a camera.
After the corrected rotation matrix and displacement matrix between the binocular cameras are obtained, depth estimation is performed based on the binocular images output by the binocular camera, that is, the first image and the second image. This mainly comprises coarse-precision depth estimation and fine-precision depth optimization estimation, with the following specific processes:
Coarse-precision depth estimation:
the method comprises two processing methods, namely a sparse feature estimation method and a dense feature estimation method, which are respectively used for sparse point cloud depth estimation and MVS dense point cloud depth estimation. The sparse feature estimation method is to extract sparse features from two images respectively, perform feature matching by adopting an optical flow or feature matching method, and base on a corrected rotation matrix and a corrected displacement matrix R '' 12 =R 12 *ΔR,t′ 12 =t 12 +R 12 Deltat and binocular solid geometry principle to calculate depth information of successfully matched feature points, therebyObtaining a sparse depth map of the target area; the dense feature estimation method utilizes the characteristic of unchanged image brightness to find a matching pair of pixel points between two frames, and is based on a corrected rotation matrix and a corrected displacement matrix R '' 12 =R 12 *ΔR,t′ 12 =t 12 +R 12 And delta t calculating the depth of the point cloud of the successfully matched feature points, and obtaining a dense depth map of the target area through the principle of trigonometry of solid geometry after the matching information of each feature point is established. It will be appreciated that the depth information of the feature points calculated based on the rotation matrix and the displacement matrix is the prior art, and the calculation process is not limited herein.
In a preferred embodiment, in order to reduce the amount of computation and improve the accuracy of the point cloud depth, an initial matching range within the target area may first be determined by sparse feature matching on the binocular images. For example, suppose each image contains 300,000 pixels and 500 sparse feature points are extracted; after the sparse feature points are matched, the initial matching range determined from the matched sparse features has size M×N. To reduce sparse-matching errors and improve point cloud accuracy, this initial range is expanded to size (M+i)×(N+j), and dense feature matching is performed within the expanded range; for example, 20,000 pixels may be densely matched there. The point cloud depth of each matched feature point is then computed by triangulation from the corrected rotation and displacement matrices to obtain the point cloud data of the target area. Restricting dense matching to the expanded range in this way effectively reduces the computation compared with dense matching over the whole image, while keeping better accuracy than sparse matching alone.
Fine-precision depth optimization estimation:
Since the initial depth map of the target area obtained by the above steps is a sparse/dense depth map rather than a full depth map, holes may arise during mapping due to image rotation or occlusion: the corresponding pixels store no depth value. To obtain a complete depth image, these holes therefore need to be filled.
Specifically, the method of the present embodiment further includes: extracting first semantic information of the first image and second semantic information of the second image; determining a depth map of the target region based on the second relative pose, comprising: a depth map of the target region is determined based on the second relative pose, the first semantic information, and the second semantic information.
In practice, factors such as weak texture and exposure on the road leave many holes in the obtained depth map. This embodiment extracts semantic information of the targets to be detected in the images to judge the attributes of different image regions, and uses this semantic information to fill the holes in the depth map effectively. It can be understood that the target to be detected in the first image and in the second image may be the same target, and that the semantic information of a target represents its attribute. For example, if the target to be detected is a lane, then after its feature points are extracted, image recognition determines the semantic label "lane"; from this label the detected region is known to be a plane, and hole filling of the depth map can be realized by constructing a plane equation.
In this embodiment, determining the depth map of the target region based on the second relative pose, the first semantic information, and the second semantic information includes:
generating an initial depth map of the target area from the feature points of the first image, the feature points of the second image, and the second relative pose; establishing a geometric model corresponding to the first semantic information and a geometric model corresponding to the second semantic information; and filling the hole areas of the initial depth map using the two geometric models to obtain the depth map of the target area.
Taking the target to be detected as a lane as an example, the attribute of the target is determined by acquiring its semantic information. For example, when it is known that a certain area of the image is a road or other planar region, a plane model Ax + By + Cz = D can be established, and the hole area is fitted or filled to obtain the depth map of the target area. It can be understood that the number of targets to be detected may be one or more, and that when the attribute of a target is determined to be a curved surface according to its semantic information, a corresponding curved-surface model is established to fit or fill the hole area. The hole-filling method may be, but is not limited to, filling holes directly with the average value of surrounding pixels, or filling holes through a weighted-analysis image restoration algorithm. It can be understood that the extraction of semantic information and the optimization of depth-estimation accuracy can be realized based on an existing convolutional neural network (CNN), and that depth estimation itself can likewise be realized through a CNN.
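As an illustrative sketch (not part of the original disclosure), the plane-based hole filling described above can be realized by fitting a plane z = a·x + b·y + c to the valid depth pixels of a semantically planar region by least squares, then evaluating the fitted plane at the hole pixels. The function name and the use of NumPy are assumptions for illustration only:

```python
import numpy as np

def fill_plane_holes(depth, plane_mask):
    """Fill depth holes (NaNs) inside a region whose semantic label is planar.

    depth: HxW float array with np.nan at hole pixels.
    plane_mask: HxW bool array, True where the semantics indicate a
    plane (e.g. a lane). Fits z = a*x + b*y + c to the valid pixels of
    the region by least squares, then evaluates the plane at the holes.
    """
    ys, xs = np.nonzero(plane_mask)
    z = depth[ys, xs]
    valid = ~np.isnan(z)
    if valid.sum() < 3:  # not enough support to fit a plane
        return depth
    A = np.column_stack([xs[valid], ys[valid], np.ones(valid.sum())])
    (a, b, c), *_ = np.linalg.lstsq(A, z[valid], rcond=None)
    holes = ~valid
    filled = depth.copy()
    filled[ys[holes], xs[holes]] = a * xs[holes] + b * ys[holes] + c
    return filled
```

A curved-surface region would be handled analogously by fitting a higher-order surface model instead of a plane.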
Wherein, the depth value of each feature point can be obtained as follows: first, the matching relationship between feature points in the first image and the second image is determined, and the horizontal disparity d = x2 − x1 of each pair of matched feature points is calculated, where x1 and x2 are the horizontal coordinates of the feature point in the first image and the second image respectively; the depth value Z of the feature point is then calculated according to the formula Z = f·B/d, where B is the baseline and f is the focal length of the camera.
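The disparity-to-depth computation above can be sketched as follows (an illustrative helper, not part of the original disclosure; names and units are assumptions):

```python
def depth_from_disparity(x1, x2, baseline_m, focal_px):
    """Depth of a matched feature point from its horizontal disparity.

    x1, x2: horizontal pixel coordinates of the same feature in the
    first and second (rectified) images; baseline_m is the camera
    baseline B in metres; focal_px is the focal length f in pixels.
    """
    d = x2 - x1  # disparity d = x2 - x1, as in the text
    if d == 0:
        return float("inf")  # zero disparity: point at infinity
    return focal_px * baseline_m / abs(d)  # Z = f * B / d
```

For example, a 40-pixel disparity with a 0.12 m baseline and an 800-pixel focal length yields a depth of 800 × 0.12 / 40 = 2.4 m.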
In order to further improve the calculation accuracy of the depth information of the feature points, this embodiment corrects the obtained point-cloud depths of the feature points based on the semantic information corresponding to the feature points. Taking feature points whose semantic information is a lane as an example, the lane is regarded as a plane and a corresponding plane equation is constructed: Ax + By + Cz + D = 0. When the pixel points in the pixel area are projected onto a preset projection plane, there is: d = |Ax + By + Cz + D| / √(A² + B² + C²), where d is the projection distance of a pixel point to the projection plane, A, B, C and D are constants, and x, y and z are the coordinates of the pixel point. A cost function is constructed so that the projection distance d is minimized, and the depth information of the sparse/dense feature points in the initial depth map of the target region is optimized based on this cost function. Specifically, for each sparse/dense feature point in the initial depth map, the value of d satisfying the cost function is taken as the depth value of the feature point; the depth information of all feature points is thereby optimized, the point cloud data of the target region is updated, and the updated point cloud data is converted into the final depth map of the target region.
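As a minimal sketch of the projection-distance cost described above (not part of the original disclosure; function names and the use of NumPy are assumptions), the distance of each 3-D point to the plane can be computed in closed form, and a simple stand-in for minimizing the cost is to project the points onto the plane, driving d to zero:

```python
import numpy as np

def point_plane_distance(points, plane):
    """Distance of 3-D points to the plane Ax + By + Cz + D = 0.

    points: Nx3 array of (x, y, z); plane: (A, B, C, D).
    Returns d = |Ax + By + Cz + D| / sqrt(A^2 + B^2 + C^2),
    the quantity the cost function in the text minimizes.
    """
    A, B, C, D = plane
    n = np.array([A, B, C], dtype=float)
    return np.abs(points @ n + D) / np.linalg.norm(n)

def snap_to_plane(points, plane):
    """Orthogonally project points onto the plane (d becomes zero),
    a simple stand-in for optimizing feature-point depths."""
    A, B, C, D = plane
    n = np.array([A, B, C], dtype=float)
    signed = (points @ n + D) / (n @ n)  # signed distance / |n|^2
    return points - np.outer(signed, n)
```

For instance, for the plane z = 2 (i.e. A=0, B=0, C=1, D=−2), a point at (0, 0, 5) has projection distance 3 and is snapped to (0, 0, 2).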
As shown in fig. 3, a second aspect of the present application provides a depth image generating apparatus based on a camera module, the camera module at least includes a first camera and a second camera, the apparatus includes:
an image acquisition module configured to acquire a first image through the first camera and a second image through the second camera, wherein a target area overlaps the fields of view of the first camera and the second camera;
the pose determining module is configured to perform feature point matching on the first image and the second image to determine a first relative pose between the first camera and the second camera;
a depth map generation module configured to acquire an initial relative pose of the first camera and the second camera, adjust the initial relative pose to obtain a second relative pose when the difference between the first relative pose and the initial relative pose is greater than a preset pose change threshold, and determine a depth map of the target region based on the second relative pose.
A third aspect of the application provides a machine-readable storage medium having instructions stored thereon that, when executed by a processor, cause the processor to be configured to perform the camera module-based depth image generation method described above.
Machine-readable storage media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
A fourth aspect of the present application provides an electronic device connected to a first camera and a second camera, the electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method for generating a depth image based on a camera module when executing the computer program.
Fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device 10 of this embodiment includes: a processor 100, a memory 101, and a computer program 102 stored in the memory 101 and executable on the processor 100. The steps of the method embodiments described above are implemented by the processor 100 when executing the computer program 102. Alternatively, the processor 100, when executing the computer program 102, performs the functions of the modules/units of the apparatus embodiments described above.
By way of example, computer program 102 may be partitioned into one or more modules/units that are stored in memory 101 and executed by processor 100 to accomplish the present invention. One or more of the modules/units may be a series of computer program instruction segments capable of performing a specific function for describing the execution of the computer program 102 in the terminal device 10. For example, the computer program 102 may be partitioned into an image acquisition module, a pose determination module, and a depth map generation module.
The electronic device 10 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 10 may include, but is not limited to, a processor 100, a memory 101. It will be appreciated by those skilled in the art that fig. 5 is merely an example of the electronic device 10 and is not intended to limit the electronic device 10, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device may further include an input-output device, a network access device, a bus, etc.
The processor 100 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 101 may be an internal storage unit of the electronic device 10, such as a hard disk or a memory of the electronic device 10. The memory 101 may also be an external storage device of the electronic device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 10. Further, the memory 101 may also include both internal storage units and external storage devices of the electronic device 10. The memory 101 is used to store computer programs and other programs and data required by the electronic device 10. The memory 101 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In a fifth aspect of the application there is provided a vehicle comprising an electronic device as described above.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.