
CN106203429A - Occluded target detection method in complex background based on binocular stereo vision - Google Patents

Occluded target detection method in complex background based on binocular stereo vision

Info

Publication number: CN106203429A
Application number: CN201610530766.XA
Authority: CN (China)
Prior art keywords: sigma, coordinates, camera, pixel, target detection
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 杨涛, 贺战男, 任强, 张艳宁, 李广坡, 刘小飞
Current assignee: Northwestern Polytechnical University
Original assignee: Northwestern Polytechnical University
Application filed by Northwestern Polytechnical University; priority to CN201610530766.XA


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/255: Detecting or recognising potential candidate objects based on visual cues, e.g. shapes

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention discloses an occluded target detection method based on binocular stereo vision in a complex background, which solves the technical problem of poor detection accuracy in existing occluded target detection methods. The technical solution first calibrates the binocular camera to obtain row-aligned rectified images, then obtains a disparity map through stereo matching and performs background modeling on it, computes the three-dimensional coordinates of the scene to generate a top-view projection map, and finally clusters the top-view projection map with the MeanShift method to obtain the detection results. By exploiting spatial three-dimensional information, the method effectively overcomes technical problems of monocular vision such as target occlusion, scene illumination changes, shadows, and interference from complex backgrounds, and improves detection accuracy.

Description

Occluded target detection method in complex background based on binocular stereo vision

Technical Field

The invention relates to a method for detecting occluded targets, and in particular to an occluded target detection method based on binocular stereo vision in a complex background.

Background Art

Most traditional moving-target detection methods are based on monocular vision. Compared with stereo vision, monocular vision has its advantages, but also serious drawbacks. A monocular system carries little information and processes only one image at a time, so computation is relatively fast; however, the projection process discards the three-dimensional information of the actual scene, an irreparable loss. Monocular moving-target detection, as in "Effective Gaussian mixture learning for video background subtraction", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(5): 827-832, frequently suffers from target occlusion, changes in scene illumination, and shadow interference, and overcoming these problems has long been a research difficulty. Many scholars have studied them extensively, proposing target-feature-matching and multi-sub-template-matching algorithms for the occlusion problem, and multi-Gaussian background models and shadow-elimination algorithms for illumination changes and shadow interference; but these methods are strongly affected by environmental factors and easily fail at target detection in practical applications.

Summary of the Invention

To overcome the poor detection accuracy of existing occluded target detection methods, the present invention provides an occluded target detection method based on binocular stereo vision in a complex background. The method first calibrates the binocular camera to obtain row-aligned rectified images, then obtains a disparity map through stereo matching and performs background modeling on it, computes the three-dimensional coordinates of the scene to generate a top-view projection map, and finally clusters the top-view projection map with the MeanShift method to obtain the detection results. By exploiting spatial three-dimensional information, the invention effectively overcomes technical problems of monocular vision such as target occlusion, scene illumination changes, shadows, and interference from complex backgrounds, and improves detection accuracy.

The technical solution adopted by the present invention to solve this technical problem is an occluded target detection method based on binocular stereo vision in a complex background, characterized by the following steps:

Step 1: Binocular camera calibration.

First, Zhang Zhengyou's checkerboard calibration method is used: multiple checkerboard images are captured to calibrate the intrinsic parameters M1 of the two cameras, and the extrinsic parameters M2 are solved from images of a calibration board placed on the ground. The homogeneous transformation between image coordinates (u, v) and world coordinates (Xw, Yw, Zw) is:

$$z_c\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}\frac{1}{d_x}&0&u_0\\0&\frac{1}{d_y}&v_0\\0&0&1\end{bmatrix}\begin{bmatrix}f&0&0&0\\0&f&0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}R_C&T_C\\\vec{0}^{\,T}&1\end{bmatrix}\begin{bmatrix}X_w\\Y_w\\Z_w\\1\end{bmatrix}=\begin{bmatrix}f_u&0&u_0&0\\0&f_v&v_0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}R_C&T_C\\\vec{0}^{\,T}&1\end{bmatrix}\begin{bmatrix}X_w\\Y_w\\Z_w\\1\end{bmatrix}=M_1M_2\begin{bmatrix}X_w\\Y_w\\Z_w\\1\end{bmatrix}\qquad(1)$$

where f is the camera focal length, (u0, v0) are the coordinates of the principal point of the image, and dx and dy are the physical sizes of each pixel along the horizontal axis x and the vertical axis y, respectively.
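As a quick numeric sketch of the projection in equation (1), the snippet below maps a world point on the ground to pixel coordinates. The focal length, principal point, and camera pose used here are illustrative assumptions, not values from the patent:

```python
import numpy as np

# Illustrative intrinsics: f_u = f/dx, f_v = f/dy in pixels, principal point (u0, v0).
f_u, f_v = 800.0, 800.0
u0, v0 = 320.0, 240.0

# M1: 3x4 intrinsic projection matrix from equation (1).
M1 = np.array([[f_u, 0.0, u0, 0.0],
               [0.0, f_v, v0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])

# M2: 4x4 extrinsic matrix [R_C, T_C; 0, 1]; camera axes aligned with the
# world, world origin 5 m in front of the camera (assumed pose).
R_C = np.eye(3)
T_C = np.array([0.0, 0.0, 5.0])
M2 = np.eye(4)
M2[:3, :3] = R_C
M2[:3, 3] = T_C

# Homogeneous world point on the ground plane (Zw = 0).
P_w = np.array([1.0, 0.5, 0.0, 1.0])

p = M1 @ M2 @ P_w                # z_c * [u, v, 1]^T
u, v = p[0] / p[2], p[1] / p[2]  # divide out the depth z_c
print(u, v)                      # pixel coordinates of the world point
```

Dividing by the third homogeneous component recovers the pixel coordinates, exactly as the left-hand side z_c of equation (1) suggests.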

Stereo calibration then computes the geometric relationship between the two cameras P1 and P2 in space, i.e., the rotation matrix R and translation matrix T between them. The right camera is selected as the reference camera. The relationship is:

$$P_1 = R\,(P_2 - T)\qquad(2)$$

Finally, row-aligned rectified images are obtained with Hartley's uncalibrated stereo rectification algorithm. The binocular camera must capture the two images synchronously.
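A minimal numeric sketch of the stereo relation in equation (2): given R and T from stereo calibration, a point expressed in right-camera (reference) coordinates is mapped into left-camera coordinates and back. The 2-degree rotation and 0.12 m baseline are illustrative assumptions, not calibrated values from the patent:

```python
import numpy as np

# Hypothetical stereo geometry: left camera rotated 2 degrees about the
# vertical axis, offset by a 0.12 m horizontal baseline (assumed values).
theta = np.deg2rad(2.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0,           1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([0.12, 0.0, 0.0])

# A scene point in right-camera (reference) coordinates.
P2 = np.array([0.3, -0.1, 4.0])

# Equation (2): the same point in left-camera coordinates.
P1 = R @ (P2 - T)

# Inverting the relation recovers the right-camera coordinates.
P2_back = R.T @ P1 + T
print(np.allclose(P2_back, P2))
```

Because R is orthonormal, the inverse mapping is simply R^T followed by adding T back, which is what the round trip verifies.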

Step 2: Stereo matching to obtain disparity.

Matching points between the left and right camera views are computed by binocular stereo matching to obtain a disparity map, and a Gaussian mixture modeling method is chosen to perform background modeling on the disparity map, eliminating the interference of the complex background with target detection. From the disparity, the baseline, and the intrinsic parameters, triangulation yields the three-dimensional coordinates of the scene. A world coordinate system whose XOY plane is the ground is chosen, the 3D points are projected onto the ground, and the number of 3D points projected onto a given pixel is taken as that pixel's color value, producing a top-view projection map.
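The chain in step 2, triangulating depth from disparity and accumulating the 3D points into a ground-plane histogram, can be sketched as follows. The focal length, baseline, and 5 cm grid cell are illustrative assumptions, and a toy foreground disparity map stands in for the real stereo-matching and background-subtraction output:

```python
import numpy as np

f = 800.0        # focal length in pixels (assumed)
B = 0.12         # baseline in metres (assumed)
u0, v0 = 320.0, 240.0

# Toy foreground disparity map: two 20x20 "objects" at different depths.
disp = np.zeros((480, 640))
disp[200:220, 100:120] = 24.0    # Z = f*B/d = 4.0 m
disp[200:220, 400:420] = 48.0    # Z = 2.0 m

v, u = np.nonzero(disp)
d = disp[v, u]
Z = f * B / d                    # triangulated depth per foreground pixel
X = (u - u0) * Z / f             # lateral position on the ground plane

# Top-view projection: each cell's value is the count of 3D points
# falling into that ground-plane cell.
cell = 0.05                      # 5 cm grid (assumed)
xi = ((X - X.min()) / cell).astype(int)
zi = ((Z - Z.min()) / cell).astype(int)
top = np.zeros((zi.max() + 1, xi.max() + 1), dtype=int)
np.add.at(top, (zi, xi), 1)      # accumulate point counts per cell
print(top.sum() == d.size)       # every foreground pixel lands in a cell
```

Objects that occlude one another in the image separate in this top view because they sit at different depths, which is what makes the subsequent clustering step workable.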

Step 3: Clustering of the top-view projection map.

The probability density at x is f_{h,k}(x):

$$f_{h,k}(x)=\sum_{i=1}^{n}K\!\left(\left\|\frac{x-x_i}{h}\right\|\right)\qquad(3)$$

where K(x) is the kernel function and h is the radius.

To maximize f_{h,k}(x), differentiate it to obtain the gradient ∇f_{h,k}, where g(s) = -k'(s):

$$\nabla f_{h,k}=\sum_{i=1}^{n}(x-x_i)\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)=\left[\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)\right]\left[\frac{\sum_{i=1}^{n}x_i\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}-x\right]\qquad(4)$$

Let:

$$m_{h,g}(x)=\frac{\sum_{i=1}^{n}x_i\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}-x\qquad(5)$$

For ∇f_{h,k} = 0 to hold, it is necessary and sufficient that m_{h,g}(x) = 0, which gives the new cluster-center coordinates:

$$x=\frac{\sum_{i=1}^{n}x_i\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}\qquad(6)$$

Because of the special nature of the projection map, considering only the distance between pixels cannot yield accurate clustering results. When computing the probability density, two conditions must hold: (a) the closer a pixel's color value is to the center pixel's color value, the higher the probability density; and (b) the closer a pixel is to the center point, the higher the probability density. The kernel function Kh(x) is therefore chosen as:

$$K_h(x)=K\!\left(\left\|\frac{x^{s}-x_i^{s}}{h}\right\|\right)\cdot K\!\left(\left\|\frac{x^{r}-x_i^{r}}{h}\right\|\right)\qquad(7)$$

After MeanShift clustering, each cluster represents one target. Projecting this result back into the original right image displays the final detection result.
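A minimal MeanShift sketch under the product kernel of equation (7): each sample carries ground-plane coordinates (the spatial part x^s) plus a "color" value (the projected-point count, the range part x^r). The Gaussian profiles and the bandwidths h_s and h_r are illustrative assumptions standing in for the unspecified kernel K:

```python
import numpy as np

# Two synthetic "targets" on the top-view map: 50 cells each, at different
# ground positions and with different point counts (cell color values).
rng = np.random.default_rng(0)
a = rng.normal([10.0, 10.0], 0.5, (50, 2))
b = rng.normal([20.0, 10.0], 0.5, (50, 2))
xy = np.vstack([a, b])                                      # spatial part x^s
c = np.concatenate([np.full(50, 30.0), np.full(50, 80.0)])  # range part x^r

h_s, h_r = 3.0, 25.0  # spatial and range bandwidths (assumed)

def shift(p, pc):
    """One mean-shift step, equation (6), weighted by the product kernel (7)."""
    w = (np.exp(-np.sum((xy - p) ** 2, axis=1) / h_s**2)
         * np.exp(-(c - pc) ** 2 / h_r**2))
    return (w[:, None] * xy).sum(0) / w.sum()

# Iterate one seed per blob toward its mode (range value held fixed
# per seed for simplicity).
m1, m2 = xy[0].copy(), xy[50].copy()
for _ in range(30):
    m1 = shift(m1, c[0])
    m2 = shift(m2, c[50])
print(np.linalg.norm(m1 - m2) > 5.0)  # the two modes stay distinct
```

The product kernel keeps the two blobs from merging even when they are spatially adjacent, because their differing color values (point counts) suppress cross-blob weights, which matches conditions (a) and (b) above.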

The beneficial effects of the invention are as follows: the method first calibrates the binocular camera to obtain row-aligned rectified images, then obtains a disparity map through stereo matching and performs background modeling on it, computes the three-dimensional coordinates of the scene to generate a top-view projection map, and finally clusters the top-view projection map with the MeanShift method to obtain the detection results. By exploiting spatial three-dimensional information, the invention effectively overcomes technical problems of monocular vision such as target occlusion, scene illumination changes, shadows, and interference from complex backgrounds, and improves detection accuracy.

The present invention is described in detail below in combination with a specific embodiment.

Detailed Description

The specific steps of the occluded target detection method based on binocular stereo vision in a complex background according to the present invention are as follows:

Step 1: Binocular camera calibration.

First, Zhang Zhengyou's checkerboard calibration method is adopted: about 20 checkerboard images are captured for each camera to calibrate the intrinsic parameters M1 of the two cameras, and the extrinsic parameters M2 are solved from images of a calibration board placed on the ground. The homogeneous transformation between image coordinates (u, v) and world coordinates (Xw, Yw, Zw) is:

$$z_c\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}\frac{1}{d_x}&0&u_0\\0&\frac{1}{d_y}&v_0\\0&0&1\end{bmatrix}\begin{bmatrix}f&0&0&0\\0&f&0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}R_C&T_C\\\vec{0}^{\,T}&1\end{bmatrix}\begin{bmatrix}X_w\\Y_w\\Z_w\\1\end{bmatrix}=\begin{bmatrix}f_u&0&u_0&0\\0&f_v&v_0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}R_C&T_C\\\vec{0}^{\,T}&1\end{bmatrix}\begin{bmatrix}X_w\\Y_w\\Z_w\\1\end{bmatrix}=M_1M_2\begin{bmatrix}X_w\\Y_w\\Z_w\\1\end{bmatrix}\qquad(1)$$

where f is the camera focal length, (u0, v0) are the coordinates of the principal point of the image, and dx and dy are the physical sizes of each pixel along the horizontal axis x and the vertical axis y, respectively; all of these parameters can be obtained through camera calibration.

Stereo calibration then computes the geometric relationship between the two cameras P1 and P2 in space, i.e., the rotation matrix R and translation matrix T between them. The right camera is selected as the reference camera. The relationship is:

$$P_1 = R\,(P_2 - T)\qquad(2)$$

Finally, row-aligned rectified images are obtained with Hartley's uncalibrated stereo rectification algorithm. The binocular camera must capture the two images synchronously.

Step 2: Stereo matching to obtain disparity.

Matching points between the left and right camera views are computed by binocular stereo matching to obtain a disparity map, and a Gaussian mixture modeling method is chosen to perform background modeling on the disparity map, eliminating the interference of the complex background with target detection. From the disparity, the baseline, and the intrinsic parameters, triangulation yields the three-dimensional coordinates of the scene. A world coordinate system whose XOY plane is the ground is chosen, the 3D points are projected onto the ground, and the number of 3D points projected onto a given pixel is taken as that pixel's color value, producing a top-view projection map.

Step 3: Clustering of the top-view projection map.

The probability density at x is f_{h,k}(x):

$$f_{h,k}(x)=\sum_{i=1}^{n}K\!\left(\left\|\frac{x-x_i}{h}\right\|\right)\qquad(3)$$

where K(x) is the kernel function and h is the radius.

To maximize f_{h,k}(x), differentiate it to obtain the gradient ∇f_{h,k}, where g(s) = -k'(s):

$$\nabla f_{h,k}=\sum_{i=1}^{n}(x-x_i)\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)=\left[\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)\right]\left[\frac{\sum_{i=1}^{n}x_i\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}-x\right]\qquad(4)$$

Let:

$$m_{h,g}(x)=\frac{\sum_{i=1}^{n}x_i\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}-x\qquad(5)$$

For ∇f_{h,k} = 0 to hold, it is necessary and sufficient that m_{h,g}(x) = 0, from which the new cluster-center coordinates can be obtained:

$$x=\frac{\sum_{i=1}^{n}x_i\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}\qquad(6)$$

Because of the special nature of the projection map, considering only the distance between pixels cannot yield accurate clustering results. When computing the probability density, two conditions must hold: (a) the closer a pixel's color value is to the center pixel's color value, the higher the probability density; and (b) the closer a pixel is to the center point, the higher the probability density. The kernel function Kh(x) is therefore chosen as:

$$K_h(x)=K\!\left(\left\|\frac{x^{s}-x_i^{s}}{h}\right\|\right)\cdot K\!\left(\left\|\frac{x^{r}-x_i^{r}}{h}\right\|\right)\qquad(7)$$

After MeanShift clustering, each cluster represents one target. Projecting this result back into the original right image displays the final detection result.

Claims (1)

1. An occluded target detection method based on binocular stereo vision in a complex background, characterized by comprising the following steps:

Step 1: Binocular camera calibration.

First, Zhang Zhengyou's checkerboard calibration method is used: multiple checkerboard images are captured to calibrate the intrinsic parameters M1 of the two cameras, and the extrinsic parameters M2 are solved from images of a calibration board placed on the ground. The homogeneous transformation between image coordinates (u, v) and world coordinates (Xw, Yw, Zw) is:

$$z_c\begin{bmatrix}u\\v\\1\end{bmatrix}=\begin{bmatrix}\frac{1}{d_x}&0&u_0\\0&\frac{1}{d_y}&v_0\\0&0&1\end{bmatrix}\begin{bmatrix}f&0&0&0\\0&f&0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}R_C&T_C\\\vec{0}^{\,T}&1\end{bmatrix}\begin{bmatrix}X_w\\Y_w\\Z_w\\1\end{bmatrix}=\begin{bmatrix}f_u&0&u_0&0\\0&f_v&v_0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}R_C&T_C\\\vec{0}^{\,T}&1\end{bmatrix}\begin{bmatrix}X_w\\Y_w\\Z_w\\1\end{bmatrix}=M_1M_2\begin{bmatrix}X_w\\Y_w\\Z_w\\1\end{bmatrix}\qquad(1)$$

where f is the camera focal length, (u0, v0) are the coordinates of the principal point of the image, and dx and dy are the physical sizes of each pixel along the horizontal axis x and the vertical axis y, respectively;

stereo calibration then computes the geometric relationship between the two cameras P1 and P2 in space, i.e., the rotation matrix R and translation matrix T between them; the right camera is selected as the reference camera, with the relationship:

$$P_1 = R\,(P_2 - T)\qquad(2)$$

finally, row-aligned rectified images are obtained with Hartley's uncalibrated stereo rectification algorithm; the binocular camera must capture the two images synchronously;

Step 2: Stereo matching to obtain disparity.

Matching points between the left and right camera views are computed by binocular stereo matching to obtain a disparity map, and a Gaussian mixture modeling method performs background modeling on the disparity map, eliminating the interference of the complex background with target detection; from the disparity, the baseline, and the intrinsic parameters, triangulation yields the three-dimensional coordinates of the scene; a world coordinate system whose XOY plane is the ground is chosen, the 3D points are projected onto the ground, and the number of 3D points projected onto a given pixel is taken as that pixel's color value, producing a top-view projection map;

Step 3: Clustering of the top-view projection map.

The probability density at x is f_{h,k}(x):

$$f_{h,k}(x)=\sum_{i=1}^{n}K\!\left(\left\|\frac{x-x_i}{h}\right\|\right)\qquad(3)$$

where K(x) is the kernel function and h is the radius; to maximize f_{h,k}(x), differentiate it to obtain the gradient ∇f_{h,k}, where g(s) = -k'(s):

$$\nabla f_{h,k}=\sum_{i=1}^{n}(x-x_i)\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)=\left[\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)\right]\left[\frac{\sum_{i=1}^{n}x_i\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}-x\right]\qquad(4)$$

let:

$$m_{h,g}(x)=\frac{\sum_{i=1}^{n}x_i\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}-x\qquad(5)$$

for ∇f_{h,k} = 0 to hold, it is necessary and sufficient that m_{h,g}(x) = 0, which gives the new cluster-center coordinates:

$$x=\frac{\sum_{i=1}^{n}x_i\,g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n}g\!\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}\qquad(6)$$

because of the special nature of the projection map, considering only the distance between pixels cannot yield accurate clustering results; when computing the probability density, two conditions must hold: (a) the closer a pixel's color value is to the center pixel's color value, the higher the probability density; (b) the closer a pixel is to the center point, the higher the probability density; the kernel function Kh(x) is therefore chosen as:

$$K_h(x)=K\!\left(\left\|\frac{x^{s}-x_i^{s}}{h}\right\|\right)\cdot K\!\left(\left\|\frac{x^{r}-x_i^{r}}{h}\right\|\right)\qquad(7)$$

after MeanShift clustering, each cluster represents one target; projecting this result back into the original right image displays the final detection result.
CN201610530766.XA 2016-07-06 2016-07-06 Based on the shelter target detection method under binocular stereo vision complex background Pending CN106203429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610530766.XA CN106203429A (en) 2016-07-06 2016-07-06 Based on the shelter target detection method under binocular stereo vision complex background

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610530766.XA CN106203429A (en) 2016-07-06 2016-07-06 Based on the shelter target detection method under binocular stereo vision complex background

Publications (1)

Publication Number Publication Date
CN106203429A true CN106203429A (en) 2016-12-07

Family

ID=57473634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610530766.XA Pending CN106203429A (en) 2016-07-06 2016-07-06 Based on the shelter target detection method under binocular stereo vision complex background

Country Status (1)

Country Link
CN (1) CN106203429A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129688A (en) * 2011-02-24 2011-07-20 哈尔滨工业大学 Moving target detection method aiming at complex background
CN103106659A (en) * 2013-01-28 2013-05-15 中国科学院上海微系统与信息技术研究所 Open area target detection and tracking method based on binocular vision sparse point matching
CN105160649A (en) * 2015-06-30 2015-12-16 上海交通大学 Multi-target tracking method and system based on kernel function unsupervised clustering
CN105528785A (en) * 2015-12-03 2016-04-27 河北工业大学 Binocular visual image stereo matching method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
吴静静 et al., "A multi-channel image segmentation algorithm based on mean shift" (一种基于mean shift的多通道图像分割算法), 《包装工程》 (Packaging Engineering) *
杨明, "Research on 3D reconstruction based on binocular stereo vision" (基于双目立体视觉的三维重建研究), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *
苑全德 et al., "A natural landmark extraction and fast matching method based on 3D information of feature points" (一种基于特征点三维信息的自然路标提取与快速匹配方法), 《智能计算机与应用》 (Intelligent Computer and Applications) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657643A (en) * 2017-08-28 2018-02-02 浙江工业大学 A kind of parallax calculation method based on space plane constraint
CN107657643B (en) * 2017-08-28 2019-10-25 浙江工业大学 A Disparity Calculation Method Based on Spatial Plane Constraints
TWI658431B (en) * 2017-10-02 2019-05-01 緯創資通股份有限公司 Image processing method, image processing device, and computer readable recording medium
CN108038866A (en) * 2017-12-22 2018-05-15 湖南源信光电科技股份有限公司 A kind of moving target detecting method based on Vibe and disparity map Background difference
CN108346160A (en) * 2017-12-22 2018-07-31 湖南源信光电科技股份有限公司 The multiple mobile object tracking combined based on disparity map Background difference and Meanshift
CN110505437A (en) * 2018-05-18 2019-11-26 杭州海康威视数字技术股份有限公司 A kind of method, apparatus and system of object prompt
CN111598939A (en) * 2020-05-22 2020-08-28 中原工学院 A method of measuring human body circumference based on multi-eye vision system
CN111598939B (en) * 2020-05-22 2021-01-26 中原工学院 Human body circumference measuring method based on multi-vision system
CN113077510A (en) * 2021-04-12 2021-07-06 广州市诺以德医疗科技发展有限公司 System for inspecting stereoscopic vision function under shielding
CN113077510B (en) * 2021-04-12 2022-09-20 广州市诺以德医疗科技发展有限公司 System for inspecting stereoscopic vision function under shielding
CN113139995A (en) * 2021-04-19 2021-07-20 杭州伯资企业管理合伙企业(有限合伙) Low-cost method for detecting and evaluating light occlusion between objects
CN113139995B (en) * 2021-04-19 2022-06-21 杭州伯资企业管理合伙企业(有限合伙) Low-cost method for detecting and evaluating light occlusion between objects
CN119762685A (en) * 2025-03-06 2025-04-04 武汉海昌信息技术有限公司 Binocular vision-based three-dimensional modeling method


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161207