CN107813310A

CN107813310A - One kind is based on the more gesture robot control methods of binocular vision

Info

Publication number: CN107813310A
Application number: CN201711176221.4A
Authority: CN
Inventors: 卫作龙; 夏晗; 林伟阳; 于兴虎; 佟明斯; 李湛
Original assignee: Zhejiang Youmai De Intelligent Equipment Co Ltd
Current assignee: Yu Xinghu
Priority date: 2017-11-22
Filing date: 2017-11-22
Publication date: 2018-03-20
Anticipated expiration: 2037-11-22
Also published as: CN107813310B

Abstract

The present invention relates to the field of robot control methods, in particular to a multi-gesture robot control method based on binocular vision. The teaching method has the disadvantages of large amount of calculation and high precision requirements for the accuracy of the robot model and the determination of the coordinate system. A multi-gesture robot control method based on binocular vision is proposed, including: setting binocular cameras; Rectangular box; the classifier is trained using the training sample set. The classifier detects the target; then tracks the target, and fuses the tracking result with the detection result; calculates the offset distance of the target center point moving from the initial point to the target point, and outputs a speed control command to make the robot perform translational motion; Extract the feature points in the target frame, and solve the rotation matrix corresponding to the feature points. The invention is suitable for the control method of the painting robot.

Description

A multi-gesture robot control method based on binocular vision

技术领域technical field

本发明涉及机器人控制方法，具体涉及一种基于双目视觉多手势机器人控制方法。The invention relates to a robot control method, in particular to a binocular vision-based multi-gesture robot control method.

背景技术Background technique

工业机器人的应用主要通过操作人员使用示教器，手动控制机器人的关节运动，以使机器人运动到预定的位置，同时将该位置进行记录，并传递到机器人控制器中，之后的机器人可根据指令自动重复该任务，但是目前的示教方法存在过程繁琐、效率低的问题The application of industrial robots mainly uses the operator to use the teaching pendant to manually control the joint movement of the robot, so that the robot moves to a predetermined position, and at the same time record the position and transmit it to the robot controller. After that, the robot can follow the instructions Automatically repeat the task, but the current teaching method has the problems of cumbersome process and low efficiency

目前在工业机器人示教方法上，主要有两种方法——人工示教方法和离线示教方法。人工示教是指由人工导引机器人末端执行器,或由人工操作导引机械模拟装置，或用示教盒来使机器人完成预期的动作，由于此类机器人的编程通过实时在线示教程序来实现,而机器人本身凭记忆操作,故能不断重复再现。离线示教法指先对模型进行采集，在计算机上模拟仿真编程，进行轨迹规划自动生成运动轨迹。At present, there are mainly two methods in the teaching method of industrial robots-manual teaching method and offline teaching method. Manual teaching refers to manually guiding the end effector of the robot, or guiding the mechanical simulation device by manual operation, or using the teaching box to make the robot complete the expected action, because the programming of this type of robot is realized through real-time online teaching program. Realization, and the robot itself operates by memory, so it can be reproduced continuously. The off-line teaching method refers to collecting the model first, simulating and programming on the computer, and performing trajectory planning to automatically generate motion trajectories.

现在工业机器人领域多应用示教盒，这种控制方式效率低，不直观。有基于视觉的机器人控制方法需要操作人员穿戴特定颜色的手套，只有一种控制模式，在需要进行微调时，位置和姿态指令会互相干扰，且控制空间和机器人工作空间不容易统一，导致操作不方便^[1]。对于裸手的识别主要基于颜色空间分割，此种方法受光照和背景颜色影响很大。而离线示教方法的计算量大，算法复杂，非规则边缘不便计算，而且对机器人模型的精度,以及机器人工具坐标系的确定都有很高的精度要求。Nowadays, teaching boxes are mostly used in the field of industrial robots. This control method is inefficient and not intuitive. There is a vision-based robot control method that requires the operator to wear gloves of a specific color. There is only one control mode. When fine-tuning is required, the position and attitude instructions will interfere with each other, and the control space and the robot workspace are not easy to unify, resulting in inconsistent operation. Convenience ^[1] . The recognition of bare hands is mainly based on color space segmentation, which is greatly affected by illumination and background color. However, the offline teaching method has a large amount of calculation, complex algorithms, and inconvenient calculation of irregular edges, and has high precision requirements for the accuracy of the robot model and the determination of the robot tool coordinate system.

发明内容Contents of the invention

本发明为了解决现有的基于视觉的机器人控制方法操作不便、对手的识别受光照和背景颜色影响很大，且离线示教方法计算量大、对机器人模型精度及机器人坐标系的确定有很高精度要求的缺陷，而提出一种基于双目视觉多手势机器人控制方法，包括：The present invention solves the inconvenient operation of the existing vision-based robot control method, the recognition of the opponent is greatly affected by the illumination and the background color, and the off-line teaching method has a large amount of calculation, which has a high impact on the accuracy of the robot model and the determination of the robot coordinate system. Due to the shortcomings of precision requirements, a control method for multi-gesture robots based on binocular vision is proposed, including:

步骤一、设置双目相机，并进行标定和矫正。Step 1. Set up the binocular camera, and perform calibration and correction.

步骤二、操作人员在双目相机的视野范围内进行手势演示，并在双目相机的左摄像机拍摄的视频中人工选取包含手势的矩形框，加入到训练样本集。Step 2: The operator performs a gesture demonstration within the field of view of the binocular camera, and manually selects a rectangular frame containing the gesture from the video captured by the left camera of the binocular camera, and adds it to the training sample set.

步骤三、使用训练样本集对最近邻分类器以及贝叶斯分类器进行训练。Step 3: Use the training sample set to train the nearest neighbor classifier and the Bayesian classifier.

步骤四、操作人员按照步骤二中的手势出现在双目相机的视野中；处理器根据左摄像机的图像利用级联方差分类器，基于随机森林的贝叶斯分类器，最近邻分类器检测得到目标；再对目标进行跟踪，将跟踪结果和检测结果进行融合，同时更新手势模板中的样本，在左摄像机图像中跟踪成功后，则在右相机图像的极线上进行检测和跟踪，如果左右视图同时跟踪成功则输出目标矩形框。Step 4, the operator appears in the field of view of the binocular camera according to the gesture in step 2; the processor uses the cascade variance classifier based on the image of the left camera, the Bayesian classifier based on random forest, and the nearest neighbor classifier to detect target; then track the target, fuse the tracking result with the detection result, and update the samples in the gesture template at the same time, after successful tracking in the left camera image, detect and track on the epipolar line of the right camera image, if the If the view is simultaneously tracked successfully, the target rectangle will be output.

步骤五、对目标矩形框的中心点进行跟踪；计算中心点从初始点移动到目标点的偏移距离，并输出速度控制指令，使机器人进行平移运动。Step 5. Track the center point of the target rectangular frame; calculate the offset distance of the center point from the initial point to the target point, and output a speed control command to make the robot perform translational motion.

步骤六、在目标矩形框中提取用于描述手势轮廓的特征点，求解特征点对应的旋转矩阵。Step 6: Extract the feature points used to describe the outline of the gesture in the target rectangular frame, and solve the rotation matrix corresponding to the feature points.

优选地，步骤一具体包括：Preferably, step one specifically includes:

步骤一一、设置双目相机中左摄像机和右摄像机的间距为20cm，且为水平放置。Step 11. Set the distance between the left camera and the right camera in the binocular camera to be 20cm, and place them horizontally.

步骤一二、对双目摄像机使用张正友标定法进行标定，并对左摄像机和右摄像机的视图消除畸变以及行对准，使得左右摄像机视图的成像原点坐标一致、光轴平行、成像平面共面、极线行对齐。Step 12: Calibrate the binocular camera using the Zhang Zhengyou calibration method, and eliminate distortion and line alignment for the views of the left and right cameras, so that the imaging origin coordinates of the left and right camera views are consistent, the optical axes are parallel, and the imaging planes are coplanar. Pole line alignment.

优选地，步骤二具体包括：Preferably, step two specifically includes:

步骤二一、操作人员在双目相机的视野范围内进行手势演示。Step 21, the operator performs a gesture demonstration within the field of view of the binocular camera.

步骤二二、在双目相机的左摄像机拍摄的视频中人工选取包含手势的矩形框。Step 22: Manually select a rectangular frame containing gestures from the video captured by the left camera of the binocular camera.

步骤二三、对选取的矩形框中的图像块进行缩放、旋转、仿射，并将缩放、旋转、仿射后的图像归一化为相同大小的图像块构成正样本集；并选取预定数量的在原图像中距离已选取的图像块的距离大于预定阈值的图像块构成负样本集；正样本集以及负样本集共同构成训练样本集。Step 23: Scale, rotate, and affine the image blocks in the selected rectangular frame, and normalize the scaled, rotated, and affine images into image blocks of the same size to form a positive sample set; and select a predetermined number The image blocks whose distance from the selected image block in the original image is greater than a predetermined threshold constitute a negative sample set; the positive sample set and the negative sample set together constitute a training sample set.

优选地，步骤三具体包括：通过如下公式计算贝叶斯分类器的前景类后验概率：Preferably, step three specifically includes: calculating the foreground class posterior probability of the Bayesian classifier by the following formula:

其中y₁表示前景，当y₁＝0时表示图像中没有目标，y₁＝1时表示图像中包含目标；x_i表示图像的第i个特征；图像的每个特征为图像中任意选取的两个点的灰度值大小关系，灰度值大小关系用0或1表示。Among them, y ₁ represents the foreground, when y ₁ =0, it means that there is no target in the image, and when y ₁ =1, it means that the image contains the target; x _i represents the i-th feature of the image; each feature of the image is arbitrarily selected The relationship between the gray value of two points, the relationship between the gray value is represented by 0 or 1.

优选地，步骤四具体包括：步骤三中，Preferably, step four specifically includes: in step three,

贝叶斯分类器的个数为10；特征x_i对应的前景类样本的个数为#p，背景类样本的个数为#n，总样本个数为#m，则有：The number of Bayesian classifiers is 10; the number of foreground samples corresponding to feature x _i is #p, the number of background samples is #n, and the total number of samples is #m, then:

对每个贝叶斯分类器求取p(y₁|x_i)，并将结果取平均，若平均值大于预设的阈值，则认为该图像中存在目标。Calculate p(y ₁ | _xi ) for each Bayesian classifier, and average the results. If the average value is greater than the preset threshold, it is considered that there is a target in the image.

优选地，步骤三中，最近邻分类器用于计算两个图像块的相似度，计算公式为：Preferably, in step 3, the nearest neighbor classifier is used to calculate the similarity between two image blocks, and the calculation formula is:

式中μ₁,μ₂,σ₁,σ₂表示图像P1和P2的平均值和标准差；两个图像越相似则结果越接近1；定义两个图像的距离为：则当两个图像的距离小于预定的阈值时则认为图像片包含目标。In the formula, μ ₁ , μ ₂ , σ ₁ , σ ₂ represent the average value and standard deviation of images P1 and P2; the more similar the two images are, the closer the result is to 1; the distance between two images is defined as: Then when the distance between the two images is smaller than a predetermined threshold, the image slice is considered to contain the target.

优选地，步骤四具体为：Preferably, step four is specifically:

步骤四一、操作人员按照步骤二中的手势出现在双目相机的视野中。Step 41. The operator appears in the field of view of the binocular camera according to the gesture in step 2.

步骤四二、由操作人员手动选出初始矩形框。Step 42: The operator manually selects the initial rectangular frame.

步骤四二、处理器生成滑动矩形框，利用级联方差分类器过滤掉不符合方差阈值条件的矩形框，然后通过贝叶斯分类器筛选得到可能包含前景的图像块，再通过最近邻分类器计算滑动矩形框与手动选出的初始矩形框的相似度。Step 42: The processor generates a sliding rectangular frame, uses the cascade variance classifier to filter out the rectangular frames that do not meet the variance threshold condition, and then filters through the Bayesian classifier to obtain image blocks that may contain foreground, and then passes through the nearest neighbor classifier Computes the similarity between the sliding rectangle and the manually selected initial rectangle.

步骤四三、选出重叠度最高的矩形框作为样本矩形框，在样本矩形框内计算Shi-Tomasi角点作为特征点。Step 43: Select the rectangular frame with the highest overlapping degree as the sample rectangular frame, and calculate the Shi-Tomasi corner points in the sample rectangular frame as feature points.

步骤四四、计算样本矩形框内计算前向预测误差、后向预测误差和相似度，并筛选出小于前向预测误差与后向预测误差平均值且大于预设相似度阈值的特征点。Step 44: Calculate the forward prediction error, backward prediction error and similarity in the sample rectangle, and filter out the feature points that are smaller than the average value of the forward prediction error and backward prediction error and greater than the preset similarity threshold.

步骤四五、计算当前帧中筛选出的特征点与上一帧相应特征点的平均位移，得到当前帧目标框的位置，并根据特征点在上一帧与当前帧中的欧氏距离的比值，得到当前帧中目标框的大小。Step 4 and 5: Calculate the average displacement between the selected feature points in the current frame and the corresponding feature points in the previous frame to obtain the position of the target frame in the current frame, and according to the ratio of the Euclidean distance of the feature points in the previous frame to the current frame , get the size of the target box in the current frame.

步骤四六、将步骤四五中得到的目标框进行归一化处理，并计算归一化后的目标框与正样本集中所有图像的相似度，如果存在一个相似度大于指定的阈值，则跟踪有效，并将得到的目标框加入样本集，否则认为跟踪无效并丢弃。Step 46: Normalize the target frame obtained in step 45, and calculate the similarity between the normalized target frame and all images in the positive sample set. If there is a similarity greater than the specified threshold, track is valid, and the obtained target frame is added to the sample set, otherwise the tracking is considered invalid and discarded.

优选地，步骤五具体包括：Preferably, step five specifically includes:

步骤五一、将步骤四六得到的目标框的中心点作为手势中心点，利用立体视觉的原理视差测距法计算手势中心的空间坐标值，具体为：Step 51. Use the center point of the target frame obtained in step 46 as the center point of the gesture, and use the principle parallax distance measurement method of stereo vision to calculate the spatial coordinate value of the gesture center, specifically:

式中X，Y，Z为手势中心点在空间中的位置，u₁为标记球在左相机图像坐标系中的x坐标，u₀为左相机图像坐标系的x原点，u₂为标记球在右相机图像坐标系中的x坐标，d为两相机之间的平移距离，v₁为标记球在左相机图像坐标系中的y坐标，v₀为左相机图像坐标系的y原点，f为相机焦距。In the formula, X, Y, Z are the positions of the gesture center point in space, u ₁ is the x coordinate of the marker ball in the left camera image coordinate system, u ₀ is the x origin of the left camera image coordinate system, and u ₂ is the marker ball The x coordinate in the right camera image coordinate system, d is the translation distance between the two cameras, v ₁ is the y coordinate of the marker ball in the left camera image coordinate system, v ₀ is the y origin of the left camera image coordinate system, f is the focal length of the camera.

步骤五二、操作人员通过标定按钮在任意位置设置原点；处理器检测到手势中心点离开以预设的控制阈值为半径的球体时，则输出速度控制指令，计算公式为：Step 52. The operator sets the origin at any position through the calibration button; when the processor detects that the center point of the gesture leaves the sphere with the preset control threshold as the radius, it outputs a speed control command. The calculation formula is:

V＝kdV=kd

其中V为输出的速度控制指令，k为控制系数，d为手势中心偏离初始位置的距离；速度控制指令用于控制机器人末端进行平移运动。Among them, V is the output speed control command, k is the control coefficient, and d is the distance from the initial position of the gesture center; the speed control command is used to control the translational movement of the end of the robot.

优选地，步骤六具体包括：Preferably, step six specifically includes:

步骤六一、在步骤四六中得到目标框中通过基于肤色检测和背景差分法相结合的方法得到手势的轮廓，再通过凸包检测和凸包缺陷检测算法得到食指、中指和无名指以及食指中指凹陷处、中指无名指凹陷处共5个特征点；并通过步骤五一中的公式得到这5个特征点的空间坐标。Step 61: Obtain the outline of the gesture in the target frame by combining the method based on skin color detection and background difference method in step 46, and then obtain the index finger, middle finger and ring finger and the depression of the index finger and middle finger through the convex hull detection and convex hull defect detection algorithm There are altogether 5 feature points at the depression of the middle finger and ring finger; and the spatial coordinates of these 5 feature points are obtained through the formula in step 51.

步骤六二、在手掌上定义坐标系，以中指的根部为原点，定义指向中指指尖方向为y轴正方向，定义平行于两个凹陷处连线为x轴，指向小拇指的方向为x轴正方向。Step 62. Define the coordinate system on the palm, take the root of the middle finger as the origin, define the direction pointing to the fingertip of the middle finger as the positive direction of the y-axis, define the line parallel to the two depressions as the x-axis, and define the direction pointing to the little finger as the x-axis Positive direction.

步骤六三、根据Carley定理利用5个特征点求解旋转矩阵。Step 63: Solve the rotation matrix by using 5 feature points according to Carley's theorem.

步骤六四、将旋转矩阵转换为pitch-yaw-roll欧拉角，获取手势从当前姿态转换为原始状态过程中的相对旋转角度，根据相对旋转角度输出欧拉角角速度指令控制机器人的姿态变化。Step 64: Convert the rotation matrix into pitch-yaw-roll Euler angles, obtain the relative rotation angle during the conversion of the gesture from the current posture to the original state, and output the Euler angular velocity command to control the posture change of the robot according to the relative rotation angle.

优选地，步骤六三具体包括：Preferably, step 63 specifically includes:

步骤六三一、建立任意不含特征值-1的旋转矩阵R和一个反对称矩阵S_b之间有如下关系：Step 631, establishing any rotation matrix R that does not contain eigenvalue-1 and an antisymmetric matrix S _b has the following relationship:

R＝(I-S_b)^-1(I+S_b)R＝(IS _b ) ^-1 (I+S _b )

S_b＝(R+I)^-1(R-I)S _b ＝(R+I) ^-1 (RI)

式中I为单位矩阵，b＝(b₁,b₂,b₃)^T为Carley向量；其中b₁,b₂,b₃分别为Carley向量中的第一、第二、第三分量；且In the formula, I is the identity matrix, b=(b ₁ , b ₂ , b ₃ ) ^T is the Carley vector; where b ₁ , b ₂ , b ₃ are the first, second, and third components in the Carley vector; and

步骤六三二、设p_i为i号特征点在的空间坐标值，q_i为i号特征点在手掌坐标系中的坐标值。则求解旋转矩阵方程为：Step 632: Let p _i be the spatial coordinate value of the feature point i, and q _i be the coordinate value of the feature point i in the palm coordinate system. Then solve the rotation matrix equation as:

对上式做恒等变换，得到：Do identity transformation on the above formula to get:

式中：In the formula:

v_i＝p_i-q_i v _i =p _i -q _i

u_i＝p_i+q_i u _i =p _i +q _i

S_ui为u_i对应的反对称阵；S _ui is the antisymmetric matrix corresponding to u _i ;

可得方程：Available equations:

Ab＝cAb=c

式中：In the formula:

步骤六三三、解方程Ab＝c得到Carley向量，再计算出旋转矩阵R。Step 633: Solve the equation Ab=c to obtain the Carley vector, and then calculate the rotation matrix R.

本发明的有益效果为：1、位置和姿态指令不会相互干扰，控制空间和机器人的工作空间容易统一，使得操作很简便；2、对手的识别使用了特征点和旋转矩阵，受光照影响很小；3、离线示教的部分计算量小，算法不复杂；4、对机器人模型精度及机器人坐标系的确定的要求不高。The beneficial effects of the present invention are as follows: 1. The position and attitude instructions will not interfere with each other, and the control space and the working space of the robot are easily unified, which makes the operation very simple; 2. The identification of the opponent uses feature points and rotation matrices, which are greatly affected by the light. Small; 3. The calculation amount of the offline teaching part is small, and the algorithm is not complicated; 4. The requirements for the accuracy of the robot model and the determination of the robot coordinate system are not high.

附图说明Description of drawings

图1为本发明的手势机器人控制装置示意图；Fig. 1 is a schematic diagram of a gesture robot control device of the present invention;

图2为控制手势示意图，其中图2(a)为姿态控制模式的手势示意图；图2(b)为位置控制模式的手势示意图；Fig. 2 is a schematic diagram of control gestures, wherein Fig. 2(a) is a schematic diagram of gestures in attitude control mode; Fig. 2(b) is a schematic diagram of gestures in position control mode;

图3为本发明的基于双目视觉多手势机器人控制方法的流程图。FIG. 3 is a flow chart of the control method for a multi-gesture robot based on binocular vision of the present invention.

具体实施方式Detailed ways

本发明的基于双目视觉多手势机器人控制方法，是基于如图1所示的装置实现的，其中101为上位机，其中包括用于计算和控制机器人的处理器；102为喷漆机器人；103为双目摄像机的左摄像头，也可称为左摄像机，104为双目摄像机摄像机右摄像头，也可称为右摄像机。105为操作人员的手部。由一组双目相机平行放置作为手势的检测部分，通过计算机处理得到控制信号发送给机器人。操作人员只需保证手出现在两个相机视野中。The multi-gesture robot control method based on binocular vision of the present invention is realized based on the device shown in Figure 1, wherein 101 is a host computer, which includes a processor for computing and controlling the robot; 102 is a painting robot; 103 is The left camera of the binocular camera may also be called the left camera, and 104 is the right camera of the binocular camera, which may also be called the right camera. 105 is the operator's hand. A group of binocular cameras are placed in parallel as the gesture detection part, and the control signals are obtained through computer processing and sent to the robot. The operator only needs to keep the hand in the field of view of both cameras.

为了防止位置控制和姿态控制耦合，可以提前记录并学习两种手势，一种是位置的控制，一种是姿态的控制。本发明定义当手掌张开时进行姿态控制，如图2(a)所示；当拇指、食指和中指捏在一起时进行位置控制，如图2(b)所示。In order to prevent the coupling of position control and attitude control, two gestures can be recorded and learned in advance, one is position control and the other is attitude control. The present invention defines gesture control when the palm is open, as shown in Figure 2(a); position control when the thumb, index finger and middle finger are pinched together, as shown in Figure 2(b).

图3为基于双目视觉多手势机器人控制方法，具体包括：Figure 3 is a multi-gesture robot control method based on binocular vision, specifically including:

步骤一具体可以为：设置双目相机中左摄像机和右摄像机的间距为20cm，尽量摆放平行。利用张正友标定法得到相机的内参和外参，然后进行立体校正，分别对左右视图进行消除畸变和行对准，使得左右视图的成像原点坐标一致、两摄像头光轴平行、左右成像平面共面、对极线行对齐。Step 1 can specifically be: set the distance between the left camera and the right camera in the binocular camera to be 20cm, and place them as parallel as possible. Use the Zhang Zhengyou calibration method to obtain the internal and external parameters of the camera, and then perform stereo correction to eliminate distortion and line alignment for the left and right views, so that the imaging origin coordinates of the left and right views are consistent, the optical axes of the two cameras are parallel, and the left and right imaging planes are coplanar. Align the epipolar lines.

步骤二和步骤三是建立样本集以及训练过程，具体可以为：训练过程对贝叶斯分类器和最近邻分类器进行训练，将初始帧中鼠标选择的图像块进行缩放、旋转、仿射，最终归一化为相同大小的图形块作为正样本集，选取远离上述图像块的若干图像块作为负样本集，用此样本集训练最近邻分类器。根据此样本集，提取出2bitBP特征的正负样本集训练贝叶斯分类器，得到贝叶斯计算后验概率的公式。在下一步的跟踪模式中在线更新样本集，并迭代训练上述两个分类器。Step 2 and Step 3 are to establish the sample set and the training process, which can be specifically: the training process trains the Bayesian classifier and the nearest neighbor classifier, zooms, rotates, and affines the image block selected by the mouse in the initial frame, Finally, the image blocks normalized to the same size are used as positive sample sets, and several image blocks far away from the above image blocks are selected as negative sample sets, and the nearest neighbor classifier is trained with this sample set. According to this sample set, the positive and negative sample sets of 2bitBP features are extracted to train the Bayesian classifier, and the Bayesian formula for calculating the posterior probability is obtained. In the next step of the tracking mode, the sample set is updated online, and the above two classifiers are iteratively trained.

本装置使用级联分类器的方式进行手势的检测，包括方差分类器，基于随机森林的贝叶斯分类器和最近邻分类器。方差分类器指的是求出待检测滑动矩形框图像片的方差，因为跟踪目标区域的方差一般比背景区域方差大，因此通过方差过滤器大概能过滤掉绝大多数的扫描矩形框。随机森林中包含10个贝叶斯分类器。贝叶斯分类器选取的特征为2bitBP特征，2bitBP特征就是任意两个点的灰度值大小关系，取值只有0和1。将图像所属的类用y_i(i＝1,2)表示，课题中的检测问题可以看作一个分类问题，且只有两类，即前景类和背景类，可以令y₁＝0表示图像中没有目标，y₁＝1表示图像中包含目标。用x_i(i＝1,2,3,...,2¹³)表示图像的特征集，也就是上述的2bitBP特征。那么贝叶斯分类器得到前景类的后验概率为：The device uses cascaded classifiers to detect gestures, including variance classifiers, Bayesian classifiers based on random forests and nearest neighbor classifiers. The variance classifier refers to finding the variance of the image of the sliding rectangular frame to be detected, because the variance of the tracking target area is generally larger than the variance of the background area, so most of the scanning rectangular frames can be filtered out through the variance filter. The random forest contains 10 Bayesian classifiers. The feature selected by the Bayesian classifier is the 2bitBP feature. The 2bitBP feature is the relationship between the gray value of any two points, and the values are only 0 and 1. The class to which the image belongs is represented by y _i (i=1,2). The detection problem in the subject can be regarded as a classification problem, and there are only two classes, namely the foreground class and the background class. Let y ₁ =0 represent the There is no target, and y ₁ =1 indicates that the image contains a target. Use x _i (i=1,2,3,...,2 ¹³ ) to represent the feature set of the image, that is, the above-mentioned 2bitBP features. Then the Bayesian classifier obtains the posterior probability of the foreground class as:

在样本集中令x_i对应的前景类样本个数为#p，对应的背景类样本个数为#n，总样本个数为#m。则In the sample set, let the number of foreground samples corresponding to x _i be #p, the number of corresponding background samples be #n, and the total number of samples be #m. but

整个随机森林就有10个后验概率了，将10个后验概率进行平均，如果大于阈值，就认为该图像片含有前景目标。There are 10 posterior probabilities in the entire random forest, and the 10 posterior probabilities are averaged. If it is greater than the threshold, the image is considered to contain a foreground target.

最近邻分类器有两个功能，一是使用NCC算法依次匹配每一个图像块与在线模型的相似度，二是更新在线模型的正样本空间。为了比较图像块P₁和P₂，用NCC算法表征图像块的相似度。The nearest neighbor classifier has two functions, one is to use the NCC algorithm to sequentially match the similarity between each image block and the online model, and the other is to update the positive sample space of the online model. In order to compare the image blocks P ₁ and P ₂ , the NCC algorithm is used to characterize the similarity of the image blocks.

式中μ₁,μ₂,σ₁,σ₂表示图像P1和P2的平均值和标准差。两个图像越接近则结果越接近1。定义两个图像的距离为：In the formula, μ ₁ , μ ₂ , σ ₁ , σ ₂ represent the average value and standard deviation of images P1 and P2. The closer the two images are, the closer the result is to 1. Define the distance between two images as:

若距离小于阈值则认为图像片包含目标。If the distance is less than the threshold, the image slice is considered to contain the target.

在左摄像机视频中用鼠标选取矩形框选中手势，本装置将训练得到方差分类器的阈值、贝叶斯分类器的参数并保存最近邻分类器的模板，操作人员此时必须保持手势不变，可以适当地移动和多角度旋转手部，模拟控制过程中可能出现的旋转角度，学习完成后点击保存，本装置可以将多尺度多变换的模板保存。Use the mouse to select the gesture in the left camera video, and the device will train the threshold of the variance classifier and the parameters of the Bayesian classifier and save the template of the nearest neighbor classifier. The operator must keep the gesture unchanged at this time. It can properly move and rotate the hand at multiple angles to simulate possible rotation angles during the control process. After the learning is completed, click Save, and the device can save the multi-scale and multi-transformation template.

步骤四是跟踪步骤，其目的是在摄像机拍摄到的画面中，先识别出手部所在的矩形框，再根据手部的运动轨迹，具体为：操作人员按照图2中的手势出现在摄像机的视野中，本装置则自动进行检测，利用训练出的级联方差分类器，基于随机森林的贝叶斯分类器，最近邻分类器检测得到目标(即识别出手部所在矩形框的过程)。当检测得到目标后进入跟踪和检测的循坏(即跟踪手部运动轨迹的过程)：Step 4 is the tracking step, the purpose of which is to first identify the rectangular frame where the hand is located in the picture captured by the camera, and then according to the movement trajectory of the hand, specifically: the operator appears in the camera's field of view according to the gesture in Figure 2 Among them, the device automatically detects, using the trained cascade variance classifier, Bayesian classifier based on random forest, and the nearest neighbor classifier to detect the target (that is, the process of identifying the rectangular frame where the hand is located). When the target is detected, it enters the cycle of tracking and detection (that is, the process of tracking the trajectory of the hand):

使用金字塔LK光流法对目标框中的Shi-Tomasi角点进行跟踪，为了优化跟踪效果，使用前向后向误差和NCC算法将一部分跟踪不好的点去除。跟踪流程如下：Use the pyramid LK optical flow method to track the Shi-Tomasi corner points in the target frame. In order to optimize the tracking effect, use the forward and backward error and NCC algorithm to remove some of the poorly tracked points. The tracking process is as follows:

1.根据初始选择的矩形框，在另一个摄像头中使用NCC算法进行模板匹配得到另一个摄像头中的初始矩形框。1. According to the initially selected rectangular frame, use the NCC algorithm to perform template matching in another camera to obtain the initial rectangular frame in another camera.

2.进入跟踪循环，在生成的滑动矩形框中，找出目标跟踪框重叠度最高的那个矩形框作为最佳跟踪样本，然后在此矩形框中计算Shi-Tomasi角点作为特征点。2. Enter the tracking loop. In the generated sliding rectangles, find the rectangle with the highest overlapping degree of the target tracking frame as the best tracking sample, and then calculate the Shi-Tomasi corner points in this rectangle as the feature points.

3.前向和后向误差和匹配相似度，筛选出满足条件的点(这些点满足小于给定平均前后向误差阔值的且大于指定匹配相似度阔值的特征点)，结束后会过滤掉一半左右的特征点。3. Forward and backward errors and matching similarity, filter out the points that meet the conditions (these points meet the feature points that are less than the given average forward and backward error threshold and greater than the specified matching similarity threshold), and will be filtered after the end Drop about half of the feature points.

4.用剩余的特征点来预测目标框在当前帧中的大小和位置，根据跟踪成功的特征点与上一帧对应的特征点的平均平移，得到当前帧目标框的位置，同时根据这些特征点在前后两帧图像中对应欧式距离的比值，得到当前帧中目标框的大小。如果位置超出图像则无效。4. Use the remaining feature points to predict the size and position of the target frame in the current frame. According to the average translation of the tracked feature points and the feature points corresponding to the previous frame, the position of the target frame in the current frame is obtained. At the same time, according to these features The ratio of the corresponding Euclidean distance between the points in the two frames of images before and after to get the size of the target frame in the current frame. Has no effect if the position exceeds the image.

5.计算归一化的图像片到在线模型的相似度，如果相似度大于指定的阀值，则最终认为此次跟踪时有效的，存储到正样本集。否则就认为是无效的依然丢弃。5. Calculate the similarity between the normalized image slice and the online model. If the similarity is greater than the specified threshold, the tracking is finally considered valid and stored in the positive sample set. Otherwise, it is considered invalid and still discarded.

6.返回步骤2。6. Return to step 2.

左右摄像头中根据初始帧中目标的位置并行进行跟踪计算，使用分类器进行检测的循坏和上述跟踪循坏是并行计算的，在左摄像头得到图像中对整幅图像进行搜索检测，将跟踪结果和检测结果进行融合得到最终结果，同时更新手势模板中的样本，在左摄像机图像中跟踪检测成功后，则在右相机图像的对应的极线上进行检测，并与右摄像机的跟踪结果融合，得到最终结果，这样可以简化检测算法的计算量。如果左右视图同时跟踪成功则输出目标矩形框。The left and right cameras perform tracking calculations in parallel according to the position of the target in the initial frame. The detection cycle using the classifier and the above tracking cycle are calculated in parallel. The entire image is searched and detected in the image obtained by the left camera, and the tracking results are Fusion with the detection results to get the final result, and update the samples in the gesture template at the same time. After the tracking and detection in the left camera image is successful, it will be detected on the corresponding epipolar line of the right camera image and fused with the tracking result of the right camera. The final result can be obtained, which can simplify the calculation amount of the detection algorithm. If the left and right views are tracked successfully at the same time, the target rectangle will be output.

步骤五是确定步骤四识别出的手势的终点位置空间坐标与初始位置空间坐标，然后确定应当让机器人如何运动，具体为：Step five is to determine the space coordinates of the end position and the initial position of the gesture recognized in step four, and then determine how the robot should move, specifically:

操作人员按照图2(b)右侧中的手势出现在摄像机的视野中，指向本装置按照步骤四的方法进行检测跟踪。一直跟踪的矩形框的中心即可视为控制点。The operator appears in the field of view of the camera according to the gesture on the right side of Figure 2(b), points to the device and performs detection and tracking according to the method of step 4. The center of the rectangular box that is being tracked can be regarded as the control point.

利用立体视觉的原理视差测距法计算手势中心的空间坐标值。公式如下：The spatial coordinate value of the gesture center is calculated by using the parallax ranging method based on the principle of stereo vision. The formula is as follows:

式中X，Y，Z为手势中心在空间中的位置，u₁为标记球在左相机图像坐标系中的x坐标，u₀为左相机图像坐标系的x原点，u₂为标记球在右相机图像坐标系中的x坐标，d为两相机之间的平移距离，v₁为标记球在左相机图像坐标系中的y坐标，v₀为左相机图像坐标系的y原点，f为相机焦距。In the formula, X, Y, Z are the positions of the gesture center in space, u ₁ is the x coordinate of the marker ball in the left camera image coordinate system, u ₀ is the x origin of the left camera image coordinate system, u ₂ is the marker ball in The x coordinate in the right camera image coordinate system, d is the translation distance between the two cameras, v ₁ is the y coordinate of the marker ball in the left camera image coordinate system, v ₀ is the y origin of the left camera image coordinate system, f is Camera focal length.

鼠标点击标定按钮后则以此点作为原点，当控制点离开以控制阈值为半径的球体时，则输出速度指令，大小与偏移距离成正比，计算公式如下After the mouse clicks the calibration button, this point is used as the origin. When the control point leaves the sphere with the control threshold as the radius, the speed command is output, and the size is proportional to the offset distance. The calculation formula is as follows

V＝kdV=kd

其中V为输出的控制指令，k为控制系数，d为手势中心偏离初始位置的距离。控制机器人末端进行平移运动。Among them, V is the output control command, k is the control coefficient, and d is the distance from the initial position of the gesture center. Control the end of the robot for translational motion.

步骤六是确定步骤四识别出的手势的初始姿态与终点姿态，然后确定应当让机器人如何调整姿态。姿态通过旋转矩阵或欧拉角表示。步骤六具体为：Step six is to determine the initial posture and end posture of the gesture recognized in step four, and then determine how the robot should adjust the posture. The pose is represented by a rotation matrix or Euler angles. Step six is specifically:

操作人员按照图2左侧中的手势出现在摄像机的视野中，本装置按照步骤3的方法进行检测跟踪。在目标矩形框中利用背景建模法得到手势的轮廓，运用凸包检测和凸包缺陷检测算法得到食指、中指和无名指和相应的手指相连凹陷处，一共得到五个特征点。根据步骤4中的定位方法得到这五个特征点的空间坐标。The operator appears in the field of view of the camera according to the gesture on the left side of Figure 2, and the device performs detection and tracking according to the method in step 3. In the target rectangular frame, the contour of the gesture is obtained by using the background modeling method, and the convex hull detection and convex hull defect detection algorithms are used to obtain the depressions where the index finger, middle finger and ring finger are connected to the corresponding fingers, and a total of five feature points are obtained. The spatial coordinates of these five feature points are obtained according to the positioning method in step 4.

使用基于肤色检测和背景差分法相结合的方法进行人手分割，其中肤色检测将颜色空间从RGB空间转换到HSV空间得到更好的分割效果，为了解决肤色受光照的影响，同时使用基于混合高斯模型的背景差分法，得到更完备的分割效果。得到人手的二值图像之后使用数学形态学中的开操作和轮廓面积检测过滤噪声。然后使用Graham扫描法求得人手轮廓的凸点，选择最上端的三个即可得到食指、中指和无名指的位置。然后计算在相邻的凸点之间距离两个凸点最远的点即为手指相连凹陷处，一共得到五个特征点。根据步骤4中的定位方法得到这五个特征点的空间坐标。Use the method based on the combination of skin color detection and background difference method for human hand segmentation, in which skin color detection converts the color space from RGB space to HSV space to get better segmentation results. The background subtraction method can obtain a more complete segmentation effect. After obtaining the binary image of the human hand, use the open operation and contour area detection in mathematical morphology to filter the noise. Then use the Graham scanning method to obtain the convex points of the human hand contour, and select the top three to get the positions of the index finger, middle finger and ring finger. Then calculate the point farthest from the two convex points between the adjacent convex points, which is the depression where the fingers are connected, and a total of five feature points are obtained. The spatial coordinates of these five feature points are obtained according to the positioning method in step 4.

按照2左侧的手势在手掌上定义的坐标系，以中指根部为原点，定义指向中指指尖方向为y轴正方向，定义平行于两个凹陷处连线在x轴，指向小拇指为x轴正方向。According to the coordinate system defined on the palm of the left hand gesture in 2, take the root of the middle finger as the origin, define the direction pointing to the fingertip of the middle finger as the positive direction of the y-axis, define the line parallel to the two depressions on the x-axis, and point to the little finger as the x-axis Positive direction.

根据Carley定理求解旋转矩阵，旋转矩阵的Carley向量表示法为：任何不含特征值‐1的旋转矩阵和一个反对称矩阵之间有如下关系：According to the Carley theorem to solve the rotation matrix, the Carley vector representation of the rotation matrix is: any rotation matrix without eigenvalue-1 and an anti-symmetric matrix have the following relationship:

R＝(I-S_b)^-1(I+S_b)R＝(IS _b ) ^-1 (I+S _b )

S_b＝(R+I)^-1(R-I)S _b ＝(R+I) ^-1 (RI)

式中I为单位矩阵，b＝(b₁,b₂,b₃)^T为Carley向量。In the formula, I is the unit matrix, b=(b ₁ , b ₂ , b ₃ ) ^T is the Carley vector.

设p_i为i号特征点在的空间坐标值，q_i为i号特征点在手掌坐标系中的坐标值。则求解旋转矩阵方程为：Let p _i be the space coordinate value of feature point i, and q _i be the coordinate value of feature point i in the palm coordinate system. Then solve the rotation matrix equation as:

上式可以转换为：The above formula can be transformed into:

式中:In the formula:

v_i＝p_i-q_i v _i =p _i -q _i

u_i＝p_i+q_i u _i =p _i +q _i

S_ui为u_i对应的反对称阵。S _ui is the antisymmetric matrix corresponding to u _i .

可得方程：Available equations:

Ab＝cAb=c

式中：In the formula:

解方程可以得到Carley向量，继而计算出旋转矩阵R。然后将旋转矩阵转化成pitch-yaw-roll欧拉角。鼠标点击标定按钮后以当前姿态为原始姿态，根据相对旋转角度输出欧拉角角速度指令控制机器人姿态变化。The Carley vector can be obtained by solving the equation, and then the rotation matrix R can be calculated. Then convert the rotation matrix into pitch-yaw-roll Euler angles. After clicking the calibration button with the mouse, the current posture is taken as the original posture, and the Euler angular velocity command is output according to the relative rotation angle to control the posture change of the robot.

最后，若操作人员将手移开视野，或者做出不是模板中的手势，则控制结束。重复步骤五、步骤六则可以继续控制。Finally, if the operator moves his hand out of view, or makes a gesture that is not in the template, control ends. Repeat steps five and six to continue the control.

<实施例><Example>

本发明的一个实施例的具体过程为：The concrete process of an embodiment of the present invention is:

(1)放置双目相机尽量保持平行，使用张正友棋盘格标定法进行立体标定和校正。(1) Place the binocular cameras as parallel as possible, and use Zhang Zhengyou's checkerboard calibration method for stereo calibration and correction.

(2)进入学习模板模式，在左摄像机视频中用鼠标选取矩形框选中手势，本装置将开始学习并保存此模板，操作人员此时必须保持手势不变，可以适当地移动和多角度旋转手部，模拟控制过程中可能出现的旋转角度，学习完成后点击保存，本装置可以将多尺度多变换的模板保存。如果需要多个不同的手势模板，则可以重复上述步骤。(2) Enter the learning template mode, use the mouse to select the gesture in the left camera video, and the device will start to learn and save the template. The operator must keep the gesture unchanged at this time, and can properly move and rotate the hand at multiple angles part, simulate the rotation angle that may occur during the control process, click save after learning, this device can save the multi-scale and multi-transformation template. If multiple different gesture templates are required, the above steps can be repeated.

(3)双目视觉跟踪检测，操作人员按照步骤2中的手势出现在摄像机的视野中，本装置则自动进行检测，通过级联方差分类器，集合分类器，最近邻分类器检测得到目标后使用金字塔LK光流法对目标框中的Shi-Tomasi角点进行跟踪，并将跟踪结果和检测结果进行融合，同时更新手势模板中的样本，在左摄像机图像中跟踪成功后，则在右相机图像的极线上进行检测和跟踪，如果左右视图同时跟踪成功则输出目标矩形框。(3) Binocular vision tracking detection, the operator appears in the field of view of the camera according to the gesture in step 2, and the device automatically detects, after the target is detected by the cascade variance classifier, set classifier, and nearest neighbor classifier Use the pyramid LK optical flow method to track the Shi-Tomasi corner points in the target frame, and fuse the tracking results with the detection results, and update the samples in the gesture template. After successful tracking in the left camera image, the right camera The epipolar line of the image is detected and tracked, and if the left and right views are tracked successfully at the same time, the target rectangle is output.

(4)在位置控制模式中，操作人员按照步骤2中的手势出现在摄像机的视野中，本装置按照步骤3的方法进行检测跟踪。一直跟踪的矩形框的中心即可视为控制点。利用立体视觉三维重建原理得到手势中心的空间坐标值。鼠标点击标定按钮后则以此点作为原点，当控制点离开以控制阈值为半径的球体时，则输出速度指令，大小与偏移距离成正比：V＝kd。(4) In the position control mode, the operator appears in the field of view of the camera according to the gesture in step 2, and the device performs detection and tracking according to the method in step 3. The center of the rectangular box that is being tracked can be regarded as the control point. The spatial coordinate value of the gesture center is obtained by using the three-dimensional reconstruction principle of stereo vision. After the mouse clicks the calibration button, this point is used as the origin. When the control point leaves the sphere with the control threshold as the radius, the speed command is output, and the size is proportional to the offset distance: V=kd.

(5)在姿态控制模式中，操作人员按照步骤2中的手势出现在摄像机的视野中，本装置按照步骤3的方法进行检测跟踪。在目标矩形框中利用背景建模法得到手势的轮廓，运用凸包检测和凸包缺陷检测算法得到食指、中指和无名指和相应的手指相连凹陷处。提前在手掌上定义的坐标系，根据Carley定理求解旋转矩阵，然后转化成pitch-yaw＝roll欧拉角。鼠标点击标定按钮后以当前姿态为原始姿态，根据相对旋转角度输出欧拉角角速度指令控制机器人姿态变化。(5) In the attitude control mode, the operator appears in the field of view of the camera according to the gesture in step 2, and the device performs detection and tracking according to the method in step 3. In the target rectangular frame, the contour of the gesture is obtained by the background modeling method, and the index finger, middle finger and ring finger are connected with the corresponding fingers by using the convex hull detection and convex hull defect detection algorithms. The coordinate system defined on the palm in advance, solves the rotation matrix according to Carley's theorem, and then converts it into pitch-yaw=roll Euler angles. After clicking the calibration button with the mouse, the current posture is taken as the original posture, and the Euler angular velocity command is output according to the relative rotation angle to control the posture change of the robot.

(6)操作人员将手移开视野，或者做出不是模板中的手势，则控制结束。重复步骤3、步骤4则可以继续控制。(6) The operator moves his hand away from the field of vision, or makes a gesture that is not in the template, then the control ends. Repeat steps 3 and 4 to continue the control.

本发明还可有其它多种实施例，在不背离本发明精神及其实质的情况下，本领域技术人员当可根据本发明作出各种相应的改变和变形，但这些相应的改变和变形都应属于本发明所附的权利要求的保护范围。The present invention can also have other various embodiments, without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and deformations according to the present invention, but these corresponding changes and deformations are all Should belong to the scope of protection of the appended claims of the present invention.

Claims

1. A multi-gesture robot control method based on binocular vision is characterized by comprising

Step one, setting a binocular camera, and calibrating and correcting;

secondly, performing gesture demonstration in the visual field range of the binocular camera by an operator, manually selecting a rectangular frame containing a gesture in a video shot by a left video camera of the binocular camera, and adding the rectangular frame into a training sample set;

step three, training a nearest neighbor classifier and a Bayes classifier by using a training sample set;

step four, the operator appears in the visual field of the binocular camera according to the gesture in the step two; the processor utilizes a cascade variance classifier according to the image of the left camera, and a Bayes classifier based on a random forest and a nearest neighbor classifier are used for detecting to obtain a target; tracking the target, fusing a tracking result and a detection result, updating samples in the gesture template, detecting and tracking on the polar line of the right camera image after the tracking is successful in the left camera image, and outputting a target rectangular frame if the tracking is successful in the left and right views;

step five, tracking the central point of the target rectangular frame; calculating the offset distance of the central point from the initial point to the target point, and outputting a speed control instruction to enable the robot to perform translational motion;

and step six, extracting characteristic points for describing the gesture outline from the target rectangular frame, and solving a rotation matrix corresponding to the characteristic points to enable the robot to perform posture conversion.

2. The binocular vision-based multi-gesture robot control method according to claim 1, wherein the first step specifically comprises:

step one, setting the distance between a left camera and a right camera in a binocular camera to be 20cm, and horizontally placing the left camera and the right camera;

and step two, calibrating the binocular camera by using a Zhang Zhengyou calibration method, and eliminating distortion and line alignment of the views of the left camera and the right camera so as to enable the imaging origin coordinates of the views of the left camera and the right camera to be consistent, the optical axes to be parallel, the imaging planes to be coplanar and the polar lines to be aligned.

3. The binocular vision-based multi-gesture robot control method according to claim 1, wherein the second step specifically comprises:

secondly, performing gesture demonstration by an operator in the visual field range of the binocular camera;

secondly, manually selecting a rectangular frame containing gestures from a video shot by a left camera of a binocular camera;

secondly, scaling, rotating and affine processing are carried out on the image blocks in the selected rectangular frame, and the scaled, rotated and affine images are normalized into image blocks with the same size to form a positive sample set; selecting a preset number of image blocks in the original image, wherein the distance between the image blocks in the original image and the selected image blocks is greater than a preset threshold value to form a negative sample set; the positive sample set and the negative sample set together constitute a training sample set.

4. The binocular vision-based multi-gesture robot control method of claim 1, wherein in step three,

calculating the posterior probability of the foreground class of the Bayesian classifier by the following formula:

wherein y is ₁ Represents the foreground when y ₁ When =0, y represents that there is no object in the image ₁ 1, indicating that the image contains the target; x is the number of _i An ith feature representing an image; each characteristic of the image is the gray value size relation of two randomly selected points in the image, and the gray value size relation is represented by 0 or 1.

5. The binocular vision-based multi-gesture robot control method of claim 4, wherein in step three,

the number of the Bayesian classifiers is 10; characteristic x _i The corresponding number of positive samples is # p, the number of negative samples is # n, and the total number of samples is # m, then:

solving for p (y) for each Bayesian classifier ₁ |x _i ) And averaging the results, and if the average value is larger than a preset threshold value, determining that the target exists in the image.

6. The binocular vision-based multi-gesture robot control method according to claim 1 or 4, wherein in step three, the nearest neighbor classifier is used for calculating the similarity of two image blocks, and the calculation formula is as follows:

in the formula of ₁ ,μ ₂ ,σ ₁ ,σ ₂ Represents the mean and standard deviation of images P1 and P2; the more similar the two images, the closer the result is to 1; the distance between the two images is defined as:the image slice is considered to contain the object when the distance between the two images is less than a predetermined threshold.

7. The binocular vision based multi-gesture robot control method according to claim 6, wherein the fourth step is specifically:

fourthly, the operator appears in the visual field of the binocular camera according to the gesture in the second step;

step two, manually selecting an initial rectangular frame by an operator;

step two, the processor generates a sliding rectangular frame, a cascade variance classifier is used for filtering out the rectangular frame which does not meet the variance threshold condition, then an image block which possibly contains the foreground is obtained through screening of a Bayesian classifier, and then the similarity between the sliding rectangular frame and the manually selected initial rectangular frame is calculated through a nearest neighbor classifier;

selecting a rectangular frame with the highest overlapping degree as a sample rectangular frame, and calculating Shi-Tomasi angular points in the sample rectangular frame as characteristic points;

fourthly, calculating a forward prediction error, a backward prediction error and a similarity in a sample rectangular frame, and screening out feature points which are smaller than the average value of the forward prediction error and the backward prediction error and are larger than a preset similarity threshold;

step four, calculating the average displacement of the feature points screened from the current frame and the corresponding feature points of the previous frame to obtain the position of the target frame of the current frame, and obtaining the size of the target frame in the current frame according to the ratio of the Euclidean distance between the feature points of the previous frame and the current frame;

and step four, carrying out normalization processing on the target frame obtained in the step four, calculating the similarity between the normalized target frame and all images in the positive sample set, if one similarity is greater than a specified threshold value, effectively tracking, adding the obtained target frame into the sample set, and if not, considering that the tracking is invalid and discarding.

8. The binocular vision-based multi-gesture robot control method according to claim 7, wherein the step five specifically comprises:

fifthly, taking the center point of the target frame obtained in the fourth step and the sixth step as a gesture center point, and calculating a space coordinate value of the gesture center by using a stereoscopic vision principle parallax ranging method, wherein the method specifically comprises the following steps:

where X, Y, Z are the positions of the gesture center points in space, u ₁ For marking the x-coordinate, u, of the sphere in the left camera image coordinate system ₀ Is the x origin, u, of the left camera image coordinate system ₂ Is the x coordinate of the marker sphere in the right camera image coordinate system, d is the translation distance between the two cameras, v ₁ For marking the y-coordinate, v, of the sphere in the left camera image coordinate system ₀ Is the y origin of the left camera image coordinate system, and f is the camera focal length;

step two, an operator sets an origin at any position through a calibration button; when the processor detects that the gesture center point leaves a sphere with a preset control threshold value as a radius, a speed control instruction is output, and the calculation formula is as follows:

V＝kd

v is an output speed control instruction, k is a control coefficient, and d is the distance of the center of the gesture deviating from the initial position; the speed control command is used for controlling the robot to perform translational motion.

9. The binocular vision based multi-gesture robot control method according to claim 8, wherein the sixth step specifically includes:

sixthly, obtaining the outline of the gesture in the target frame obtained in the step IV by a method based on combination of skin color detection and a background difference method, and obtaining 5 feature points of the index finger, the middle finger, the ring finger, the dent of the index finger and the ring finger and the dent of the middle finger and the ring finger by a convex hull detection algorithm and a convex hull defect detection algorithm; obtaining the space coordinates of the 5 characteristic points through the formula in the fifth step;

defining a coordinate system on the palm, taking the root of the middle finger as an origin, defining the direction pointing to the fingertip of the middle finger as the positive direction of the y axis, defining a connecting line parallel to the two dents as the x axis, and defining the direction pointing to the little finger as the positive direction of the x axis;

sixthly, solving a rotation matrix by utilizing 5 characteristic points according to Carley theorem;

and sixthly, converting the rotation matrix into a pitch-yaw-roll Euler angle, acquiring a relative rotation angle of the gesture in the process of converting the current gesture into the original state, and outputting an Euler angular speed instruction according to the relative rotation angle to control the gesture change of the robot.