CN106200971A - Human-computer interaction system device and operation method based on gesture recognition - Google Patents
- Publication number: CN106200971A (application CN201610553998.7A)
- Authority
- CN
- China
- Legal status: Pending (assumed by Google; not a legal conclusion)
Classifications
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
- G06V10/20—Image preprocessing
- G06V10/255—Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
- G06V10/267—Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/30—Noise filtering
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Abstract
The invention discloses a gesture-recognition-based human-computer interaction system device and an operation method. The system device comprises a monocular camera, a PC running a human-computer interaction program, a PLC board, and a physical application device; the operation method comprises a gesture input module, a fingertip positioning module, a gesture tracking module, and a keyboard-and-mouse mapping module. By adopting a gesture-recognition-based interaction scheme, the invention overcomes the limitations of existing human-computer interaction approaches that rely on large amounts of sensing hardware, making human-computer interaction more natural and convenient.
Description
Technical Field
The present invention relates to a human-computer interaction system device and an operation method, and in particular to a human-computer interaction system device and operation method based on gesture recognition.
Background Art
With the rapid development of science and technology and the growing adoption of computer vision, users demand ever more natural human-computer communication, and traditional input methods such as the mouse, keyboard, and microphone increasingly fail to meet that demand. Using the human hand as the means of communication between computer and human is more natural, concise, expressive, and direct than other means, so a computer capable of gesture recognition makes communication between humans and machines more natural and convenient.
Current gesture recognition systems mostly take one of the following two forms:
(1) Data gloves or other wearables: this approach reduces the complexity of the detection and recognition algorithms, but a wearable mode of operation clearly falls short of natural human-computer interaction;
(2) 3D depth cameras: 3D scanning devices are bulky, costly, and demand more computing power, making them hard to integrate into mainstream smart terminals.
In addition, the traditional finger-valley localization algorithm has a flaw: when the hand is in a horizontal position, some of the finger-valley points (the points between adjacent fingertips) cannot be located accurately, which limits gesture recognition.
Patent application CN105138136A discloses a gesture recognition device, gesture recognition method, and gesture recognition system. The device comprises at least one sensor placed at positions corresponding to the fingers, an input-mode judging unit, and an input-content generating unit. The application achieves virtual human-computer interaction and can simulate virtual content quickly and accurately, but its device must be fitted with sensors for detection and recognition, requiring much hardware at high cost.
Patent application CN104992171A discloses a gesture recognition and human-computer interaction method and system based on 2D video sequences, which builds a joint feature model of the human hand to recognize hand poses and gestures in the motion foreground of a 2D camera. The application achieves target-hand selection against complex backgrounds and accurate, highly stable hand tracking, but it must traverse the extracted joint feature model and match it against a sample library; its many modules and steps make it cumbersome and time-consuming, and it does not precisely locate the fingertips.
Patent application CN105045398A discloses a virtual reality interaction device based on gesture recognition, comprising a 3D camera interface, a head-mounted virtual reality display, a signal processing component, and a mobile device interface. The application can capture an image sequence of the user's hand containing depth information and process and recognize it to achieve virtual reality interaction, but the user must wear a helmet, which falls short of natural human-computer interaction.
Summary of the Invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a gesture-recognition-based human-computer interaction system device. The device tracks gestures in video captured by a monocular camera; a human-computer interaction program recognizes and processes the user's gestures, ultimately achieving control of a physical application device.
Another object of the present invention is to provide an operation method for the above gesture-recognition-based human-computer interaction system device. The method proposes solutions for computing the finger-valley points and uses the Camshift algorithm to simulate the keyboard and mouse, overcoming the limitations of existing human-computer interaction approaches that require large amounts of sensing hardware; it makes interaction more natural and convenient and better resolves the difficulties posed by the ambiguity and diversity of gestures.
The technical solution of the present invention for the above technical problems is as follows:
A human-computer interaction system device and operation method based on gesture recognition, characterized in that:
The system device comprises a monocular camera, a PC running a human-computer interaction program, a PLC board, and a physical application device. The monocular camera captures the user's gesture images; the PC obtains the captured gesture images over a USB interface, and its human-computer interaction program recognizes, analyzes, and processes them; the PLC board obtains the processed gesture information from the PC over a serial port or USB; and the physical application device receives the information over a circuit and acts on it in feedback.
The system operation method comprises a gesture input module, a fingertip positioning module, a gesture tracking module, and a keyboard-and-mouse mapping module, wherein:
the gesture input module captures the user's gestures with the monocular camera and feeds the captured gesture images to the fingertip positioning module;
the fingertip positioning module locates the fingertip positions in the gesture image and from them recognizes the type and position of the user's gesture, as input to the gesture tracking module and the keyboard-and-mouse mapping module;
the gesture tracking module tracks the movement trajectory of the user's gesture and obtains the gesture's displacement, as input to the keyboard-and-mouse mapping module;
the keyboard-and-mouse mapping module maps the type and displacement of the user's gesture to the corresponding keyboard and mouse operations and controls the computer accordingly.
The fingertip positioning module of the gesture-recognition-based human-computer interaction system operation method of the present invention comprises the following steps:
(1) Camera image loading: open the camera and load images;
(2) Image preprocessing: perform color space conversion, skin-color thresholding, denoising, binarization, and opening on the image;
(3) Contour and fingertip finding: finding the finger-valley points and fingertip points on the contour;
(4) Finger-valley localization and fingertip filtering: valley localization is computed from the fingertip locations; the valleys and the fingertips constrain each other.
In the fingertip positioning module of the operation method of the present invention, in step (2), the image preprocessing comprises color space conversion, skin-color thresholding, denoising, binarization, and opening, wherein:
color space conversion: convert the RGB image to the HSV color model;
skin-color thresholding: use OpenCV's Otsu adaptive threshold segmentation;
denoising: remove the noise around the recognition target in the image;
binarization: separate the image's foreground from its background;
opening: eliminate the disconnected scattered points in the binarized image and fill in missing points.
In the fingertip positioning module of the operation method of the present invention, in step (3), contour and fingertip finding comprises contour finding, fingertip finding, finger-valley localization, and fingertip filtering, wherein:
contour finding: obtain multiple contours via four- or eight-connectivity, then select a single contour;
fingertip finding: identify the number and positions of the current fingertips;
finger-valley localization: compute extremum points to locate the valleys between the fingers;
fingertip filtering: screen out and remove redundant fingertip points.
In step (4), finger-valley localization and fingertip filtering, two solutions are proposed for computing the finger-valley points: one using the extremum principle, and one using the principle that the sum of two sides of a triangle is greater than the third side.
The gesture tracking module of the operation method of the present invention comprises the following steps:
(1) Tracking region selection: select the region of interest via fingertip localization;
(2) Camshift hand tracking: tracking is achieved mainly from the color information of the moving object in the video;
(3) Tracking centroid and area extraction: the frame-difference method computes the motion vector of the fingertip coordinates and of the tracking centroid, and the change in tracked area and in the number of fingertip and finger-valley points is monitored; simple decisions are made on that basis.
Further, the Camshift hand tracking combines fingertip coordinate localization with the Camshift algorithm, improving Camshift into an unsupervised target tracking method.
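The centroid and area bookkeeping of step (3) can be sketched as follows. This is a simplified illustration that uses lists of (x, y) points as a stand-in for the tracked region, with the point count standing in for the area; all function names are hypothetical, not from the patent.

```python
def centroid(points):
    """Centroid of a set of (x, y) points, e.g. a tracked hand region."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def frame_motion(prev_points, curr_points):
    """Frame-difference motion between two consecutive frames: returns the
    displacement of the tracked centroid and the change in region size
    (point count standing in for area in this sketch)."""
    (px, py) = centroid(prev_points)
    (cx, cy) = centroid(curr_points)
    return (cx - px, cy - py), len(curr_points) - len(prev_points)
```

A full implementation would feed the displacement and area change into the keyboard-and-mouse mapping module as gesture movement.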
Compared with the prior art, the present invention has the following beneficial effects:
1. Images captured by a 2D camera are recognized, and keyboard-and-mouse simulation is achieved through the three modules of fingertip positioning, gesture tracking, and mapping, overcoming the large hardware requirements and high cost of existing technical solutions;
2. Two solutions for computing the finger-valley points are proposed, solving the problem that existing algorithms cannot accurately locate some of the valley points and improving the accuracy of gesture recognition.
Brief Description of the Drawings
Fig. 1 is a structural diagram of one embodiment of the gesture-recognition-based human-computer interaction system device of the present invention.
Fig. 2 is a block diagram of the modules of the human-computer interaction system in one embodiment of the present invention.
Fig. 3 is a flowchart of the image preprocessing step of the fingertip positioning module in one embodiment of the present invention.
Fig. 4 shows a gesture image before and after the opening operation in one embodiment of the present invention.
Fig. 5 is a flowchart of the contour and fingertip finding step of the fingertip positioning module in one embodiment of the present invention.
Fig. 6 compares the principles of four-connected and eight-connected regions.
Fig. 7 shows a convex hull, i.e. a fingertip point set.
Fig. 8 is a schematic diagram of the finger-valley points.
Fig. 9 illustrates the principle that the distance from a triangle's apex to two points is greater than the distance from any point on the opposite side to those two points.
Fig. 10 is a schematic diagram of tracking region selection.
Detailed Description
The present invention is described in further detail below with reference to the embodiments and the drawings, but embodiments of the present invention are not limited thereto.
Referring to Fig. 1, the gesture-recognition-based human-computer interaction system device of the present invention consists of the following key components:
User 1 expresses operating instructions through gestures. Monocular camera 2 captures the user's gesture images and transmits them over a USB interface to PC 3. After PC 3 receives the images, its human-computer interaction program recognizes, analyzes, and processes the gesture images, and the PC sends the processed gesture information over a serial port or USB interface to PLC board 4. PLC board 4 passes the information on through a circuit to physical application device 5, which receives it and acts on it in feedback. In this embodiment the physical application device 5 is exemplified by an air conditioner, but it is not limited thereto.
Referring to Fig. 2, the operation method comprises a gesture input module, a fingertip positioning module, a gesture tracking module, and a keyboard-and-mouse mapping module, wherein:
(1) Gesture input module: the monocular camera captures the user's gesture images and transmits them to the PC, which feeds them to the fingertip positioning module;
(2) Fingertip positioning module: the images captured by the camera undergo a series of preprocessing steps; fingertips are then located via contour finding and fingertip-point detection, and localization is refined using the finger-valley points. The fingertip positioning module consists of the following parts: camera image loading, image preprocessing, contour and fingertip finding, and finger-valley localization:
(2.1) Camera image loading: open the camera and load images.
(2.2) Image preprocessing: referring to Fig. 3, preprocessing comprises color space conversion, skin-color thresholding, denoising, binarization, and opening, described in detail below:
(2.2.1) Color space conversion: images are usually given in the RGB color model, but the three RGB components are often highly correlated, and using them directly seldom achieves the desired effect, so the RGB image is converted to the HSV color model. The H, S, and V values are obtained from formulas (2), (3), and (4) respectively:

H = 60×(G−B)/(MAX−MIN) (mod 360), if MAX = R;
H = 60×(B−R)/(MAX−MIN) + 120, if MAX = G;
H = 60×(R−G)/(MAX−MIN) + 240, if MAX = B (2)

S = (MAX−MIN)/MAX, with S = 0 when MAX = 0 (3)

V = MAX (4)

In the formulas above, MAX and MIN are the maximum and minimum of the R, G, B components, and H, S, and V are the H value, S value, and V value of the HSV image.
After conversion to the HSV color space, thresholding the H value over the range 0 to 180 yields the required binary image.
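The per-pixel conversion of formulas (2)-(4) can be sketched in pure Python as follows; the function name is illustrative, not from the patent, and H is returned in degrees with S and V normalized to [0, 1].

```python
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel (components in 0-255) to HSV per the standard
    hexcone formulas: H in degrees [0, 360), S and V in [0, 1]."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    mx, mn = max(r, g, b), min(r, g, b)
    d = mx - mn
    if d == 0:                      # achromatic: hue undefined, use 0
        h = 0.0
    elif mx == r:
        h = (60 * (g - b) / d) % 360
    elif mx == g:
        h = 60 * (b - r) / d + 120
    else:                           # mx == b
        h = 60 * (r - g) / d + 240
    s = 0.0 if mx == 0 else d / mx
    return h, s, mx
```

Note that OpenCV stores H as H/2 (0-180) in 8-bit images, which is why the text thresholds H over 0-180.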
(2.2.2) Skin-color thresholding: OpenCV's Otsu adaptive threshold segmentation is used. The program flow is: compute the histogram and normalize it; compute the image's mean gray value avgValue; compute the zero-order cumulative moment w[i] and first-order cumulative moment u[i] of the histogram; then compute and find the maximum between-class variance:
variance[i] = (avgValue*w[i] − u[i]) * (avgValue*w[i] − u[i]) / (w[i]*(1 − w[i]))
The gray value corresponding to this maximum variance is the sought threshold.
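The Otsu flow just described can be sketched as follows. This is an illustrative pure-Python version of the histogram/moment computation, not the patent's actual code; in practice one would call OpenCV's threshold function with the Otsu flag.

```python
def otsu_threshold(gray):
    """Otsu threshold over a sequence of gray values (0-255), following the
    described flow: normalized histogram, cumulative zero-order moment w[i],
    cumulative first-order moment u[i], then the gray level that maximizes
    the between-class variance (avgValue*w[i] - u[i])^2 / (w[i]*(1 - w[i]))."""
    hist = [0.0] * 256
    for v in gray:
        hist[v] += 1
    n = len(gray)
    hist = [h / n for h in hist]                  # normalized histogram
    avg = sum(i * hist[i] for i in range(256))    # global mean (avgValue)
    w = u = 0.0
    best_t, best_var = 0, -1.0
    for i in range(256):
        w += hist[i]                              # zero-order moment w[i]
        u += i * hist[i]                          # first-order moment u[i]
        denom = w * (1.0 - w)
        if denom <= 0:                            # all-background / all-foreground
            continue
        var = (avg * w - u) ** 2 / denom
        if var > best_var:
            best_var, best_t = var, i
    return best_t
```

On a cleanly bimodal histogram the returned level separates the two modes, which is what the skin-color segmentation step relies on.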
(2.2.3) Denoising: real digital images are commonly degraded during digitization and transmission by the imaging equipment and by environmental noise, so the image must be denoised. This embodiment performs filtering with a blob-area threshold, removing the noise surrounding the target object in the image, as follows:
the connected-component extraction algorithm of binary mathematical morphology computes each blob's area; blobs smaller than the threshold are noise, and setting the gray value of all of a noise blob's pixels to 255 removes it.
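A minimal sketch of this blob-area thresholding follows, using a 4-connected flood fill to extract components. The image representation (list of rows, 0 = object, 255 = background, matching the text's convention of erasing noise to 255) and the function name are illustrative assumptions.

```python
from collections import deque

def remove_small_blobs(img, min_area):
    """Blob-area threshold denoising on a binary image (0 = object pixel,
    255 = background). Connected components are gathered with a 4-connected
    flood fill; any blob smaller than min_area is treated as noise and its
    pixels are set to 255, as described in the text."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] == 0 and not seen[y][x]:
                blob, q = [], deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    blob.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and img[ny][nx] == 0 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(blob) < min_area:          # small blob: erase as noise
                    for by, bx in blob:
                        img[by][bx] = 255
    return img
```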
(2.2.4) Binarization: the image is binarized to separate foreground from background. Binarization sets the gray value of each pixel to 0 or 255, giving the whole image a clear black-and-white appearance. It is among the most common and important operations in image analysis and processing: it greatly reduces the amount of data in the image and brings out the target's contour. In OpenCV, the function cvThreshold() performs the binarization.
(2.2.5) Opening: to eliminate the disconnected scattered points left by binarization and to fill the missing points inside the hand, giving a better image, the morphological opening is used: erosion followed by dilation. Let f(x, y) be the input image and b(x, y) the structuring element; the erosion and dilation of f by b are defined respectively as:
(f⊙b)(s, t) = min{f(s+x, t+y) − b(x, y) | (s+x, t+y) ∈ Df, (x, y) ∈ Db} (5)
(f⊕b)(s, t) = max{f(s−x, t−y) + b(x, y) | (s−x, t−y) ∈ Df, (x, y) ∈ Db} (6)
where s, t are coordinates in the input image f, x, y are coordinates in the structuring element b, Df is the domain of f, and Db is the domain of b.
Fig. 4 compares a gesture image before and after the opening operation.
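For a binary mask the erosion-then-dilation opening described above reduces to a min filter followed by a max filter. The sketch below assumes a flat 3x3 square structuring element and a binary image with 1 = hand and 0 = background; names are illustrative.

```python
def _morph(img, keep):
    """Apply a 3x3 neighborhood reduction (min = erosion, max = dilation)
    to a binary image given as a list of rows of 0/1 values."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = keep(vals)
    return out

def opening(img):
    """Morphological opening: erosion followed by dilation. Isolated specks
    are removed while the main hand blob is largely preserved."""
    return _morph(_morph(img, min), max)
```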
(2.3) Contour and fingertip finding: contours are found in the preprocessed image, and a series of contour-based steps locates the fingertip points. Referring to Fig. 5, the process comprises contour finding, fingertip finding, finger-valley localization, and fingertip filtering, described in detail below:
(2.3.1) Contour finding: a contour generally corresponds to a series of points, i.e. a curve in the image, traced by finding edge points in order. Since pixel values within a region are identical, contours can be found over four-connected or eight-connected regions. Four- and eight-connectivity label the connected parts of a binary image; the call takes the form [L, num] = bwlabel(BW, n), where BW is the input image; n is 4 or 8, selecting four- or eight-connected regions; num is the number of connected regions found; and L is the output label matrix, whose elements are integers: the background is labeled 0, the first connected region 1, the second 2, and so on.
Referring to Fig. 6, which contrasts the principles of four- and eight-connectivity: the 0 in the figure is the position of the central pixel; the four-connected region comprises the four points above, below, left, and right of it, while the eight-connected region additionally includes the upper-left, upper-right, lower-left, and lower-right positions, so the eight-connected region contains the four-connected region.
After contour finding, multiple contours are obtained; the largest contour is selected as the unique gesture contour and used for the subsequent fingertip finding.
(2.3.2) Fingertip finding: viewed as a point set, a fingertip is a vertex of the convex hull of the palm contour. The convex hull is the smallest convex polygon such that every point of the set lies on its boundary or inside it; referring to Fig. 7, the polygon enclosed by the line segments is the convex hull of the point set {p0, p1, ..., p12}, where p0, p1, ..., p12 are the vertices. Convex hull detection is thus used to locate the fingertips: identifying the number and positions of hull vertices in the gesture identifies the number and positions of the current fingertips.
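Fingertip-candidate detection via the convex hull can be sketched with the Andrew monotone chain algorithm. This is one standard hull construction; the patent does not specify which algorithm its implementation uses.

```python
def convex_hull(points):
    """Andrew monotone chain convex hull of a set of (x, y) points, returned
    in counter-clockwise order. The hull vertices of the hand contour are
    the fingertip candidates in the scheme described above."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means a clockwise or
        # collinear turn, so the middle point is not a hull vertex
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]   # endpoints shared between the chains
```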
(2.3.3) Finger-valley localization: valley localization is computed from the fingertip locations; the valleys and the fingertips constrain each other. Referring to the schematic in Fig. 8, points A-E are the corresponding finger-valley points. The valleys are obtained by computing extremum points: formula (7) compares each point between two fingertip points with its preceding and succeeding coordinate points to find the extremum between each pair of adjacent fingertips.
f(x0) ≥ f(x1) and f(x0) ≥ f(x2) (7)
where f(x0) is the vertical coordinate of the current pixel, and f(x1) and f(x2) are its predecessor and successor coordinates; the code uses a predecessor/successor span of 3 pixels to guard against the influence of the line layout formed between pixels. (Since the image y axis points downward, the valleys between fingers appear as local maxima of the y coordinate.) This algorithm has a flaw: when the hand is horizontal, it cannot accurately locate some of the valley points.
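The extremum test of formula (7) can be sketched as follows, operating on the vertical coordinates sampled along the contour between two fingertips. This is an illustrative sketch; the 3-pixel span default follows the text, and as in the formula a plateau can yield several adjacent candidates.

```python
def valley_indices(ys, span=3):
    """Candidate finger-valley indices per formula (7): index i qualifies
    when ys[i] >= ys[i - span] and ys[i] >= ys[i + span], i.e. the point is
    a local maximum of the (downward-growing) image y coordinate."""
    out = []
    for i in range(span, len(ys) - span):
        if ys[i] >= ys[i - span] and ys[i] >= ys[i + span]:
            out.append(i)
    return out
```

Note that this test fails for a horizontal hand, which is exactly the flaw the distance-sum method below is introduced to fix.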
由于在三角形内部过任意角做一条穿过对边的线所围成的三角形的三边和没有原三角形三边大的原理。参见图9,可知三角形顶点到两点的距离大于该边上所有点到两点距离,即AB+AC>BD+DC,所以解决方案可基于此原理通过公式(8)获取相邻指尖点(即凸包点)间轮廓点至两点间欧几里德距离和最大值点,这种方法可以有效地解决当手成水平状态时,无法检测掌间点的问题。Due to the principle that the three sides of the triangle surrounded by a line passing through the opposite side through any angle in the interior of the triangle are not as large as the three sides of the original triangle. Referring to Figure 9, it can be seen that the distance from the triangle vertex to the two points is greater than the distance from all points on the side to the two points, that is, AB+AC>BD+DC, so the solution can be based on this principle to obtain adjacent fingertip points through formula (8) (that is, the convex hull point) between the contour point and the Euclidean distance between the two points and the maximum point, this method can effectively solve the problem that the palm point cannot be detected when the hand is in a horizontal state.
Here ACB is the sum of the Euclidean distances from the candidate point C to the two adjacent fingertips, i.e. ACB = sqrt((Cx - Ax)² + (Cy - Ay)²) + sqrt((Cx - Bx)² + (Cy - By)²), where (Cx, Cy) are the coordinates of the candidate pixel C and (Ax, Ay), (Bx, By) are the coordinates of the adjacent fingertip points A and B. Starting from A, point C walks along the contour towards B; whenever ACB exceeds MAX, the current point is assigned to MAX. The point held by MAX when C reaches B is the required palm-valley point.
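The scan described above can be sketched as follows. This is an illustrative reconstruction of the formula-(8) maximization; the function and variable names are assumptions:

```python
import math

def palm_valley(contour, a, b):
    """Between adjacent fingertips a and b, return the contour point C
    maximising ACB = |AC| + |CB| (formula (8)); by the triangle
    principle this is the deepest point of the web between the fingers,
    independent of hand orientation."""
    best, best_sum = None, -1.0
    for c in contour:                         # contour points from A to B
        acb = math.dist(a, c) + math.dist(c, b)
        if acb > best_sum:                    # "when ACB is greater than MAX"
            best_sum, best = acb, c
    return best
```

Because the criterion is a distance sum rather than a vertical extremum, it remains valid when the hand is horizontal, which is exactly the failure case of formula (7).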
Not every point found this way lies between two fingertips; sometimes points on the two sides of the palm or wrist are picked up. These two kinds of points can be removed with the vector-angle formula (11).
px = Ax - Cx, py = Ay - Cy, qx = Bx - Cx, qy = By - Cy  (10)
(px, py) and (qx, qy) are the vectors from point C to points A and B respectively, so cosθ follows from the vector-angle (dot-product) formula, cosθ = (px·qx + py·qy) / (|p|·|q|) (11). A point is discarded when its cosθ is less than 0, i.e. when the angle ACB is obtuse.
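The filter of formulas (10) and (11) can be sketched as follows (function name assumed). At a genuine finger-web valley the angle ACB is sharp, while points on the side of the palm or wrist see A and B at a wide, obtuse angle:

```python
import math

def is_valley(a, b, c):
    """Vector-angle test of formulas (10)-(11): keep candidate C only if
    the angle ACB is acute or right (cos theta >= 0)."""
    px, py = a[0] - c[0], a[1] - c[1]   # vector C -> A  (formula 10)
    qx, qy = b[0] - c[0], b[1] - c[1]   # vector C -> B
    dot = px * qx + py * qy
    norm = math.hypot(px, py) * math.hypot(qx, qy)
    if norm == 0:
        return False                    # C coincides with a fingertip
    return dot / norm >= 0              # formula (11): discard when cos theta < 0
```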
(2.3.4) Fingertip filtering: this is the most important step after the fingertip candidates have been obtained. A fingertip region does not yield just one candidate point, so the main task is screening. First, redundant candidates are removed: the function deldot() deletes fingertip points that lie too close to one another. After the clustered points have been handled, the non-fingertip candidates are removed using the palm-valley points. Exploiting the property that a palm-valley point lies between two fingertips, a pair of adjacent candidates is accepted only when both conditions hold: there is a palm-valley point between them, and the distance to that palm-valley point exceeds a user-defined threshold. The accepted fingertip and palm-valley points are then drawn.
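A minimal sketch of the clustered-point removal. deldot() is the patent's function name, but its body and the default gap threshold here are assumptions:

```python
import math

def deldot(points, min_gap=10.0):
    """Collapse clustered fingertip candidates: keep a point only if it
    is at least min_gap pixels away from every point already kept."""
    kept = []
    for p in points:
        if all(math.dist(p, k) >= min_gap for k in kept):
            kept.append(p)
    return kept
```

After this pass each fingertip is represented by a single point, so the palm-valley adjacency test can be applied pairwise.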
(3) Gesture-tracking module: tracks the trajectory of the user's gesture and obtains its motion, which serves as the input to the key/mouse mapping module. An improved Camshift algorithm is used for hand tracking, so the tracking region no longer has to be selected with the mouse before tracking begins. The module consists of the following parts: tracking-region selection, the Camshift improvement, and extraction of the tracked centroid and area:
(3.1) Tracking-region selection: the region of interest is selected via fingertip localization.
(3.2) Improved Camshift hand tracking: Camshift, the "Continuously Adaptive Mean-Shift" algorithm, is a motion-tracking algorithm. In typical tracking it relies mainly on the colour information of the moving object in the video. It is an improvement of the Mean-Shift algorithm.
Mean-Shift processes only the points inside a local window of the data and then moves the window once processing is complete. Unlike Mean-Shift, the Camshift search window resizes itself automatically: given an easily segmented distribution (such as compact hand features), the algorithm adapts the window to the size of the hand as it opens and closes into a fist. In tracking applications, the new size computed from the previous frame becomes the search region for the next frame.
Both Mean-Shift and Camshift require the region of interest to be selected in advance, generally by triggering a mouse event to mark the tracking range; this is supervised and does not meet the goals of this invention. Camshift is therefore improved so that the region of interest is selected without supervision, namely via fingertip localization: once the fingertips have been located, an x value and a y value taken from the fingertip coordinates form the top-left corner, and the width and height are taken as the minimum of the longest horizontal and vertical fingertip distances. Referring to Fig. 10, A and B are the top-left and bottom-right fingertip corner points; the top-left coordinates are retained, and the span is obtained from the difference of the two points as MIN(|Ax - Bx|, |Ay - By|), where (Ax, Ay) and (Bx, By) are the coordinates of points A and B.
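The unsupervised window initialization can be sketched as below (function name assumed); the returned tuple is in the (x, y, w, h) form that OpenCV's cv2.CamShift expects as its initial search window:

```python
def fingertip_roi(fingertips):
    """Unsupervised initial search window for CamShift (section 3.2):
    top-left corner from the minimal fingertip x and y, side length
    MIN(|Ax - Bx|, |Ay - By|) from the extreme fingertip points."""
    xs = [p[0] for p in fingertips]
    ys = [p[1] for p in fingertips]
    ax, ay = min(xs), min(ys)                # top-left point A
    bx, by = max(xs), max(ys)                # bottom-right point B
    side = min(abs(ax - bx), abs(ay - by))   # MIN(|Ax-Bx|, |Ay-By|)
    return (ax, ay, side, side)              # (x, y, w, h) window
```

Taking the minimum of the two spans keeps the window inside the hand region, which helps the colour-histogram back-projection that CamShift tracks.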
(3.3) Tracked-centroid and area extraction: this is mainly the algorithm design for simulating the keyboard and mouse. The invention uses the frame-difference method to compute the motion vectors of the fingertip coordinates and the tracked centroid, and tracks the change of the area together with the numbers of fingertip and palm-valley points, from which a simple decision is made.
(4) Mapping module: integrates the fingertip-localization and gesture-tracking modules and acts as the hub between operations performed under the camera and the simulated keyboard and mouse operations. It comprises keyboard simulation and mouse simulation:
(4.1) Keyboard simulation: the parameters used are the tracked centroid and the tracked area. The centroid-motion vector and area-change vector are first computed with the frame-difference method. For the arrow keys, the speed of motion is computed over a predefined time interval, and an arrow key is triggered when that speed is exceeded. Because vertical motion requires more effort than horizontal motion, the vertical arrow keys are given higher priority by default. The space bar has the lowest priority: its press is detected by comparing the current area with that of the previous frame, and if the new area is less than 0.6 times the previous one, the space bar is considered activated.
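The decision logic can be sketched as below. The 0.6 area ratio comes from the text; the speed threshold, function name, and key labels are assumptions for illustration:

```python
def keyboard_event(prev_c, cur_c, prev_area, cur_area,
                   speed_thresh=15.0, area_ratio=0.6):
    """Frame-difference keyboard decision of section (4.1).
    Priority order: space (area shrinks below 0.6x, i.e. hand closes),
    then vertical arrow keys, then horizontal arrow keys."""
    if prev_area > 0 and cur_area < area_ratio * prev_area:
        return "SPACE"                        # closing hand activates space
    dx = cur_c[0] - prev_c[0]                 # centroid motion between frames
    dy = cur_c[1] - prev_c[1]
    if abs(dy) > speed_thresh and abs(dy) >= abs(dx):
        return "DOWN" if dy > 0 else "UP"     # vertical keys take priority
    if abs(dx) > speed_thresh:
        return "RIGHT" if dx > 0 else "LEFT"
    return None                               # no key this frame
```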
(4.2) Mouse simulation: the numbers of fingertip and palm-valley points are used. As in the keyboard simulation, the frame-difference method first yields the fingertip motion vector together with the counts of fingertip and palm-valley points. The speed of motion is then computed over the predefined time interval; when it exceeds the threshold, the motion vector is evaluated and added to the current mouse coordinates. The number of palm-valley points determines whether the mouse is moved and whether a mouse click is issued.
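One way the valley-count gating could work is sketched below. The specific mapping of valley counts to actions (open hand moves, closed fist clicks) and all thresholds are assumptions; the patent only states that the count decides between moving and clicking:

```python
import math

def mouse_event(mouse_xy, delta, n_valleys,
                move_valleys=3, click_valleys=0, speed_thresh=5.0):
    """Mouse decision of section (4.2): the palm-valley count selects
    the action, and the frame-difference vector `delta` is added to the
    current cursor position when movement is selected."""
    if n_valleys <= click_valleys:
        return mouse_xy, "CLICK"              # fist: no valleys visible
    if n_valleys >= move_valleys and math.hypot(*delta) > speed_thresh:
        new_xy = (mouse_xy[0] + delta[0], mouse_xy[1] + delta[1])
        return new_xy, "MOVE"                 # open hand moving fast enough
    return mouse_xy, None                     # hold position
```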
The above is a preferred embodiment of the present invention, but the embodiments of the invention are not limited to it; any other change, modification, substitution, combination, or simplification made without departing from the spirit and principles of the invention shall be regarded as an equivalent replacement and falls within the scope of protection of the invention.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610553998.7A CN106200971A (en) | 2016-07-07 | 2016-07-07 | Man-machine interactive system device based on gesture identification and operational approach |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610553998.7A CN106200971A (en) | 2016-07-07 | 2016-07-07 | Man-machine interactive system device based on gesture identification and operational approach |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN106200971A true CN106200971A (en) | 2016-12-07 |
Family
ID=57474393
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610553998.7A Pending CN106200971A (en) | 2016-07-07 | 2016-07-07 | Man-machine interactive system device based on gesture identification and operational approach |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106200971A (en) |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107272893A (en) * | 2017-06-05 | 2017-10-20 | 上海大学 | Man-machine interactive system and method based on gesture control non-touch screen |
| CN108777177A (en) * | 2018-06-26 | 2018-11-09 | 李良杰 | Demand selects and conveys system |
| CN108983980A (en) * | 2018-07-27 | 2018-12-11 | 河南科技大学 | A kind of mobile robot basic exercise gestural control method |
| CN109303987A (en) * | 2017-07-26 | 2019-02-05 | 霍尼韦尔国际公司 | Augmented vision for firefighters using heads-up displays and gesture sensing |
| CN109710066A (en) * | 2018-12-19 | 2019-05-03 | 平安普惠企业管理有限公司 | Exchange method, device, storage medium and electronic equipment based on gesture identification |
| CN109961454A (en) * | 2017-12-22 | 2019-07-02 | 北京中科华正电气有限公司 | Human-computer interaction device and processing method in a kind of embedded intelligence machine |
| CN110026902A (en) * | 2017-12-27 | 2019-07-19 | 株式会社迪思科 | Cutting apparatus |
| CN111158457A (en) * | 2019-12-31 | 2020-05-15 | 苏州莱孚斯特电子科技有限公司 | A vehicle HUD human-computer interaction system based on gesture recognition |
| CN111158491A (en) * | 2019-12-31 | 2020-05-15 | 苏州莱孚斯特电子科技有限公司 | A Human-Computer Interaction Method for Gesture Recognition Applied to Vehicle HUD |
| CN112114675A (en) * | 2020-09-29 | 2020-12-22 | 陕西科技大学 | How to use a non-contact elevator keyboard based on gesture control |
| WO2025091237A1 (en) * | 2023-10-31 | 2025-05-08 | 京东方科技集团股份有限公司 | Interactive reading method, device and system, and computer-readable storage medium |
2016
- 2016-07-07: CN CN201610553998.7A patent/CN106200971A/en, status: active Pending
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107272893A (en) * | 2017-06-05 | 2017-10-20 | 上海大学 | Man-machine interactive system and method based on gesture control non-touch screen |
| CN109303987A (en) * | 2017-07-26 | 2019-02-05 | 霍尼韦尔国际公司 | Augmented vision for firefighters using heads-up displays and gesture sensing |
| CN109961454A (en) * | 2017-12-22 | 2019-07-02 | 北京中科华正电气有限公司 | Human-computer interaction device and processing method in a kind of embedded intelligence machine |
| CN110026902A (en) * | 2017-12-27 | 2019-07-19 | 株式会社迪思科 | Cutting apparatus |
| CN108777177A (en) * | 2018-06-26 | 2018-11-09 | 李良杰 | Demand selects and conveys system |
| CN108983980A (en) * | 2018-07-27 | 2018-12-11 | 河南科技大学 | A kind of mobile robot basic exercise gestural control method |
| CN109710066A (en) * | 2018-12-19 | 2019-05-03 | 平安普惠企业管理有限公司 | Exchange method, device, storage medium and electronic equipment based on gesture identification |
| CN111158457A (en) * | 2019-12-31 | 2020-05-15 | 苏州莱孚斯特电子科技有限公司 | A vehicle HUD human-computer interaction system based on gesture recognition |
| CN111158491A (en) * | 2019-12-31 | 2020-05-15 | 苏州莱孚斯特电子科技有限公司 | A Human-Computer Interaction Method for Gesture Recognition Applied to Vehicle HUD |
| CN112114675A (en) * | 2020-09-29 | 2020-12-22 | 陕西科技大学 | How to use a non-contact elevator keyboard based on gesture control |
| CN112114675B (en) * | 2020-09-29 | 2023-05-26 | 陕西科技大学 | Gesture control-based non-contact elevator keyboard using method |
| WO2025091237A1 (en) * | 2023-10-31 | 2025-05-08 | 京东方科技集团股份有限公司 | Interactive reading method, device and system, and computer-readable storage medium |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN106200971A (en) | Man-machine interactive system device based on gesture identification and operational approach | |
| CN103984928B (en) | Finger gesture recognition methods based on depth image | |
| US10803604B1 (en) | Layered motion representation and extraction in monocular still camera videos | |
| Wu et al. | Robust fingertip detection in a complex environment | |
| Hongyong et al. | Finger tracking and gesture recognition with kinect | |
| Weiyao et al. | Human action recognition using multilevel depth motion maps | |
| CN102402289A (en) | A gesture mouse recognition method based on machine vision | |
| Yousefi et al. | 3D gesture-based interaction for immersive experience in mobile VR | |
| Khan et al. | Computer vision based mouse control using object detection and marker motion tracking | |
| Tang et al. | Hand tracking and pose recognition via depth and color information | |
| CN108614988A (en) | A kind of motion gesture automatic recognition system under complex background | |
| CN106484108A (en) | Chinese characters recognition method based on double vision point gesture identification | |
| Tan et al. | Research of hand positioning and gesture recognition based on binocular vision | |
| Singh | Recognizing hand gestures for human computer interaction | |
| Xu et al. | Bare hand gesture recognition with a single color camera | |
| Shaker et al. | Real-time finger tracking for interaction | |
| CN106662927A (en) | Motion recognition method and motion recognition device | |
| Bhame et al. | Vision based calculator for speech and hearing impaired using hand gesture recognition | |
| Pun et al. | Real-time hand gesture recognition using motion tracking | |
| Jiang et al. | A robust method of fingertip detection in complex background | |
| Simion et al. | Finger detection based on hand contour and colour information | |
| Xie et al. | Hand posture recognition using kinect | |
| Thakur | Robust hand gesture recognition for human machine interaction system | |
| Babu et al. | Touchless User Interface for Sketching Using Hand Gesture Recognition | |
| Jain et al. | Cursor Controller using Hand Gesture Recognition. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| WD01 | Invention patent application deemed withdrawn after publication | ||
| WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20161207 |