CN107300976A - A kind of gesture identification household audio and video system and its operation method - Google Patents
- Publication number
- CN107300976A (application CN201710685075.1A)
- Authority
- CN
- China
- Prior art keywords
- gesture
- module
- camera
- control module
- home theater
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Evolutionary Biology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a gesture-recognition home theater system comprising a sound pickup module, a control module, audio-visual equipment, and a camera. The control module is connected to the pickup module, the audio-visual equipment, and the camera; the camera contains a gesture recognition module, which is connected to the control module. The invention also discloses an operating method for the gesture-recognition home theater system. The system enables precise control of audio-visual equipment through a combined voice-and-gesture mode.
Description
Technical Field
The present invention relates to home theater systems, and in particular to a gesture-recognition home theater system and a method of operating it.
Background
In March 2017, Lenovo released the 65i3, a smart TV capable of conversation with full voice control. The 65i3 integrates far-field voice and near-field remote-control voice, and can recognize, understand, and respond. By dispensing with extra peripherals such as the remote control, it improves the comfort of the user experience and reduces equipment cost. Separately, a gesture-command instruction set and recognition method for smart-home environments has also appeared: a gesture-operation scheme aimed at every device in the home, providing an instruction set and a computer-vision-based recognition method. The instruction set comprises general-purpose operation gestures and shortcut gestures, and in the general operation mode it proposes determining the device to be controlled with a binocular camera, from the position of the user's eyes and the direction of the pointing finger.
China's smart-home industry started late; research is not yet deep and the technology not yet mature, and the development of smart home theater still shows some clear problems and defects. Although the 65i3 described above is voice-controlled, regional languages differ greatly across cultural backgrounds, so it cannot be promoted smoothly everywhere. The binocular camera used in the smart-home gesture instruction set described above must track and recognize eyes, faces, fingers, and gestures at once; the quality and complexity of its image processing cannot be guaranteed, and therefore neither can the validity of its commands. Moreover, that scheme targets many controlled appliances while the variety of human gestures is limited, so detailed control commands cannot be provided for every device. When the same gesture carries different commands for different appliances, two appliances become inconvenient to operate at the same time and "machine misunderstanding" arises, degrading the user experience.
Summary of the Invention
One object of the present invention is to provide a gesture-recognition home theater system that enables precise control of audio-visual equipment through a combined voice-and-gesture mode.

To achieve this object, the present invention adopts the following technical solution:
A gesture-recognition home theater system comprises a sound pickup module, a control module, audio-visual equipment, and a camera, with the control module connected to each of the other three. The pickup module recognizes the power-on voice signal and, on success, sends a power-on signal to the control module. The control module receives this signal and issues power-on commands to the audio-visual equipment and the camera; it also receives signals from the camera and issues operation commands to the audio-visual equipment. The camera contains a gesture recognition module connected to the control module; this module recognizes the gesture image signals captured by the camera and sends the recognition results to the control module.
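The control flow just described (voice wake-up → power-on commands → gesture-driven operation commands) can be sketched as follows. This is a minimal illustration of the module wiring only; the class names, method names, and the wake phrase are our own assumptions, not the patent's.

```python
class AVDevice:
    """Audio-visual equipment that receives commands from the control module."""
    def __init__(self):
        self.powered = False
        self.last_command = None

    def execute(self, command):
        if command == "power_on":
            self.powered = True
        else:
            self.last_command = command


class ControlModule:
    """Central hub: connected to the pickup module, the camera, and the AV device."""
    def __init__(self, av_device):
        self.av_device = av_device
        self.camera_on = False

    def on_wake_word(self):
        # Power-on signal from the pickup module starts the AV device and the camera.
        self.av_device.execute("power_on")
        self.camera_on = True

    def on_gesture_result(self, command):
        # A recognition result from the camera's gesture module becomes an AV command.
        if self.camera_on:
            self.av_device.execute(command)


class PickupModule:
    """Sound pickup: recognizes the power-on voice signal."""
    def __init__(self, control):
        self.control = control

    def hear(self, phrase):
        if phrase == "power on":  # hypothetical wake phrase
            self.control.on_wake_word()


av = AVDevice()
control = ControlModule(av)
pickup = PickupModule(control)
pickup.hear("power on")                    # voice wake-up powers everything on
control.on_gesture_result("volume_up_1")   # a recognized gesture drives the AV device
```

Note that the pickup module never talks to the AV device directly; everything routes through the control module, as in the patent's block diagram.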
Preferably, the camera is a Kinect camera.

Preferably, the gesture recognition module comprises an image preprocessing module, a gesture segmentation module, a gesture feature extraction module, and a gesture matching module, connected in sequence.
Preferably, the gesture segmentation module comprises a skin-color segmentation module and a contour segmentation module, connected to each other. The two stages, segmentation by skin color and segmentation by hand-shape contour, are interchangeable in order: the preprocessed image may first undergo skin-color segmentation and then contour segmentation, or first contour segmentation and then skin-color segmentation.
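A common way to implement the skin-color stage is a per-pixel threshold in the YCrCb color space. The sketch below uses the standard ITU-R BT.601 RGB→CrCb conversion; the thresholds (Cr in 133-173, Cb in 77-127) are a widely used rule of thumb, not values given by the patent.

```python
def rgb_to_crcb(r, g, b):
    # ITU-R BT.601 conversion from RGB to the Cr/Cb chroma channels.
    cr = 128 + 0.5 * r - 0.4187 * g - 0.0813 * b
    cb = 128 - 0.1687 * r - 0.3313 * g + 0.5 * b
    return cr, cb

def skin_mask(image):
    """Binary mask: 1 where the pixel falls in a typical skin-tone chroma range."""
    mask = []
    for row in image:
        mask_row = []
        for (r, g, b) in row:
            cr, cb = rgb_to_crcb(r, g, b)
            mask_row.append(1 if 133 <= cr <= 173 and 77 <= cb <= 127 else 0)
        mask.append(mask_row)
    return mask

# A skin-toned pixel next to a pure-blue background pixel.
image = [[(200, 140, 120), (0, 0, 255)]]
```

Thresholding chroma rather than raw RGB is what makes this stage relatively robust to lighting and to the skin-tone variation across ethnicities that the description mentions.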
Preferably, the gesture matching module contains a gesture library of one or more gestures, each corresponding to a different operation command to be sent to the control module.
Preferably, the gesture library contains 14 different gestures, corresponding respectively to the operation commands: previous track or channel, next track or channel, volume down one level, volume up one level, volume down five levels, volume up five levels, open the settings menu, confirm the selected option, select the previous option, select the next option, mute, unmute, pause the playing video, and power off.
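The one-gesture-one-command design of the library can be represented as a simple lookup table. The gesture identifiers below are placeholders of our own invention; the patent enumerates only the 14 commands, not the hand shapes.

```python
# Hypothetical gesture identifiers mapped to the 14 operation commands listed above.
GESTURE_LIBRARY = {
    "swipe_left":       "previous_track_or_channel",
    "swipe_right":      "next_track_or_channel",
    "one_finger_down":  "volume_down_1",
    "one_finger_up":    "volume_up_1",
    "five_finger_down": "volume_down_5",
    "five_finger_up":   "volume_up_5",
    "open_palm":        "open_settings_menu",
    "fist":             "confirm_selection",
    "point_up":         "previous_option",
    "point_down":       "next_option",
    "palm_over_mouth":  "mute",
    "palm_away":        "unmute",
    "flat_hand":        "pause_video",
    "crossed_hands":    "power_off",
}

def match_gesture(gesture_id):
    # Once a gesture is identified, matching degenerates to a dictionary lookup;
    # an unknown gesture yields None, i.e. no command is sent to the control module.
    return GESTURE_LIBRARY.get(gesture_id)
```

Because every gesture maps to exactly one command for one device class, the "machine misunderstanding" problem described in the Background (one gesture meaning different things for different appliances) does not arise.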
Another object of the present invention is to provide an operating method for the gesture-recognition home theater system.

An operating method for the gesture-recognition home theater system, in which a convolutional neural network is employed.
Preferably, the method comprises the following steps:

1) The pickup module receives and recognizes the power-on voice signal; on success it sends a power-on signal to the control module, which issues power-on commands to the audio-visual equipment and the camera;

2) The camera receives and recognizes the gesture image; on success it sends the recognition result to the control module, which issues an operation command to the audio-visual equipment.
Preferably, in step 2), after the camera receives the gesture image, the depth image within it is recognized in the following steps:

a) Apply a noise-filtering operation to the depth image;

b) Segment the noise-filtered depth image;

c) Extract depth-image features from the segmented depth image as samples;

d) Compare the extracted depth-image feature samples one-to-many against the data in the gesture library and match the corresponding operation command.
Preferably, in step b) the noise-filtered depth image is segmented using both the skin-color-based and the hand-shape-contour-based methods.
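Steps a) through d) above can be sketched as a four-stage pipeline over a toy depth map. Every function body, threshold, and the two-entry feature library below is a simplified stand-in of our own, assumed for illustration; the real operations are far richer.

```python
def noise_filter(depth):
    # Step a) placeholder: discard implausible depth readings (sensor dropouts).
    return [[d if 0 < d < 4000 else 0 for d in row] for row in depth]

def segment_hand(depth, near=400, far=900):
    # Step b) placeholder: keep only pixels in the near depth band where the hand sits.
    return [[d if near <= d <= far else 0 for d in row] for row in depth]

def extract_features(depth):
    # Step c) placeholder: fraction of hand pixels, as a single scalar feature.
    pixels = [d for row in depth for d in row]
    return sum(1 for d in pixels if d > 0) / len(pixels)

def match(feature, library):
    # Step d): one-to-many comparison -> nearest feature template wins.
    return min(library, key=lambda name: abs(library[name] - feature))

library = {"open_palm": 0.5, "fist": 0.2}        # toy feature templates
depth = [[500, 600, 9999], [700, 2000, 650]]     # toy depth map (mm)
command = match(extract_features(segment_hand(noise_filter(depth))), library)
```

Here the 9999 reading is removed as noise, the 2000 mm pixel is cut by depth-band segmentation, four of six pixels survive as "hand", and the 0.67 coverage feature matches `open_palm`.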
The beneficial effects of the present invention are:

1. With a pickup module and a gesture recognition module, the system combines voice and gesture operation: the user can adjust the state of the audio-visual equipment simply by changing gestures, covering operations such as raising or lowering the volume, switching channels, pausing music playback, entering system settings, and powering off, with flexible, precise control and a good user experience;

2. The operating method trains and recognizes gestures with a convolutional neural network, giving high recognition accuracy and good stability;

3. The system dispenses with the remote control entirely, saving the manufacturing cost of extra equipment and benefiting market development.
Description of the Drawings
Figure 1 is a structural block diagram of the gesture-recognition home theater system provided by an embodiment of the present invention.

Figure 2 is a structural block diagram of the gesture recognition module in that system.

Figure 3 is a flowchart of the operation of the gesture-recognition home theater system.

Figure 4 is a schematic diagram of a single unit of the neural network in the embodiment.

Figure 5 is a schematic diagram of a neural network with one hidden layer in the embodiment.
Detailed Description

The technical solution provided by the present invention is described in more detail below with reference to Figures 1-5.
An embodiment of the present invention provides a gesture-recognition home theater system. As shown in Figure 1, the system comprises a pickup module, a control module, audio-visual equipment, and a Kinect camera, with the control module connected to each of the other three.

The pickup module recognizes the power-on voice signal and, on success, sends a power-on signal to the control module.

The control module receives the power-on signal from the pickup module and issues power-on commands to the audio-visual equipment and the Kinect camera; it receives signals from the Kinect camera and issues operation commands to the audio-visual equipment.
By using a Kinect camera, the system can obtain depth information from gesture images and process them in 3D, judging gesture actions more accurately and distinguishing near from far gestures, which makes it smarter than an ordinary camera. As shown in Figure 1, the Kinect camera contains a gesture recognition module connected to the control module; it recognizes the gesture image signals captured by the camera and sends the recognition results to the control module. As shown in Figure 2, the gesture recognition module comprises an image preprocessing module, a gesture segmentation module, a gesture feature extraction module, and a gesture matching module, connected in sequence. The preprocessing module enhances the gesture images received by the Kinect camera by removing noise: because the Kinect sensor uses laser-speckle technology, the acquired depth information often contains considerable noise, which would significantly affect subsequent data processing and experiments, so a noise-filtering operation on the depth image is required in the preprocessing stage.
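A typical concrete choice for this noise-filtering step is a median filter, which suppresses the speckle dropouts of structured-light depth sensors. The 3x3 pure-Python version below is illustrative only; the patent does not specify which filter is used.

```python
def median_filter_3x3(depth):
    """Replace each pixel with the median of its 3x3 neighborhood (edges clamped)."""
    h, w = len(depth), len(depth[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighborhood = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny = min(max(y + dy, 0), h - 1)  # clamp at the border
                    nx = min(max(x + dx, 0), w - 1)
                    neighborhood.append(depth[ny][nx])
            neighborhood.sort()
            out[y][x] = neighborhood[4]  # median of 9 values
    return out

# A single speckle dropout (0) in an otherwise flat depth region is removed.
noisy = [[800, 800, 800], [800, 0, 800], [800, 800, 800]]
clean = median_filter_3x3(noisy)
```

Unlike a mean filter, the median leaves genuine depth edges sharp while deleting isolated invalid readings, which is why it is the standard choice for speckle-type depth noise.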
The gesture segmentation module comprises a skin-color segmentation module and a contour segmentation module, connected to each other. There are two segmentation approaches, one based on skin color and one based on the hand-shape contour, and each alone has significant shortcomings. Combining them makes the segmentation of the gesture image immune to the background color while still following the hand's shape accurately; that is, gesture segmentation fuses the two determination methods, avoiding the low machine-recognition accuracy that the limitations of a single method would cause. The image preprocessing, skin-color segmentation, contour segmentation, and gesture feature extraction modules are connected in sequence: the preprocessed image first undergoes skin-color segmentation, then hand-shape-contour segmentation, and finally feature extraction. Skin-color segmentation must take care to distinguish the hand both from the background color and across the skin tones of different ethnicities. The segmentation module removes part of the redundant hand information while retaining the important information for feature extraction. To obtain features over the depth range of the gesture image, the division of that range into intervals must be considered: while covering the whole gesture image, the length of each interval and the total number of intervals are balanced so the intervals are as distinctive as possible, allowing features of the depth information to be extracted.
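The depth-interval feature described above is essentially a histogram over equal depth bins, where bin width and bin count embody the trade-off the text mentions. A sketch with assumed parameters (a 400-800 mm band split into 4 bins):

```python
def depth_histogram(depth_pixels, near, far, bins):
    """Fraction of hand pixels falling into each of `bins` equal depth intervals."""
    width = (far - near) / bins
    counts = [0] * bins
    valid = [d for d in depth_pixels if near <= d < far]
    for d in valid:
        counts[int((d - near) / width)] += 1  # which interval this pixel falls in
    total = len(valid) or 1                   # avoid division by zero on empty input
    return [c / total for c in counts]

# Four hand pixels spread over a 400-800 mm band, split into 4 intervals of 100 mm.
features = depth_histogram([410, 450, 520, 780], near=400, far=800, bins=4)
```

More bins make the feature more discriminative between hand poses but noisier; fewer, wider bins are stabler but blur poses together, which is exactly the balance the description calls for.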
The gesture matching module contains a gesture library of 14 different gestures, each corresponding to a different operation command sent to the control module. Providing multiple gesture actions yields a fairly complete library with a wide variety of operation commands; the gestures in the library are designed around people's everyday habits, making them simple and flexible to perform and easy to accept. The 14 gestures correspond respectively to: previous track or channel, next track or channel, volume down one level, volume up one level, volume down five levels, volume up five levels, open the settings menu, confirm the selected option, select the previous option, select the next option, mute, unmute, pause the playing video, and power off.
As shown in Figure 3, the workflow of the gesture-recognition home theater system is as follows: the pickup module first receives the power-on voice command; once the Kinect camera starts, it captures gesture images; the image preprocessing module preprocesses the received gesture images; the gesture segmentation module segments the preprocessed images; the gesture feature extraction module extracts features from the segmented images; and the extracted features are matched against the corresponding gesture features in the trained gesture library. On a successful match, the corresponding operation command is output to the control module, which finally issues it to the audio-visual equipment; on a failed match, the Kinect camera captures and receives a new gesture image.
An operating method for the gesture-recognition home theater system employs a convolutional neural network (CNN), trained on a collected set of gestures. A CNN differs from an ordinary neural network in containing a feature extractor built from convolutional and subsampling layers. A convolutional layer of a CNN usually contains several feature planes, each composed of neurons arranged in a rectangular grid; neurons within the same feature plane share weights, and these shared weights are the convolution kernel. The kernel is generally initialized as a matrix of small random values and learns reasonable weights during training. Weight sharing directly reduces the number of connections between the network's layers while also lowering the risk of overfitting. Subsampling, also called pooling, usually takes one of two forms, mean subsampling or max subsampling, and can be viewed as a special kind of convolution. Together, convolution and subsampling greatly simplify the model and reduce its parameters. The CNN consists of three parts: an input layer; a combination of n convolutional and pooling layers; and a fully connected multilayer-perceptron classifier.
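The convolution-plus-subsampling structure just described can be illustrated with a single shared 2x2 kernel followed by 2x2 max-pooling on a toy 4x4 input; a real CNN stacks many such stages and appends the fully connected classifier. All values below are arbitrary.

```python
def conv2d_valid(image, kernel):
    """2D 'valid' correlation with one shared kernel (the shared weights).
    As is conventional in CNNs, the kernel is applied without flipping."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(w)] for y in range(h)]

def max_pool_2x2(fmap):
    """Max subsampling over non-overlapping 2x2 windows."""
    return [[max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]

image = [[1, 0, 2, 1],
         [0, 1, 0, 2],
         [2, 0, 1, 0],
         [1, 2, 0, 1]]
kernel = [[1, 0],
          [0, 1]]                             # shared weights = the convolution kernel
feature_map = conv2d_valid(image, kernel)     # one 3x3 feature plane
pooled = max_pool_2x2(feature_map)            # subsampled to 1x1
```

Note how one 2x2 kernel (4 weights) produces the whole feature plane: that reuse is the weight sharing that cuts parameters and connections, and pooling then shrinks the plane further.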
Each unit of the neural network is shown in Figure 4; it computes h(x) = f(w·x + b), where f is the sigmoid activation f(z) = 1 / (1 + e^(-z)). Such a unit can also be called a logistic regression model. When multiple units are combined into a layered structure, a neural network model is formed.
Figure 5 shows a neural network with one hidden layer: a very simple three-layer model comprising an input layer, a hidden layer, and an output layer. The hidden portion can similarly be extended to three, four, or more layers as required. The model composes the unit above layer by layer: the hidden activations are a = f(W₁x + b₁) and the output is y = f(W₂a + b₂).
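Putting the pieces together, the unit of Figure 4 and the one-hidden-layer network of Figure 5 can be written out directly in code. The weights and biases below are arbitrary illustrative values, not trained ones.

```python
import math

def sigmoid(z):
    # Activation f(z) = 1 / (1 + e^(-z)) of the logistic-regression unit.
    return 1.0 / (1.0 + math.exp(-z))

def unit(weights, bias, x):
    # One neuron (Figure 4): h(x) = f(w . x + b)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def forward(x, hidden, output):
    # Hidden layer: a_j = f(W1_j . x + b1_j); output layer: y = f(W2 . a + b2)
    a = [unit(w, b, x) for (w, b) in hidden]
    w2, b2 = output
    return unit(w2, b2, a)

hidden = [([1.0, -1.0], 0.0), ([0.5, 0.5], -0.5)]  # two hidden units
output = ([1.0, 1.0], -1.0)                        # one output unit
y = forward([1.0, 2.0], hidden, output)
```

Extending the hidden portion to more layers, as the text notes, just repeats the per-layer step inside `forward` before the final output unit.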
An operating method for the gesture-recognition home theater system comprises the following steps:

1) The pickup module receives and recognizes the power-on voice signal; on success it sends a power-on signal to the control module, which issues power-on commands to the audio-visual equipment and the Kinect camera;

2) The Kinect camera receives and recognizes the gesture image; on success it sends the recognition result to the control module, which issues an operation command to the audio-visual equipment.

In step 2) above, after the Kinect camera receives the gesture image, the depth image within it is recognized in the following steps:

a) Apply a noise-filtering operation to the depth image;

b) Segment the noise-filtered depth image;

c) Extract depth-image features from the segmented depth image as samples;

d) Compare the extracted depth-image feature samples one-to-many against the data in the gesture library and match the corresponding operation command.

In step b) above, the noise-filtered depth image is segmented using both the skin-color-based and the hand-shape-contour-based methods.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710685075.1A CN107300976A (en) | 2017-08-11 | 2017-08-11 | A kind of gesture identification household audio and video system and its operation method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN107300976A (en) | 2017-10-27 |
Family
ID=60131635
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710685075.1A Pending CN107300976A (en) | 2017-08-11 | 2017-08-11 | A kind of gesture identification household audio and video system and its operation method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN107300976A (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102824092A (en) * | 2012-08-31 | 2012-12-19 | 华南理工大学 | Intelligent gesture and voice control system of curtain and control method thereof |
| CN103092337A (en) * | 2011-11-07 | 2013-05-08 | 三星电子株式会社 | Electronic device and control method thereof |
| CN106331874A (en) * | 2016-08-31 | 2017-01-11 | 浙江创佳数字技术有限公司 | Set top box control method and devices |
| CN207408909U (en) * | 2017-08-11 | 2018-05-25 | 五邑大学 | A kind of gesture identification household audio and video system |
- 2017-08-11: application CN201710685075.1A filed; published as CN107300976A, status Pending
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108334814A (en) * | 2018-01-11 | 2018-07-27 | 浙江工业大学 | A kind of AR system gesture identification methods based on convolutional neural networks combination user's habituation behavioural analysis |
| CN108334814B (en) * | 2018-01-11 | 2020-10-30 | 浙江工业大学 | A gesture recognition method for AR system |
| CN108762479A (en) * | 2018-04-02 | 2018-11-06 | 珠海格力电器股份有限公司 | Control method and device |
| CN108717524A (en) * | 2018-04-28 | 2018-10-30 | 天津大学 | It is a kind of based on double gesture recognition systems and method for taking the photograph mobile phone and artificial intelligence system |
| CN108717524B (en) * | 2018-04-28 | 2022-05-06 | 天津大学 | A gesture recognition system based on dual-camera mobile phone and artificial intelligence system |
| CN113420609A (en) * | 2021-05-31 | 2021-09-21 | 湖南森鹰智造科技有限公司 | Laser radar human body gesture recognition method, electronic device and storage medium |
| CN115631753A (en) * | 2022-12-23 | 2023-01-20 | 无锡迪富智能电子股份有限公司 | Intelligent remote controller for toilet and use method thereof |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN101860704B (en) | Display device for automatically closing image display and realizing method thereof | |
| CN107300976A (en) | A kind of gesture identification household audio and video system and its operation method | |
| CN110825220B (en) | Eyeball tracking control method, device, intelligent projector and storage medium | |
| CN102200830A (en) | Non-contact control system and control method based on static gesture recognition | |
| CN102292689A (en) | Method to control media with face detection and hot spot motion | |
| KR102669100B1 (en) | Electronic apparatus and controlling method thereof | |
| JP2010511958A (en) | Gesture / voice integrated recognition system and method | |
| CN113537056A (en) | Virtual image driving method, apparatus, device and medium | |
| CN114548262B (en) | Feature level fusion method for multi-mode physiological signals in emotion calculation | |
| CN115206306B (en) | Voice interaction method, device, equipment and system | |
| CN107169427B (en) | Face recognition method and device suitable for psychology | |
| CN114779922A (en) | Control method for teaching apparatus, control apparatus, teaching system, and storage medium | |
| CN112289239B (en) | Dynamically adjustable explaining method and device and electronic equipment | |
| CN107871000A (en) | Audio playing method, device, storage medium and electronic equipment | |
| EP3776171A1 (en) | Non-disruptive nui command | |
| CN107968890A (en) | Theme setting method and device, terminal equipment and storage medium | |
| US20210183388A1 (en) | Voice recognition method and device, photographing system, and computer-readable storage medium | |
| CN106774827B (en) | Projection interaction method, projection interaction device and intelligent terminal | |
| KR20250139328A (en) | System and method for capturing images at a desired moment | |
| CN207408909U (en) | A kind of gesture identification household audio and video system | |
| CN107452381B (en) | A multimedia speech recognition device and method | |
| CN112149599B (en) | Expression tracking method and device, storage medium and electronic equipment | |
| Herath et al. | Image based sign language recognition system for Sinhala sign language | |
| CN101110102A (en) | Game scene and character control method based on player's fist | |
| CN103019381A (en) | Method for controlling automatic backlight of display screen |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2017-10-27